# PDF & Image Text Extractor / pdf-image-text-extractor

---

## Introduction

Recognizes and extracts text from images and PDF documents. The tool supports multiple image formats and PDF files, detects whether text is present, preserves the original formatting, and outputs structured results.

**Core Value**

- **Dual Format Coverage**: Supports both images (PNG, JPG, GIF, WebP, etc.) and PDF documents — one tool for both scenarios.
- **Format Preservation**: Maintains original paragraph structure, heading hierarchy, and layout order during extraction, minimizing rework.
- **Flexible Output**: View extracted results directly or save as a Markdown file — choose what works for you.

**Who It's For**

- 📄 **Office Workers** — Quickly extract editable text from scans and screenshots, eliminating manual transcription.
- 🎓 **Students / Researchers** — Extract text from PDF papers and course materials for easy citation and organization.
- 💼 **Content Creators** — Pull text from image assets and convert it to editable copy.

---

## Features

### Core Features

- **Image Text Recognition**: Upload an image and automatically detect and extract all text content — titles, body text, annotations, watermarks — while preserving the original layout.
- **PDF Text Extraction**: Extract text from every page of a text-based PDF, retaining paragraph structure and heading hierarchy, and output it as Markdown.
- **Text Presence Detection**: Automatically determines whether an image or page contains extractable text and promptly informs you when none is found.
- **Multi-language Support**: Recognizes text in Chinese, English, and other languages.
- **Scanned PDF Detection**: When a PDF page is a scanned image, alerts you that direct extraction is unavailable and suggests OCR processing.
- **Result Saving**: Extracted results can be saved as a `.md` file on demand. The file includes the source, extraction status, and text content.
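
To illustrate how text presence detection and scanned PDF detection might fit together, here is a minimal Python sketch. It is not the tool's actual implementation: `PageInfo`, `classify_page`, `summarize`, and the `min_chars` threshold are hypothetical names and values chosen for illustration, and the sketch assumes some PDF library has already supplied each page's text layer and embedded image count.

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    """Hypothetical per-page summary, filled in by whatever PDF library you use."""
    number: int
    text: str         # text layer of the page ("" if there is none)
    image_count: int  # number of images embedded in the page


def classify_page(page: PageInfo, min_chars: int = 20) -> str:
    """Return 'text', 'scanned', or 'empty' for a single page.

    Assumed heuristic: a page with almost no text layer but at least one
    embedded image is probably a scan and needs OCR.
    """
    visible = "".join(page.text.split())  # drop all whitespace before counting
    if len(visible) >= min_chars:
        return "text"
    if page.image_count > 0:
        return "scanned"
    return "empty"


def summarize(pages):
    """Group page numbers by classification so the user can be alerted."""
    report = {"text": [], "scanned": [], "empty": []}
    for page in pages:
        report[classify_page(page)].append(page.number)
    return report


pages = [
    PageInfo(1, "Hello world, this is a text-based page.", 0),
    PageInfo(2, "  \n", 1),  # no text layer, one full-page image
    PageInfo(3, "", 0),      # blank page
]
report = summarize(pages)
```

In this sketch, pages listed under `"scanned"` would trigger the OCR suggestion, while pages under `"empty"` would produce the "no text found" notice.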

---

## Usage Guide

Simply describe your need in natural language and upload an image or PDF — no commands to memorize.

### Quick Reference

| Intent                    | Example Phrase                                      | Result                                                                |
| ------------------------- | --------------------------------------------------- | --------------------------------------------------------------------- |
| Extract image text        | "Extract the text from this image"                  | Recognizes all text in the image, preserving original layout          |
| Extract PDF text          | "Extract the text from this PDF"                    | Extracts text page by page, retaining paragraphs and heading levels  |
| Extract and save          | "Extract the text from this PDF and save it"        | Extracts text and generates a `.md` file                             |
| Handle scanned PDF        | "Read the text from this scanned document"          | Detects scanned pages and alerts you; extracts from text-based pages |
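
As a rough illustration of the "Extract and save" row, the sketch below writes a `.md` file containing the source, extraction status, and text content. The function names `render_result` and `save_result`, the headings, and the field labels are assumptions for illustration only; the file the tool actually generates may be laid out differently.

```python
from pathlib import Path


def render_result(source: str, status: str, text: str) -> str:
    """Build the Markdown body: source, extraction status, then the text."""
    return (
        "# Extraction Result\n\n"
        f"- **Source**: {source}\n"
        f"- **Status**: {status}\n\n"
        "## Text\n\n"
        f"{text.strip()}\n"
    )


def save_result(path, source: str, status: str, text: str) -> Path:
    """Write the rendered Markdown to `path` as UTF-8 and return the path."""
    out = Path(path)
    out.write_text(render_result(source, status, text), encoding="utf-8")
    return out
```

A call such as `save_result("report.md", "report.pdf", "success", extracted_text)` would create `report.md` in the current directory.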

---

## Use Cases

| Scenario                      | Role                   | Example Phrase                                      | Benefit                                                              |
| ----------------------------- | ---------------------- | --------------------------------------------------- | -------------------------------------------------------------------- |
| Image text to editable copy   | Office Worker          | "There's text in this screenshot, extract it"       | Skip manual typing, get editable text quickly                        |
| PDF paper excerpt             | Student / Researcher   | "Extract the text from this PDF paper"              | Preserve original structure for easy citation and organization       |
| Scanned document content      | Admin / Finance        | "Can you read the text from this scan?"             | Auto-detect scanned pages; extract from text-based pages normally   |
| Asset text repurposing        | Content Creator        | "Convert the copy in this image to text"            | Quickly get text assets for re-editing and publishing                |
