What is OCR and why do scanned PDFs need it?
OCR (Optical Character Recognition) is the technology that converts an image of text into machine-readable characters. When you scan a document, the scanner saves each page as an image, much like a photograph. Without OCR, the PDF contains no text data: you cannot search it, select text, copy from it, or use it in any text-processing workflow. Running OCR adds an invisible text layer that mirrors the visual text, making the document searchable, selectable, and accessible to screen readers.
How to OCR a PDF free online — step by step
- Open ihatepdf.cv/ocr-pdf — no sign-up required
- Drop your scanned or image-only PDF onto the upload area
- Select the primary language of the document for best accuracy
- Click Recognize Text — Tesseract.js processes each page locally in your browser
- Copy the extracted text directly, or download it as a .txt file (no watermark)
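The language choice in the steps above matters because Tesseract loads a separate model per language, identified by a short code such as `eng` or `deu`. The sketch below shows how a language name might be turned into such a code. The helper name `toTesseractLangs` and the list of UI labels are assumptions made for illustration; the page does not describe how the tool does this internally.

```typescript
// Hypothetical lookup from a UI label to a Tesseract language code.
// The codes themselves are standard Tesseract identifiers; the labels
// and the function name are assumptions for this sketch.
const TESSERACT_CODES = new Map<string, string>([
  ["english", "eng"],
  ["german", "deu"],
  ["french", "fra"],
  ["spanish", "spa"],
  ["italian", "ita"],
]);

function toTesseractLangs(names: string[]): string {
  const codes = names.map((name) => {
    const code = TESSERACT_CODES.get(name.trim().toLowerCase());
    if (code === undefined) {
      throw new Error(`No language code known for "${name}"`);
    }
    return code;
  });
  // Tesseract accepts several languages joined with "+",
  // conventionally with the primary language first.
  return codes.join("+");
}
```

For a document that mixes two languages, calling `toTesseractLangs(["German", "English"])` would produce a combined code with German listed first.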
How to tell if your PDF needs OCR
Open the PDF and try to select a word by clicking and dragging. If you can highlight individual words, the PDF already has a text layer, so use Extract Text instead. If you can only draw a box over the whole page (some viewers show a crosshair cursor), it's an image-only PDF that needs OCR first.
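The same check can be automated. The sketch below assumes you already have the text that some PDF library extracted from each page's text layer (an empty string for an image-only page). It then lists the pages that have too little text to count as having a text layer. The function name `pagesNeedingOcr` and the 10-character threshold are illustrative assumptions, not part of any tool mentioned here.

```typescript
// Hypothetical helper: `pageTexts[i]` is the text extracted from page i + 1
// by whatever PDF library you use. Returns 1-based page numbers whose text
// layer is missing or nearly empty.
function pagesNeedingOcr(pageTexts: string[], minChars = 10): number[] {
  const pages: number[] = [];
  pageTexts.forEach((text, index) => {
    // Count only non-whitespace characters so stray line breaks
    // do not make an image-only page look like it has text.
    const visibleChars = text.replace(/\s/g, "").length;
    if (visibleChars < minChars) {
      pages.push(index + 1);
    }
  });
  return pages;
}
```

A genuinely blank page is flagged too, so treat the result as a list of candidates rather than a definitive answer.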
What to do after OCR
- Search the document — press Ctrl+F / Cmd+F to search for any word
- Copy specific sections — select and copy text exactly as in any digital document
- Translate — paste extracted text into DeepL or Google Translate
- AI analysis — feed the text to Chat with PDF or AI Summarizer
- Edit — take the extracted text into a word processor and reformat
Tips for best OCR accuracy
- Scan at 300 DPI minimum: higher resolution gives the recognizer more pixels per character and significantly improves accuracy
- Use black-and-white scan mode: higher contrast produces cleaner character recognition
- Scan straight: skewed pages reduce accuracy, so lay each page flat and square on the scanner
- Select the correct language: the right language model makes a major difference for non-English text
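To see why the DPI setting matters, it helps to work out how many pixels a scan actually contains. The sketch below is plain arithmetic: DPI means dots per inch, and one typographic point is 1/72 of an inch. The function names are illustrative.

```typescript
// Pixel dimensions of a scanned page at a given resolution.
function scanPixels(
  widthInches: number,
  heightInches: number,
  dpi: number
): { width: number; height: number } {
  return {
    width: Math.round(widthInches * dpi),
    height: Math.round(heightInches * dpi),
  };
}

// Height in pixels of the em box of text set at `pointSize`.
// One point is 1/72 inch, so the height is pointSize / 72 * dpi.
function emHeightPx(pointSize: number, dpi: number): number {
  return (pointSize * dpi) / 72;
}

const letterAt300 = scanPixels(8.5, 11, 300); // { width: 2550, height: 3300 }
const letterAt150 = scanPixels(8.5, 11, 150); // { width: 1275, height: 1650 }
const textAt300 = emHeightPx(12, 300); // 50
const textAt150 = emHeightPx(12, 150); // 25
```

Halving the DPI halves the height of every glyph and quarters the total pixel count, which leaves the recognizer less detail for telling similar characters apart.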
Frequently asked questions
Is my file uploaded to a server?
No. OCR runs locally in your browser using Tesseract.js via WebAssembly. Your file never leaves your device.
How accurate is the text recognition?
Clean 300 DPI scans of typed text: 95–99%. Standard office scans (150–200 DPI): 85–95%. Handwritten text: 40–70% depending on clarity.
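The answer above does not say how these percentages are measured. One common convention is character accuracy: compare the OCR output with a hand-checked reference and count the single-character edits needed to turn one into the other. The sketch below illustrates that convention only; it is an assumption, not a statement of how the figures above were obtained.

```typescript
// Levenshtein edit distance: the minimum number of single-character
// insertions, deletions, and substitutions that turn `a` into `b`.
// Comparison is by UTF-16 code unit, which is fine for a sketch.
function editDistance(a: string, b: string): number {
  // `prev` is the previous row of the dynamic-programming table:
  // prev[j] = distance between the first i - 1 chars of a and first j of b.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Character accuracy in the range 0..1, relative to the reference text.
function charAccuracy(reference: string, recognized: string): number {
  if (reference.length === 0) {
    return recognized.length === 0 ? 1 : 0;
  }
  const errors = editDistance(reference, recognized);
  return Math.max(0, 1 - errors / reference.length);
}
```

With a ten-character reference such as `0123456789`, an output that misreads only the zero as a capital O has one error and scores 0.9, or 90%.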
Does the output have a watermark?
No. ihatepdf never adds watermarks to any output.