Document OCR

Extract text from a photograph or scan of a historical document. OCR runs in your browser after loading the OCR engine and language data from external CDNs.

Document language

Drop an image here
or click to browse
JPG, PNG, TIFF, WebP


Extracted text

Extracted text will appear here after scanning.

Limitations of OCR on historical documents

OCR works best on clearly printed type. Results on old handwriting, Fraktur (Gothic) script, damaged documents, or poor-quality photographs will be imperfect. Use the extracted text as a starting point for transcription, not as a finished transcript. For handwritten records, Transkribus is usually a better fit because it is built for handwritten text recognition and document transcription.

The first scan in each language downloads a language file (~10–20 MB), which is then cached in your browser.
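Because the extracted text is only a starting point, it can help to flag the words the engine was least sure about before a person checks them. The sketch below assumes the OCR engine reports a confidence score from 0 to 100 for each word. The `OcrWord` shape, the `flagUncertainWords` name, the threshold of 60, and the sample words are all illustrative assumptions, not part of this page's code.

```typescript
// Hypothetical shape for one recognized word, modeled loosely on OCR engines
// that report a 0-100 confidence per word.
interface OcrWord {
  text: string;
  confidence: number;
}

// Join words into one string, wrapping low-confidence words in [?...?] so a
// transcriber knows where to look first. The default threshold is an assumption.
function flagUncertainWords(words: OcrWord[], threshold = 60): string {
  return words
    .map((w) => (w.confidence < threshold ? `[?${w.text}?]` : w.text))
    .join(" ");
}

// Made-up sample data for illustration only.
const sample: OcrWord[] = [
  { text: "Born", confidence: 93 },
  { text: "in", confidence: 88 },
  { text: "Hanibvrg", confidence: 41 },
];
console.log(flagUncertainWords(sample));
```

A lower threshold flags fewer words. The right value depends on the document and is best tuned by checking a few pages by hand.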

Privacy note

The app passes the selected image directly to the OCR engine running in your browser; it is not sent to an ArchiveIndex server. However, this page loads Tesseract.js and its language assets from third-party CDNs, so avoid using it for sensitive documents unless you trust those providers or self-host the assets.
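For the self-hosting option, a minimal sketch: Tesseract.js accepts `workerPath`, `corePath`, and `langPath` options when a worker is created, and pointing them at your own origin keeps the engine and language downloads off third-party CDNs. The `/ocr-assets` base path, the file layout under it, and the `selfHostedOcrPaths` helper are assumptions for illustration. Check the exact file names and the `createWorker` signature against the Tesseract.js version you deploy.

```typescript
// Locations Tesseract.js can be told to load from instead of its default CDNs.
interface OcrAssetPaths {
  workerPath: string; // worker script
  corePath: string;   // directory holding the tesseract-core builds
  langPath: string;   // directory holding the language data files
}

// Build the three paths from one base URL. The directory layout is hypothetical.
function selfHostedOcrPaths(baseUrl: string): OcrAssetPaths {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    workerPath: `${base}/worker.min.js`,
    corePath: `${base}/core`,
    langPath: `${base}/lang`,
  };
}

console.log(selfHostedOcrPaths("/ocr-assets/"));

// Usage sketch (needs the tesseract.js package, so it is not run here;
// the three-argument form is assumed from recent Tesseract.js versions):
//   const worker = await createWorker("deu", 1, selfHostedOcrPaths("/ocr-assets"));
```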