Tesseract is an open-source OCR engine that powers many in-browser (WebAssembly) and server-side OCR pipelines.
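Tesseract recognises text in scanned images and scanned PDF pages. Installing the language pack that matches the document improves accuracy; packs are available for German, English and more than 100 other languages. Preprocessing the image, for example deskewing it and increasing contrast, improves results further.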
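As a rough illustration of the contrast step, the sketch below stretches an 8-bit grayscale image to the full 0-255 range and then binarises it with a fixed threshold. It is a minimal pure-Python example, not the preprocessing that Tesseract or Tentaco actually performs: the function names `stretch_contrast` and `binarize`, the nested-list image representation, and the threshold of 128 are all assumptions made for illustration.

```python
def stretch_contrast(pixels: list[list[int]]) -> list[list[int]]:
    """Linearly rescale grayscale values so the darkest pixel maps to 0
    and the brightest to 255 (min-max contrast stretching)."""
    flat = [value for row in pixels for value in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A uniform image has no contrast to stretch; return a copy unchanged.
        return [row[:] for row in pixels]
    # Integer arithmetic with rounding to the nearest value.
    return [[((value - lo) * 255 + span // 2) // span for value in row]
            for row in pixels]


def binarize(pixels: list[list[int]], threshold: int = 128) -> list[list[int]]:
    """Map every pixel to pure black (0) or pure white (255).
    The fixed threshold is an assumption; real pipelines often pick it adaptively."""
    return [[255 if value >= threshold else 0 for value in row]
            for row in pixels]


# A low-contrast 2x2 "scan": all values sit in a narrow grey band.
faded = [[100, 120], [140, 150]]
stretched = stretch_contrast(faded)
clean = binarize(stretched)
```

In practice an imaging library would do this on real image data; the point here is only that spreading the grey values apart before thresholding makes dark text easier to separate from a light background.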
Tentaco PDF OCR runs OCR as WASM where the browser supports it, so processing stays in the tab, which suits sensitive scans. For born-digital PDFs that already have a text layer, use PDF to Text instead.
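One way to decide between the two routes is to check how much text a page's existing text layer yields. The sketch below assumes the text of each page has already been extracted by some PDF library; the function name `needs_ocr` and the 20-character cutoff are hypothetical and do not describe how Tentaco makes this choice.

```python
def needs_ocr(page_texts: list[str], min_chars_per_page: int = 20) -> list[bool]:
    """Return one flag per page: True when the extracted text layer is empty
    or nearly empty, which suggests the page is a scan and needs OCR.
    The cutoff is an illustrative assumption, not a tuned value."""
    return [len(text.strip()) < min_chars_per_page for text in page_texts]


# Hypothetical extraction results: one born-digital page, two scanned pages.
pages = [
    "Invoice 2024-001\nTotal due: 120.00 EUR",
    "",
    "  \n",
]
flags = needs_ocr(pages)
```

Pages flagged `True` would go to OCR, while the rest can be read directly from their text layer, which is faster and avoids recognition errors.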