A short history: postal codes, Kurzweil, and the first reading machines
Optical character recognition (OCR) started as an industrial problem. In the 1950s and 1960s, postal services were drowning in handwritten and typed addresses, and the first OCR machines were custom-built to read printed postal codes off envelopes at a few characters per second. They were the size of fridges and read exactly one font.
The first general-purpose OCR — software that could be pointed at arbitrary printed text — arrived in the 1970s with Ray Kurzweil's Reading Machine, originally built so that blind users could have books read aloud to them.
Tesseract, the open-source engine that powers most browser-side OCR including SnapToolz, started life inside Hewlett-Packard in the 1980s, was open-sourced in 2005, and was taken under Google's wing in 2006. The current version is built around a neural network rather than the hand-coded heuristics it shipped with originally.
How modern OCR actually works
A modern OCR pipeline has roughly four stages. First, layout analysis: detect where the page has text, separate columns, figure out reading order, identify tables and figures so they can be handled differently.
Second, line segmentation: take each text region and slice it into individual lines. Then, depending on the engine, lines are either further sliced into individual characters (the old approach) or kept whole and fed to a recurrent network that reads them like a sequence (the modern approach).
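One classic way to slice a text region into lines is a horizontal projection profile: count the ink in every pixel row and cut wherever the count drops to zero. The sketch below assumes the region is already binarised into rows of 0 (background) and 1 (ink); the function name and the `min_ink` parameter are illustrative, and it is not a description of what any particular engine does internally.

```python
def segment_lines(image, min_ink=1):
    """Return (top, bottom) row ranges of text lines in a binary image.

    `image` is a list of rows; each row is a list of 0 (background) or 1 (ink).
    A row belongs to a line when it holds at least `min_ink` ink pixels.
    """
    lines = []
    start = None  # first row of the line currently being tracked
    for y, row in enumerate(image):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y                      # a new line begins here
        elif not has_ink and start is not None:
            lines.append((start, y - 1))   # the blank row closes the line
            start = None
    if start is not None:                  # line runs to the bottom edge
        lines.append((start, len(image) - 1))
    return lines


page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(segment_lines(page))  # [(1, 2), (5, 5)]
```

This only works when the page is straight: on a skewed scan the ink of neighbouring lines overlaps in the profile and the blank rows disappear, which is one reason de-skewing comes first.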
Third, recognition. Modern engines feed each line image into a convolutional neural network to extract visual features, then through an LSTM or transformer that reads the feature sequence and emits a character sequence. Tesseract 4 and later work this way, using an LSTM. Commercial engines like Google Cloud Vision and Amazon Textract do not publish their architectures in detail, but they are generally understood to run larger models trained on far more data.
Fourth, post-processing: a language model reranks the raw recognition output. If the network was unsure whether a stretch of ink was 'rn' or 'm', a language model with a dictionary will almost always pick 'modern' over 'rnodern'. This is why OCR on real words is so much better than OCR on random strings or product codes, where no dictionary can break the tie.
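The dictionary step can be illustrated with a toy sketch. The confusion table and word list below are made up for the example, and real engines weigh candidates by recognition confidence and language-model probability rather than doing a plain lookup, so treat this as the idea only.

```python
from itertools import product

# Hypothetical confusion table: glyph runs a recogniser tends to mix up.
CONFUSIONS = {
    "rn": ["rn", "m"],
    "m": ["m", "rn"],
    "1": ["1", "l"],
    "l": ["l", "1"],
}
# Tiny stand-in for a real dictionary.
DICTIONARY = {"modern", "corner", "learn", "mail"}


def candidates(word):
    """Expand a word into every spelling reachable through CONFUSIONS."""
    options = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in CONFUSIONS:            # two-character runs such as 'rn'
            options.append(CONFUSIONS[pair])
            i += 2
        elif word[i] in CONFUSIONS:       # single confusable characters
            options.append(CONFUSIONS[word[i]])
            i += 1
        else:
            options.append([word[i]])
            i += 1
    return ["".join(parts) for parts in product(*options)]


def correct(word):
    """Return a dictionary spelling if one is reachable, else the raw word."""
    if word in DICTIONARY:
        return word
    for candidate in candidates(word):
        if candidate in DICTIONARY:
            return candidate
    return word  # nothing matched: keep what the recogniser saw


print(correct("rnodern"))  # modern
print(correct("X1rn9"))    # X1rn9
```

The second call shows the limit: a serial number has no dictionary entry, so the ambiguity between '1' and 'l' or 'rn' and 'm' stays unresolved.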
Honest accuracy numbers
Here is what you can actually expect. On a clean, high-resolution scan of a printed book or PDF in a common font, modern open-source OCR hits 98–99.5 percent character accuracy. Commercial cloud OCR hits 99.5–99.9 percent. The difference is small and rarely matters for normal use.
On a phone photo of a printed receipt or business card, with good lighting and the page held flat, accuracy drops to roughly 90–96 percent. Commercial OCR holds up better here — closer to 97–99 percent.
On a phone photo of a wrinkled receipt taken in a dim restaurant at an angle, expect 70–85 percent. The thermal paper has low contrast, the angle distorts the geometry, and the lighting is uneven.
Handwriting is a different planet. Clean, hand-printed block letters in pencil on lined paper: 50–70 percent on open-source engines, 80–90 percent on Google's specialised handwriting model. Cursive: open-source is unusable; even the best commercial models hover around 70 percent on neat cursive and are unreliable on rushed writing.
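To make these percentages concrete, character accuracy is commonly computed as one minus the edit distance between the OCR output and a hand-checked reference, divided by the reference length. The sketch below is one such measurement using Levenshtein distance; the function names and the 2,000-character page are assumptions for illustration, not figures from any benchmark.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_accuracy(reference, ocr_output):
    """1 - (edit distance / reference length), floored at 0."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


def expected_errors(char_count, accuracy):
    """How many wrong characters a given accuracy implies on a page."""
    return round(char_count * (1 - accuracy))


print(edit_distance("modern", "rnodern"))  # 2
print(expected_errors(2000, 0.98))         # 40
print(expected_errors(2000, 0.85))         # 300
```

On an assumed 2,000-character page, 98 percent still means about 40 wrong characters to fix by hand, and 85 percent means about 300, which is usually slower to correct than retyping.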
What wrecks accuracy
Low contrast is the biggest killer. Faded thermal-paper receipts, photocopies of photocopies, pencil on coloured paper — anything where the text doesn't pop sharply from the background — confuses the segmentation step.
Rotation and perspective distortion are the second-biggest problem. A page photographed straight-on reads cleanly; the same page photographed from a 30-degree angle looks slightly trapezoidal, and the recognition network has not seen many trapezoidal letters in training.
Low resolution is straightforward arithmetic: if the character height is below about 20 pixels, the recognition model is guessing. Get closer or crop.
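The arithmetic can be worked through for a scan. A point is 1/72 of an inch, so a font's nominal size in pixels is its point size divided by 72 and multiplied by the scan resolution. The sketch below assumes capital letters are about 70 percent of the nominal size, which is a rough typographic rule of thumb and varies by typeface; the function names are illustrative.

```python
CAP_HEIGHT_RATIO = 0.7  # assumption: capitals are ~70% of the nominal font size


def cap_height_px(font_pt, dpi, ratio=CAP_HEIGHT_RATIO):
    """Approximate height in pixels of a capital letter in a scan."""
    return font_pt / 72 * dpi * ratio


def min_dpi(font_pt, target_px=20, ratio=CAP_HEIGHT_RATIO):
    """Lowest scan resolution that keeps capitals at `target_px` or taller."""
    return target_px * 72 / (font_pt * ratio)


print(round(cap_height_px(10, 300), 1))  # 29.2
print(round(cap_height_px(10, 150), 1))  # 14.6
print(round(min_dpi(10)))                # 206
```

Under these assumptions, 10-point body text clears the 20-pixel mark at 300 DPI but falls below it at 150 DPI, and small print such as 6-point footnotes needs proportionally more resolution.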
Multi-column layouts and complex tables are the hardest layout challenge. Most OCR engines will return correct text but in the wrong reading order — they'll concatenate column 1 line 1 with column 2 line 1, garbling the meaning.
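The column problem is easy to reproduce with bounding boxes. The sketch below contrasts a naive top-to-bottom sort with a column-aware one; the `Box` type and the `gutter_x` parameter are hypothetical, and finding the gutter automatically is the hard part that real layout analysis has to solve.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x: int      # left edge of the text line, in pixels
    y: int      # top edge of the text line, in pixels
    text: str


def naive_order(boxes):
    """Sort purely top-to-bottom, then left-to-right: interleaves columns."""
    return [b.text for b in sorted(boxes, key=lambda b: (b.y, b.x))]


def column_order(boxes, gutter_x):
    """Read everything left of the gutter first, then everything right of it."""
    left = sorted((b for b in boxes if b.x < gutter_x), key=lambda b: b.y)
    right = sorted((b for b in boxes if b.x >= gutter_x), key=lambda b: b.y)
    return [b.text for b in left + right]


boxes = [
    Box(0, 0, "The quick brown"), Box(300, 0, "Meanwhile, the"),
    Box(0, 20, "fox jumps."), Box(300, 20, "dog sleeps."),
]
print(naive_order(boxes))
# ['The quick brown', 'Meanwhile, the', 'fox jumps.', 'dog sleeps.']
print(column_order(boxes, gutter_x=150))
# ['The quick brown', 'fox jumps.', 'Meanwhile, the', 'dog sleeps.']
```

If your OCR tool returns word or line coordinates, a pass like this is a cheap way to repair a known two-column layout after the fact.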
Practical steps to push your accuracy up
Lighting is the single highest-leverage thing you control. Diffuse, even light from above or to the side, with no shadow falling across the page. Avoid flash — direct flash creates a hot spot in the middle of the page that blows out the text.
Hold the camera straight and flat above the page. Most phones now have a document-scanner mode that detects edges and de-warps the image; if yours does, use it. The de-warp step alone can take you from 80 to 95 percent accuracy on a receipt.
Crop to the text region before you send it to OCR. If the source is faint, preprocess: convert to grayscale, increase contrast, then threshold to pure black-and-white. SnapToolz's OCR tool exposes a preprocessing step for exactly this reason.
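A minimal sketch of that preprocessing chain, working on nested lists of RGB tuples so that it needs no imaging library. The function names, the fixed cutoff of 128 and the simple min-max contrast stretch are illustrative choices, not a description of SnapToolz's implementation; production tools typically pick the threshold adaptively.

```python
def to_gray(pixels):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance values."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in pixels]


def stretch_contrast(gray):
    """Rescale so the darkest pixel becomes 0 and the lightest becomes 255."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:                      # flat image: nothing to stretch
        return [row[:] for row in gray]
    return [[round((v - lo) * 255 / (hi - lo)) for v in row] for row in gray]


def threshold(gray, cutoff=128):
    """Map every pixel to pure black (0) or pure white (255)."""
    return [[0 if v < cutoff else 255 for v in row] for row in gray]


# Faint grey "ink" (150) on a light grey background (200).
faint = [
    [(200, 200, 200), (150, 150, 150), (200, 200, 200)],
    [(150, 150, 150), (200, 200, 200), (150, 150, 150)],
]
gray = to_gray(faint)
print(threshold(gray))                    # [[255, 255, 255], [255, 255, 255]]
print(threshold(stretch_contrast(gray)))  # [[255, 0, 255], [0, 255, 0]]
```

Thresholding the faint image directly wipes the text out, while stretching the contrast first keeps it, which is why the order of the steps matters.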
FAQ
- Why does OCR misread '1' as 'l' or 'rn' as 'm'?
- Because at typical resolutions and fonts, those character pairs really do look almost identical to a recognition model. Modern engines use a language model to disambiguate based on the surrounding word ('modern' is in the dictionary, 'rnodern' isn't), but that trick breaks down on product codes and serial numbers, which are not dictionary words.
- Is browser OCR really as accurate as cloud OCR?
- For clean printed text, nearly: browser-side open-source OCR reaches 98–99.5 percent and cloud OCR 99.5–99.9 percent, a gap that rarely matters in normal use. For phone photos of receipts, cloud OCR is a few percentage points better. For handwriting and complex tables, cloud is dramatically better; specialised models exist on the cloud side that don't have open-source equivalents.
- Can OCR read handwriting?
- Tesseract and most open-source engines: barely, and only clean printed handwriting. Google's handwriting-specialised models do meaningfully better — 80–90 percent on clean printing, around 70 percent on neat cursive. Doctor handwriting and rushed cursive remain unreliable across every engine.
- Why does OCR mangle tables?
- Because most OCR engines extract a sequence of text lines and have no model of two-dimensional spatial layout. Cloud services like AWS Textract and Google Document AI have dedicated table-structure models that preserve the grid; open-source engines mostly return a flat stream of cells in an unhelpful order.