OCR software

Jules Richardson julesrichardsonuk at yahoo.co.uk
Fri Sep 1 15:05:52 CDT 2006


Al Kossow wrote:
>>    HP developed an OCR engine called Tesseract that is supposed to be
>> pretty good.  They released it to the open-source world, and Google has
>> picked it up and started working on it.
> 
> classiccmp list member James Markevitch has been working on an OCR program
> as well, optimized for column-formatted input, like listings.

Cross-platform, or one specific OS?

I started putting some stuff together to allow a user to graphically describe 
a scanned page (so you'd roughly mark out which areas were images, which were 
columns of text etc.) prior to feeding it to an OCR engine, as my experience 
of commercial products has been that they get the layout wrong too often to 
be left to run without user input. Unfortunately the Linux OCR engines 
available proved to be just too poor in quality to make it worthwhile, so I 
shelved it until something better came along - maybe Tesseract will do the job.
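
To give a rough idea of the "describe the page first" approach, here's a 
minimal sketch in Python. It isn't the shelved code, just an illustration: the 
Region class, the "text"/"image" labels and the recognise callback are all 
made-up names, and the callback stands in for whatever OCR engine ends up 
being used. The user-marked regions are filtered down to the text ones and 
handed to the engine column by column.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (left, top, right, bottom) in pixels; a hypothetical convention for this sketch
Box = Tuple[int, int, int, int]


@dataclass(frozen=True)
class Region:
    """One user-marked area of a scanned page (hypothetical structure)."""
    kind: str  # "text" or "image", as marked by the user
    box: Box


def text_regions_in_reading_order(regions: List[Region]) -> List[Region]:
    """Drop non-text regions, then order by left edge and then top edge.

    This is a simplification: it assumes each marked text region is a whole
    column, so sorting by left edge gives left-to-right column order.
    """
    text = [r for r in regions if r.kind == "text"]
    return sorted(text, key=lambda r: (r.box[0], r.box[1]))


def ocr_page(regions: List[Region], recognise: Callable[[Box], str]) -> str:
    """Run the supplied OCR callback on each text region and join the results.

    `recognise` is a placeholder for a real engine; it takes a bounding box
    and returns the recognised text for that part of the page.
    """
    ordered = text_regions_in_reading_order(regions)
    return "\n".join(recognise(r.box) for r in ordered)


if __name__ == "__main__":
    page = [
        Region("text", (400, 50, 700, 900)),   # right-hand column
        Region("image", (50, 50, 350, 300)),   # figure, skipped by OCR
        Region("text", (50, 320, 350, 900)),   # left-hand column
    ]
    # Dummy engine that just reports which box it was asked to read.
    print(ocr_page(page, lambda box: f"<text from {box}>"))
```

The point of keeping the engine behind a callback is that the page description 
stays independent of whichever OCR engine turns out to be good enough.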


> I was just talking to Doron Swade (the person responsible for the Difference
> Engine at the British Science Museum) and he is interested in OCR of
> mathematical tables (also column-oriented like listings).

I've never actually met Doron, although his name tends to crop up an awful 
lot. I think he's possibly up at our museum next Friday, but I'll be on a 
plane at that point...

cheers

Jules

