OCR software

Dave McGuire mcguire at neurotica.com
Sat Sep 2 14:49:32 CDT 2006


On Sep 1, 2006, at 4:05 PM, Jules Richardson wrote:
>>>    HP developed an OCR engine called Tesseract that is supposed to be
>>> pretty good.  They released it to the open-source world, and Google
>>> has picked it up and started working on it.
>> classiccmp list member James Markevitch has been working on an OCR
>> program as well, optimized for column-formatted input, like listings.
>
> Cross-platform, or one specific OS?

   At first glance, it appears to be Linux-specific, but that's
generally pretty easy to undo.  The important part is that it's not
Windoze software.

> I started putting some stuff together to allow a user to graphically 
> describe a scanned page (so you'd roughly mark out what were images, 
> what were columns of text etc.) prior to feeding to an OCR engine, as 
> experience of commercial products has been that they tend to get it 
> wrong too much to be left to run without user input. Unfortunately the 
> Linux OCR engines available proved to be just too poor in quality to 
> make it worthwhile, so I shelved it until something better came along 
> - maybe Tesseract will do the job.

   It's possible...might be worth looking into.

          -Dave

--
Dave McGuire
Cape Coral, FL



More information about the cctech mailing list