OCR software
Dave McGuire
mcguire at neurotica.com
Sat Sep 2 14:49:32 CDT 2006
On Sep 1, 2006, at 4:05 PM, Jules Richardson wrote:
>>> HP developed an OCR engine called Tesseract that is supposed to be
>>> pretty good. They released it to the open-source world, and Google
>>> has picked it up and started working on it.
>> classiccmp list member James Markevitch has been working on an OCR
>> program as well, optimized for column-formatted input, like listings.
>
> Cross-platform, or one specific OS?
At first glance, it appears to be Linux-specific, but that's
generally pretty easy to un-do. The important part is it's not Windoze
software.
> I started putting some stuff together to allow a user to graphically
> describe a scanned page (so you'd roughly mark out what were images,
> what were columns of text etc.) prior to feeding to an OCR engine, as
> experience of commercial products has been that they tend to get it
> wrong too much to be left to run without user input. Unfortunately the
> Linux OCR engines available proved to be just too poor in quality to
> make it worthwhile, so I shelved it until something better came along
> - maybe Tesseract will do the job.
It's possible...might be worth looking into.
-Dave
--
Dave McGuire
Cape Coral, FL