Question about PDF manipulation

Eric Smith eric at brouhaha.com
Fri Jun 3 01:15:22 CDT 2005


Jan-Benedict Glaw wrote:
> You didn't answer my question :-)  Consider I prepare a TIFF file that
> contains (with additional tags) e.g. some raw OCRed text, not
> read-checked. Now I prepare a PDF from this and use gs to get the image
> back.  Is my text still there?

Depends on whether the program you use to prepare the PDF file from the
TIFF file knows about those "additional tags" and does something with
them.

Anyhow, it's much easier to put OCR'd (unchecked) text behind the images
in PDF files.  You can actually put the characters or words at the
exact coordinates of the bitmaps, so that when you do a search and get
a hit, the bitmap of the matching word is highlighted.
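
As a rough illustration of the idea, here is a minimal sketch of the
content-stream operators for such a hidden text layer. It relies on PDF
text rendering mode 3 ("3 Tr"), which neither fills nor strokes the
glyphs, so the text is searchable but not drawn. The function names, the
(text, x, y, height) word layout, and the /F1 font resource name are
assumptions made for this example, not taken from any particular tool.
The sketch also does not stretch each word to the exact width of its
bitmap, which would need font metrics.

```python
def pdf_escape(text):
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def hidden_text_ops(words, font="/F1"):
    """Build content-stream operators that place each word invisibly.

    words: iterable of (text, x, y, height) tuples in PDF user-space
    units (points), with (x, y) the lower-left corner of the word's
    bitmap. This layout is hypothetical; real OCR output will differ.
    """
    ops = ["BT", "3 Tr"]  # text rendering mode 3: neither fill nor stroke
    for text, x, y, height in words:
        ops.append(f"{font} {height:g} Tf")      # font size = bitmap height
        ops.append(f"1 0 0 1 {x:g} {y:g} Tm")    # move to the word's corner
        ops.append(f"({pdf_escape(text)}) Tj")   # emit the word itself
    ops.append("ET")
    return "\n".join(ops)


if __name__ == "__main__":
    print(hidden_text_ops([("PDP-11", 72, 700, 12), ("manual", 120, 700, 12)]))
```

The resulting operators would go into the same page content stream that
paints the scanned image, and the page's resources would have to define
the font named here as /F1.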

Perhaps someone could write a PDF OCR utility based on kognition,
Clara OCR, GNU Ocrad, or ocre.

Does any TIFF file of the nature you describe actually exist?  PDFs
with both bitmaps and text are not uncommon.

Eric




More information about the cctalk mailing list