Question about PDF manipulation

Jules Richardson julesrichardsonuk at yahoo.co.uk
Thu Jun 2 13:28:18 CDT 2005


On Thu, 2005-06-02 at 13:35 -0400, Barry Watzman wrote:
> Some of the attitudes here about PDF impress me as about the most ludditical
> (is there such a word?) as I've seen in a long time.
> 
> Suppose you have a CompuPro memory card and no manual.  The card is useless.
> I give you the PDF file of the manual.  You can look at the manual
> on-screen, or you can print it and you HAVE the hard copy manual.  In color,
> where the original was in color, and with quality that may be
> indistinguishable from (or in some cases actually better than) the original
> manual when it was new.

Hang on...

For text, or a mixture of text and images, it's probably very good (and
I expect it stands up reasonably well against other document formats such
as RTF, PostScript, Word doc, HTML etc.).

For wrapping up a bunch of images, where's the benefit?

> So what if it's not "searchable".  Get a clue:  THE ORIGINAL PAPER MANUAL
> WAS NOT "SEARCHABLE". 

Of course. But it seems a natural thing to want once the original is in
some electronic form. Needless to say, I don't recall anyone ever saying
that they don't appreciate the fantastic efforts of those who do scan
and make documentation available (your message gave the impression of
the opposite, at least to me).

> (Note that if it was EITHER made from its source document (e.g. Word or
> whatever) OR if it was "OCR'd" (which Acrobat itself can do), it WILL be
> searchable.)

I don't think anyone's suggested otherwise, have they?

Currently documentation archives seem to be 95% image scans, or image
scans wrapped up in PDF files. There's very little plain text content
(and even fewer cases where a document has been put together with text
stored as text and diagrams etc. stored as images).

I don't doubt that the search tools are very good for OCRed text, but
then *maybe* they're just as good or better for other formats too?

> Sorry if it's proprietary, but sometimes quality tools are only created by
> people who want to be paid for their work.

If it's proprietary, then surely it's of less use as a long-term archival
format? The same goes for complexity; the more complex the format, the
harder it is to extract the data without the relevant tool, which of
course might be difficult to find in years to come...

cheers

Jules



