If you OCR, always archive the bitmaps too - Re: Regarding Manuals

Johnny Billquist bqt at update.uu.se
Sat Sep 26 16:51:02 CDT 2015

On 2015-09-26 23:42, Toby Thain wrote:
> On 2015-09-26 4:28 PM, Johnny Billquist wrote:
>> On 2015-09-26 12:16, Johnny Billquist wrote:
>>> On 2015-09-25 22:35, Al Kossow wrote:
>>>> I have been going back and applying OCR to the ones on bitsavers.
>>>> Are there some in particular that you have a problem with?
>>> Aha. I wasn't aware of that. I downloaded copies many years ago and
>>> have been keeping them locally. I'll check out the current versions on
>>> bitsavers then.
>> Al, exactly how have they been OCRed? Looking at them, it would appear
>> that what you see is still the bitmaps of all the pages, but then you
>> have the basic text also available for selection/searching.
>> My issue with that is that the documents are huge, and the experience
>> just scrolling through them is pretty bad.
> IMHO, though I am sure I am not alone:
> Software which "recreates" the typography of a document from OCR does
> not produce an acceptable substitute; I've yet to see a book that wasn't
> ruined by it.
> Just worth mentioning for anyone who might be tempted: for this reason
> and others, the bitmaps must NEVER be discarded (although of course
> bitmaps can be archived in a different file if people want to supply OCR
> as well).

Look at the results in the link I posted. I was more than happy with 
that result.

But sure, for those who like bitmaps, I'm certainly not going to take
them away. For me, though, it's the content I'm interested in, not
the pixels.


Johnny Billquist                  || "I'm on a bus
                                   ||  on a psychedelic trip
email: bqt at softjar.se             ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol

More information about the cctalk mailing list