Jules Richardson julesrichardsonuk at
Thu Nov 30 08:27:15 CST 2006

Richard wrote:
> In article <456D4EB3.7000106 at>,
>     Jules Richardson <julesrichardsonuk at>  writes:
>> It's OK if you have top-quality documentation. But lots of computer docs out 
>> there are old, faded, dirty, creased, well-thumbed etc. and unless someone's 
>> prepared to visually check every scanned page, there's a chance that the 
>> bi-level algorithm in use will corrupt the data and it'll go unnoticed.
> I check every scanned page as I scan it.  You have to anyway, because
> if it doesn't scan right you have to rescan it to get it right.

The main problem I find with that is that it's time-consuming to check every
page at the scanned resolution (i.e. a 1:1 mapping between on-screen pixels and
scanned dots). At a typical "fit page to window" zoom level it's easy to make
sure that the page was scanned straight etc., but just as easy to miss things
which might hinder some future OCR process.

No process is going to be perfect, of course, but in terms of storage
availability and transmission speed we are maybe at a point where we can
afford a quality improvement for the really hard-to-find stuff.



More information about the cctalk mailing list