arcarlini at arcarlini at
Wed Nov 29 15:41:00 CST 2006

Jules Richardson wrote:

> It's OK if you have top-quality documentation. But lots of computer
> docs out there are old, faded, dirty, creased, well-thumbed etc. and
> unless someone's prepared to visually check every scanned page,
> there's a chance that the bi-level algorithm in use will corrupt the
> data and it'll go unnoticed. 

Well, I do look at each page in the final scan - pretty briefly, just
to make sure that all the pages are in the correct order. Even so
I've never noticed a problem. I've never tried scanning a line-printer
listing and most stuff I scan is a reasonable manual, but even on
the few scans I've made of photocopies, I've never seen a problem.
It would have to be a gross problem to affect legibility of text
by a human (almost anything affects OCR :-( ).

> I suppose it's one of those situations where you end up throwing away
> information no matter what (after all, any scan is essentially a
> digital representation of analogue data), but there's a danger of
> throwing away too much data - and for rare docs you might only get
> the chance to scan them once. For rare items I'd rather have maximum
> quality "just in case", even if it does mean more storage space.

I scan at 600dpi bi-level G4-encoded and I end up with some manuals
in the hundreds of MB. I think at least one of the Digital Semi
manuals hits nearly 400MB. If I went to greyscale (8 bits) then
some of those manuals would take half a DVD[1] to store. Perhaps
a few years from now I'll come back and look for some way of
erasing this post ("2GB, that's nothing, I can download that
in 10 mins!!") but right now 300MB is manageable (even as a download)
whereas 2GB is a stretch.
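
As a rough sanity check on the bi-level versus greyscale sizes, here is
a small sketch of the raw (uncompressed) data per page. The 8.5 x 11 inch
page size and the helper name raw_page_bytes are assumptions for
illustration only; real file sizes depend on the page content and on
whatever compression is applied afterwards.

```python
def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of one scanned page (hypothetical helper)."""
    width_px = round(width_in * dpi)    # pixels across
    height_px = round(height_in * dpi)  # pixels down
    return width_px * height_px * bits_per_pixel // 8


# Assumed page size: 8.5 x 11 inches, scanned at 600dpi.
bilevel = raw_page_bytes(8.5, 11, 600, 1)  # 1 bit per pixel
grey = raw_page_bytes(8.5, 11, 600, 8)     # 8 bits per pixel

print(f"bi-level:  {bilevel / 1e6:.1f} MB per page before compression")
print(f"greyscale: {grey / 1e6:.1f} MB per page before compression")
print(f"ratio:     {grey // bilevel}x")
```

The raw data grows by a factor of 8, which is in the same ballpark as
going from a few hundred MB to around 2GB, before accounting for any
difference in how well the two kinds of image compress.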

For my first few manuals I did redo pages in greyscale if they
had a photo but I stopped because I could not see a difference.

But by all means scan in 8 bits, just don't scan text as a JPEG
unless the alternative is not scanning at all :-)


[1] Maybe more - I think G4 only works on bi-level, so by using
greyscale you lose some lossless compression too.

More information about the cctech mailing list