Scanning Results

Guy Dunphy guykd at optusnet.com.au
Sun Jul 21 10:04:59 CDT 2019


At 09:05 PM 20/07/2019 -0700, Al wrote:
>
>> I wish I knew why ISO and Adobe never updated PDF to include PNG images.
>
>The pdf format supports png just fine. 

Oh does it! The texts say it doesn't, and it definitely didn't originally.
Maybe the change is in one of the more recent ISO standards since ISO 32000-1:2008 ? 
(Those cost many hundreds of dollars, so I don't have them.)

>A modified version of Eric Smith's tumble accepts png as input.

I hadn't heard of that one. Something to have a look at.
My, Google tries *really* hard to make me look at 'Eric Smith tumblr'

But actually:
  http://tumble.brouhaha.com/   No mention there of PNG files, as of 2017.

  https://github.com/brouhaha/tumble  Last update Dec 2017. Says:

README:
tumble: build a PDF file from image files
Copyright 2003-2017 Eric Smith <spacewar at gmail.com>

Tumble is a utility to construct PDF files from one or more image
files.  Supported input image file formats are JPEG, and black and
white TIFF (single- or multi-page).  Black and white images will be
encoded in the PDF output using lossless Group 4 fax compression
(ITU-T recommendation T.6).  This provides a very good compression
ratio for text and line art.  JPEG images will be preserved with the
original coding.

The current version of Tumble will only work on little-endian systems,
such as x86, VAX, and Alpha.  The byte order dependencies will be fixed
in a later release.
  ----


Still no mention of PNG. Modified by who/where? Do you have a link?

ISO 32000-1:2008, in Table 6, defines the PDF compression methods (filters) that
can be applied to 'streams' (i.e. binary data blocks, including images).

 See PDF_32000_2008_table_6_700_gray_16.png
 This and other files mentioned below, at http://everist.org/png-pdf

Point being, that PDF internally does not allow foreign image encodings,
otherwise how would PDF viewers deal with them?
You can pass a PDF constructor an image in any format the constructor can understand,
then it will re-encode using one of the defined PDF stream compression methods.

Or you can use a pure binary stream 'attachment', but then the reader
doesn't know it's an image.
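
To make the re-encoding point concrete, here is a minimal sketch of how a PDF writer could store pixel data as an image stream using FlateDecode, one of the Table 6 filters. PNG uses the same deflate compression for its pixel data, so this is roughly where a PNG's content ends up once the PNG container is discarded. The function name flate_image_xobject is hypothetical, and the sketch assumes 8-bit RGB pixels with no predictor; it only builds the stream object, not a complete PDF.

```python
import zlib


def flate_image_xobject(width, height, rgb_bytes):
    """Build the body of a PDF image XObject holding 8-bit RGB pixels.

    Hypothetical helper for illustration: the pixels are deflate-compressed
    and labelled with /Filter /FlateDecode so a viewer knows how to decode
    them. No object number, xref table or page is produced here.
    """
    if len(rgb_bytes) != width * height * 3:
        raise ValueError("expected width * height * 3 bytes of RGB data")
    data = zlib.compress(rgb_bytes, 9)
    header = (
        "<< /Type /XObject /Subtype /Image"
        f" /Width {width} /Height {height}"
        " /ColorSpace /DeviceRGB /BitsPerComponent 8"
        f" /Filter /FlateDecode /Length {len(data)} >>\n"
    ).encode("ascii")
    return header + b"stream\n" + data + b"\nendstream\n"
```

The dictionary tells the viewer it is an image and which filter to undo, which is exactly what a raw binary attachment lacks.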


>The Tektronix color catalog scans on bitsavers were scanned as pngs

Looking at this one (because I happened to have a paper copy of it on my shelf):
  http://bitsavers.org/pdf/tektronix/catalog/Tektronix_Catalog_1975.pdf
Taking physical page 52, PDF page #58  (Because I like Tek 7903 scopes)

The Photoshop CS6 extraction of that page from the PDF is 2550 x 3296 px.
  I saved it as PNG 24 bit, file size is 6,744 KB. File Tektronix_Catalog_1975-58.png
  This is not the compressed image size inside the pdf. Without a PDF analysis tool
  that would be really tedious to determine. Ditto the compression format. 
  Of course Photoshop is not going to tell you.


Enlarging in Photoshop, the image has definitely been JPG encoded in the PDF, as it
has the typical JPG edge noise on characters.
  File image_jpg_artifacts.png
No way your scanner produced that.
Important point there. You may have passed your PDF creator a PNG image, but in the
PDF it was re-encoded as JPG.
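
Lacking a proper PDF analysis tool, a crude way to check which encoding the images in a PDF actually use is to scan the raw file for image object dictionaries and read their /Filter entries (DCTDecode means JPEG, FlateDecode means deflate, and so on). The sketch below is a heuristic only, not a PDF parser. The function name image_filters is hypothetical, and it assumes the image dictionaries are stored uncompressed in the file; objects packed into compressed object streams will not be seen.

```python
import re
from collections import Counter

# "N G obj ... endobj" blocks; a heuristic match on the raw bytes.
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj\b(.*?)endobj", re.S)
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")
# /Filter may be a single name or an array of names.
FILTER_RE = re.compile(rb"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)")
NAME_RE = re.compile(rb"/([A-Za-z0-9]+)")


def image_filters(pdf_bytes):
    """Tally the /Filter entries of image XObjects found in raw PDF bytes."""
    counts = Counter()
    for match in OBJ_RE.finditer(pdf_bytes):
        # Look only at the dictionary part, before any stream data.
        head = match.group(1).split(b"stream", 1)[0]
        if not IMAGE_RE.search(head):
            continue
        found = FILTER_RE.search(head)
        if found is None:
            counts["(none)"] += 1
            continue
        names = [n.decode("ascii") for n in NAME_RE.findall(found.group(1))]
        counts["+".join(names)] += 1
    return counts
```

Usage would be along the lines of image_filters(open("Tektronix_Catalog_1975.pdf", "rb").read()), assuming a local copy saved under that name.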


Other issues:
 * Bleed through of print on other side of paper. Cure: Use a black backing sheet.
 * A lot of shading in the 'white' paper. Inflates file size. Cure: set scanner curves correctly.
 * Plenty of specking. Cure: scanner curve, plus manual touchup in Photoshop.
 * The crop frame is off the page edges.

I scanned the same page from my paper copy, at 300 DPI, black backing, to PNG-24. 
Result: file 7903_02.png
  3221 x 4349 px. File 5,249 KB  (BTW 300 DPI is a bit too little for the screened images.)
  Notice higher res compared to the PDF image, but already smaller file. Just by having cleaner 'white'. 

Did manual touchup in Photoshop. Mostly to get rid of some remaining shading & specks in white.
Summary: select color ranges of black and blue text, add a block for the image, expand 2 px, invert selection,
fill with pure white. Paint a few remaining specks. Select the screened image block, blur 1.8 px radius
to kill screening dots.
Yes, I'm aware this is tedious, and no I don't know of a way to automate it. Because it needs to be fine
tuned every time. So I'm also aware this is not practical for bulk scanning. Just demonstrating contrasts.

Then scale to 2550 px horizontal. (Vertical is now 3443; this differs from the PDF's 3296 because the PDF version has a wider crop.)
Save as PNG 24 bit. File: 3,675 KB.   file: 7903_06_2550_24.png
Vastly better quality than the PDF version, already about half the file size.
If the page was only black text, we could now save in PNG 4 bits/px grayscale.
But it has color and a shaded image. So choose 8 bit/px indexed.
  File: 7903_06_2550_8.png
Absolutely no visible difference, but now the saved file is reduced to 1,058 KB.
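
For a sense of why the pixel format matters so much, the uncompressed pixel data at each bit depth can be worked out directly. This is only a rough sketch: the function name raw_bytes is hypothetical, and it ignores PNG's per-row filter byte, the palette, and chunk overhead, so it gives the size before compression rather than a predicted file size.

```python
def raw_bytes(width, height, bits_per_pixel):
    """Uncompressed pixel data size in bytes, with rows padded to whole bytes."""
    row_bytes = (width * bits_per_pixel + 7) // 8
    return row_bytes * height


# The 2550 x 3443 page at the three formats discussed above.
for bpp in (24, 8, 4):
    size = raw_bytes(2550, 3443, bpp)
    print(f"{bpp:2d} bit/px: {size / 1024:,.0f} KB uncompressed")
```

Going from 24 to 8 bits per pixel cuts the raw data to a third before the compressor even starts, which is in line with the drop from 3,675 KB to 1,058 KB reported above.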

Starting again with the clean full size scan, reduce to 1200 x 1620, (a good screen size)
and 8 bit/px indexed. (Adequate for this page.)   Saved file size: 339 KB.
  File: 7903_07_1200_8.png
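
The scaled dimensions quoted above follow from keeping the aspect ratio of the 3221 x 4349 scan. A minimal sketch of that arithmetic, with a hypothetical helper name scaled_height:

```python
def scaled_height(width, height, new_width):
    """Height after scaling to new_width while preserving the aspect ratio."""
    return round(height * new_width / width)


print(scaled_height(3221, 4349, 2550))  # the 2550 px wide version
print(scaled_height(3221, 4349, 1200))  # the 1200 px wide screen version
```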


Btw, I don't suppose anyone has a copy of a utility called PDF Dissector, from Zynamics?
Google bought out Zynamics and withdrew the utility from the market, in 2011.

Guy







