[cctalk] Re: OCR and line printers

3 Dec 2025

...
  On Dec 3, 2025, at 10:55 AM, Adrian Godwin via cctalk
<cctalk(a)classiccmp.org> wrote:
 I don't think it's the general quality of the patent print that's poor,
 it's the line-printer listing section from
 https://www.hp9845.net/9845/downloads/patents/US4089059.pdf starting at
 about page 213 of the PDF, possibly section 26 of the patent.
 The print in that section is much paler than the rest - typical of a worn
 line-printer ribbon. I doubt the printed copy is any better.  I'm only
 trying to OCR the listing, not the rest of the patent. 

That's quite a clean listing, actually, cleaner than most I have worked with and
dramatically better than some.  The sort of slightly damaged characters that appear should
be no problem at all for the "training" feature of ABBYY FineReader to deal
with.  What you'd have to do is run a number of pages through it in training mode, so
it sees a number of variations of the individual characters.  And as I mentioned,
you'd do all the scanning in the mode where it only accepts what it was trained with,
no built-in patterns.  That way it won't make up stuff that isn't part
of the character set but happens to match something built-in, like a pound-sterling sign.
It may be that scanning the listing as a table (with the various columns as table columns)
will work well, and give you the layout explicitly.  Or it can be scanned as plain text,
but in that case the spacing will mostly turn into individual spaces and you'd need
post-processing to insert tabs etc. to make it look right again.  Given the simple
assembler syntax involved that sort of post-processing would not be hard.
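
As a rough illustration of that post-processing, here is a minimal Python sketch.
Everything specific in it is an assumption rather than something taken from the
patent listing: the column positions are made up, a line is assumed to be
"label opcode operand comment" separated by single spaces, and a line that starts
with a space is assumed to have no label.  The name realign is just a placeholder.

```python
# Hypothetical 0-based column positions; adjust them to the real listing.
OPCODE_COL = 8
OPERAND_COL = 16
COMMENT_COL = 32


def pad(text: str, col: int) -> str:
    """Pad text with spaces out to col, or add one space if already past it."""
    return text.ljust(col) if len(text) < col else text + " "


def realign(line: str) -> str:
    """Re-pad one OCR'd assembler line into fixed columns (assumed layout)."""
    text = line.rstrip()
    if not text.strip():
        return ""
    # Assumption: a line that starts with whitespace has no label field.
    has_label = not text[0].isspace()
    # Split off label/opcode/operand only; the rest stays intact as the comment.
    fields = text.split(None, 3 if has_label else 2)
    if not has_label:
        fields.insert(0, "")
    label, opcode, operand, comment = (fields + ["", "", ""])[:4]
    out = pad(label, OPCODE_COL) + opcode
    if operand:
        out = pad(out, OPERAND_COL) + operand
    if comment:
        out = pad(out, COMMENT_COL) + comment
    return out


if __name__ == "__main__":
    for raw in ["LOOP LDA COUNT load counter", " JMP LOOP"]:
        print(realign(raw))
```

One known weakness of this sketch: an instruction with no operand but with a
comment will have the first comment word treated as the operand, so a real
version would need a list of operand-less opcodes or a comment marker to key on.
It pads with spaces; converting runs of spaces to tabs afterwards is a separate step.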
        paul
