PDF to Text Conversion (Was: Manual scanning: TIFF-to-PDF software with greyscale support?)
Brad Parker
brad at heeltoe.com
Tue Dec 22 09:22:36 CST 2009
On Dec 21, 2009, at 9:38 PM, Al Kossow wrote:
> On 12/21/09 5:35 PM, Jerome H. Fine wrote:
>
>> I have about 100,000 lines of code in over 3 dozen PDF files that
>> were scanned from the hard copy listings. Unfortunately, the original
>> text source files were lost, so the PDF files are a last resort.
>> Other than typing in the code by hand from the PDF file, are there
>> any good freeware programs to convert a PDF back to a text file?
>>
>
> sounds like the TSX-Plus listings I scanned for Lyle.
I spent a little time playing with ocropus and then tesseract, trying
to scan PDP-11 diags back to text. I didn't have good luck. I'd be
interested if others have a working formula.
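
For anyone wanting to repeat the experiment, here is a minimal sketch of the basic pipeline: rasterize each PDF page to a greyscale image, then run tesseract on each image. It is not a known-good formula. It assumes pdftoppm (from poppler-utils) and a recent tesseract (4.x or later, which accepts --psm) are installed; the function names, the "pages" working directory, the 300 dpi setting, and the choice of page segmentation mode 6 (a single uniform block of text) are all illustrative assumptions.

```python
import subprocess
from pathlib import Path


def rasterize_cmd(pdf, prefix, dpi=300):
    # pdftoppm writes one greyscale PNG per page,
    # named <prefix>-<page number>.png.
    return ["pdftoppm", "-r", str(dpi), "-gray", "-png", str(pdf), str(prefix)]


def tesseract_cmd(image, psm=6, lang="eng"):
    # tesseract takes an output base name and appends ".txt" itself.
    image = Path(image)
    return ["tesseract", str(image), str(image.with_suffix("")),
            "-l", lang, "--psm", str(psm)]


def ocr_pdf(pdf, work_dir="pages"):
    # Hypothetical driver: rasterize, OCR each page, join the text.
    work = Path(work_dir)
    work.mkdir(exist_ok=True)
    subprocess.run(rasterize_cmd(pdf, work / "page"), check=True)
    for image in sorted(work.glob("page-*.png")):
        subprocess.run(tesseract_cmd(image), check=True)
    # pdftoppm zero-pads page numbers, so a plain sort keeps page order.
    return "\n".join(p.read_text() for p in sorted(work.glob("page-*.txt")))
```

Calling ocr_pdf("listing.pdf") (the file name is a placeholder) would return the recognized text of all pages. Output quality on line printer listings will depend heavily on the scan and on the language data in use.
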
I did have a little fun "training" tesseract on the line printer font.
I think that technique holds promise, but it needed more data to do a
good job (my initial sample was too small, but did improve things a
lot).
Just curious if anyone else has tried training one of the OCR programs
to read line printer fonts.
-brad
More information about the cctech mailing list