Risks of DJVU/lossy compression - Re: If you OCR, always archive the bitmaps too
toby at telegraphics.com.au
Sun Sep 27 15:16:57 CDT 2015
On 2015-09-27 4:14 PM, Toby Thain wrote:
> On 2015-09-27 2:33 PM, Fred Cisin wrote:
>> On Sun, 27 Sep 2015, Pontus Pihlgren wrote:
>>> It seems to me that a better tool could solve the issue. One that
>>> could display the OCR:ed content only and the scanned content
>>> only when desired, for instance when you suspect an error.
>>> Is there such a reader? Is the content organised to make it
>> I haven't seen one.
>> I did start trying to write an heuristic probabilistic OCR one 25 years
>> ago. The idea being to overlay the OCR'd (displayed with matching
>> fonts) over the scanned content. ...
> DJVU compression is somewhat analogous to this process, ...
> There was a somewhat scary case study on the web a few years ago (not
> sure if it's still out there, haven't been able to find it)
Here it is.
The compression method in that case was apparently JBIG2, but the same problem could also affect DJVU.
> ... The risks are obvious(*).
> * - Hat tip to PGN. comp.risks digest.