Somewhat OT: Google Indexing OCR'd PDFs

Jim Battle frustum at pacbell.net
Fri Oct 31 17:50:55 CDT 2008


Josef Chessor wrote:
> http://googleblog.blogspot.com/2008/10/picture-of-thousand-words.html
> 
> Could this indeed be useful, especially when sites like Bitsavers are indexed?

Yes, but does Al want every crawler sucking down gigabytes of PDF 
images?

On my own much smaller websites, I've segregated image-only PDFs into 
their own directories and then put an exclusion of those directories in 
robots.txt to keep out the crawlers.
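
A minimal sketch of that kind of exclusion, assuming a hypothetical 
directory named /scans/ holds the image-only PDFs (the directory name, 
the example.com host, and the file names are placeholders, not the 
actual layout of any site mentioned here). It uses Python's standard 
urllib.robotparser to check how a well-behaved crawler would read the 
rule; crawlers that ignore robots.txt are not stopped by it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all crawlers out of the directory
# that holds the image-only PDFs. "/scans/" is an assumed name.
ROBOTS_TXT = """\
User-agent: *
Disallow: /scans/
"""


def build_parser(robots_text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A compliant crawler should skip anything under the excluded directory...
scan_ok = parser.can_fetch("Googlebot", "http://example.com/scans/manual.pdf")
# ...but remain free to fetch files elsewhere on the site.
docs_ok = parser.can_fetch("Googlebot", "http://example.com/docs/manual.pdf")

print("scans allowed:", scan_ok)
print("docs allowed:", docs_ok)
```

Keeping the image-only PDFs in their own directory is what makes a 
single Disallow line sufficient; mixed directories would need one rule 
per file.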


