Somewhat OT: Google Indexing OCR'd PDFs
Jim Battle
frustum at pacbell.net
Fri Oct 31 17:50:55 CDT 2008
Josef Chessor wrote:
> http://googleblog.blogspot.com/2008/10/picture-of-thousand-words.html
>
> Could this indeed be useful, especially when sites like Bitsavers are indexed?
Yes, but does Al want every crawler sucking down gigabytes of PDF
images?
On my own much smaller websites, I've segregated image-only PDFs into
their own directories and then put an exclusion of those directories in
robots.txt to keep out the crawlers.
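The post doesn't quote the actual robots.txt. What follows is a minimal sketch that assumes a hypothetical /scans/ directory holding the image-only PDFs. It uses Python's standard urllib.robotparser to check which URLs a well-behaved crawler would skip under that rule. Keep in mind that robots.txt is advisory: it only keeps out crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /scans/ is an assumed directory of image-only PDFs.
# "User-agent: *" applies the rule to every crawler that honors robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /scans/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content supplied as a string (no network fetch)."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # example.com URLs are placeholders for illustration only.
    for url in ("http://example.com/scans/manual.pdf",
                "http://example.com/docs/index.html"):
        verdict = "allowed" if rp.can_fetch("Googlebot", url) else "blocked"
        print(url, "->", verdict)
```

Because the Disallow rule matches by path prefix, any PDF placed under /scans/ is excluded. Pages elsewhere on the site remain crawlable.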