It was thus said that the Great Paul Koning via cctalk once stated:
A web crawler that does not obey robots.txt is not a
law-abiding outfit.
Best would be to block it entirely. If they are that dismissive of
honesty, they are also unlikely to pay attention to such matters as
copyright and intellectual property ownership.
That's what I did for one bot that identified itself as:
Mozilla/5.0 (compatible; Thinkbot/0.5.8;
+In_the_test_phase,_if_the_Thinkbot_brings_you_trouble,_please_block_its_IP_address._Thank_you.)
Only, it doesn't come from a single IP address but from thousands. I ended
up blocking over 450,000 addresses at the firewall level. Details here:
https://boston.conman.org/2025/08/21.1
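For anyone facing the same problem, here is a minimal sketch of one way to
build a block list like that. It is not necessarily the method described at
the link above, and it makes several assumptions. The access log is in the
Apache/nginx "combined" format, with the client IP as the first field and the
User-Agent as the last double-quoted field. Each offending IPv4 address is
widened to its /24, which is a deliberately blunt choice that also blocks the
neighbours. The helper names thinkbot_ips() and block_list() are hypothetical.
Adjacent networks are then merged with the standard library's
ipaddress.collapse_addresses(), so the firewall ends up with fewer, larger
rules.

```python
import ipaddress
from typing import Iterable, List, Set


def thinkbot_ips(log_lines: Iterable[str], marker: str = "Thinkbot") -> Set[ipaddress._BaseAddress]:
    """Hypothetical helper: collect client IPs whose User-Agent contains `marker`.

    Assumes the "combined" log format: client IP is the first field and the
    User-Agent is the last double-quoted field on the line.
    """
    ips = set()
    for line in log_lines:
        line = line.rstrip("\n")
        parts = line.rsplit('"', 2)  # [..., user_agent, trailing text]
        if len(parts) < 3:
            continue  # not a combined-format line; skip it
        user_agent = parts[-2]
        if marker in user_agent:
            ips.add(ipaddress.ip_address(line.split(" ", 1)[0]))
    return ips


def block_list(ips: Iterable[ipaddress._BaseAddress], prefix: int = 24) -> List[ipaddress.IPv4Network]:
    """Hypothetical helper: widen each IPv4 address to its /prefix, then merge adjacent networks."""
    nets = {
        ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        for ip in ips
        if ip.version == 4  # IPv6 would need its own list and a different prefix
    }
    return list(ipaddress.collapse_addresses(nets))


# Made-up sample log lines using documentation address ranges.
log = [
    '203.0.113.7 - - [21/Aug/2025:00:00:00 -0400] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Thinkbot/0.5.8; +test)"',
    '203.0.112.9 - - [21/Aug/2025:00:00:01 -0400] "GET /a HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Thinkbot/0.5.8; +test)"',
    '198.51.100.1 - - [21/Aug/2025:00:00:02 -0400] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64)"',
]

nets = block_list(thinkbot_ips(log))
for net in nets:
    print(net)  # 203.0.112.0/24 and 203.0.113.0/24 merge into 203.0.112.0/23
print(sum(n.num_addresses for n in nets), "addresses blocked")
```

The resulting CIDR list could then be loaded into whatever the firewall uses
for large address sets, such as an ipset or an nftables set. Whether a /24
is the right granularity depends on how much collateral blocking you can
tolerate.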
-spc