I set my server to limit requests per hour from the same IP to slow them
down, and I have code to detect bots and redirect their sessions to a
low-impact catch page. It’s not that hard to control, but lately I’ve
noticed the old tricks no longer work as well. It’s an AI arms race.
Still, I always believed that publishing publicly would eventually cause
the content to enter the public domain.
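
The gist, as a minimal sketch rather than my actual code (the hourly
budget, the user-agent substrings, and the /catch path are placeholder
assumptions), in Flask:

import time
from collections import defaultdict, deque

from flask import Flask, request, redirect

app = Flask(__name__)

RATE_LIMIT = 300   # assumed per-IP budget per window
WINDOW = 3600      # one hour, in seconds
BOT_SIGNATURES = ("GPTBot", "CCBot", "Bytespider")  # example UA substrings

hits = defaultdict(deque)  # ip -> timestamps of recent requests

@app.before_request
def throttle_and_divert():
    ip = request.remote_addr
    now = time.time()
    q = hits[ip]
    # Drop timestamps that fell out of the window, then record this hit.
    while q and now - q[0] > WINDOW:
        q.popleft()
    q.append(now)
    # Known crawler user agents get parked on a cheap static page.
    ua = request.headers.get("User-Agent", "")
    if any(sig in ua for sig in BOT_SIGNATURES) and request.path != "/catch":
        return redirect("/catch")
    # Over the hourly budget: refuse instead of serving.
    if len(q) > RATE_LIMIT:
        return "Too many requests", 429

@app.route("/catch")
def catch():
    return "Nothing to see here.", 200

The user-agent check is the weak half of this; the per-IP window is what
actually slows anything down, and even that erodes once the scrapers
rotate addresses.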
On Tue, Sep 16, 2025 at 9:02 PM Cameron Kaiser via cctalk <cctalk(a)classiccmp.org> wrote:
For those of you who run vintage computing-related info sites, have you
noticed all of the LLM scraper activity? AI services are using the LLM
scrapers to populate their knowledge bases.
A massive, massive IP filter. There has been some collateral damage, but
unfortunately I don't think this is avoidable. They're a plague.
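
Conceptually it's nothing fancier than a membership test against a big
pile of CIDR blocks on every request. A generic sketch (the prefixes
below are documentation placeholders, not the actual offending ranges):

import ipaddress

BLOCKED = [ipaddress.ip_network(c) for c in (
    "198.51.100.0/24",  # TEST-NET-2: placeholder for a real offender range
    "203.0.113.0/24",   # TEST-NET-3: placeholder
    "2001:db8::/32",    # IPv6 documentation prefix: placeholder
)]

def is_blocked(addr: str) -> bool:
    """True if addr falls inside any blocked network."""
    ip = ipaddress.ip_address(addr)
    # A v4 address tested against a v6 network is simply not contained.
    return any(ip in net for net in BLOCKED)

print(is_blocked("203.0.113.7"))  # True
print(is_blocked("192.0.2.1"))    # False

In practice the list lives in the web server or firewall config rather
than application code, which is also why there's collateral damage:
blocking a whole range catches any legitimate users inside it.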
--
------------------------------------ personal: http://www.cameronkaiser.com/ --
  Cameron Kaiser * Floodgap Systems * www.floodgap.com * ckaiser(a)floodgap.com
-- Time is an illusion. Lunch time, doubly so. -- Douglas Adams ---------------