justgold79 at gmail.com
Sun May 23 00:08:04 CDT 2021
I've always thought that the content blocked by robots.txt would be the
interesting stuff that should be archived; perhaps it could be kept behind a
paywall. There's no law against archiving it. The only obstacle is subnets
being blocked, which is easily bypassed, as Matt Cutts wrote in a blog post
on silently spidering content. You can also use the cloud proxies that call
themselves SD-WAN. YaCy and other P2P web crawlers are another way to go.
On Sat, May 22, 2021, 11:28 PM Chuck Guzis via cctalk <cctalk at classiccmp.org>
wrote:
> On 5/22/21 7:41 PM, Adrian Stoness via cctalk wrote:
> > Link rot is weird in what disappears vs. what still works.
> > On Sat, May 22, 2021 at 6:45 PM Ali via cctalk <cctalk at classiccmp.org>
> > wrote:
> > >> Interesting article on link rot and its prevalence. According to the
> > >> article, even sources referenced as early as 2018 have about 60%
> > >> rot. I think all of us in this hobby can relate, not only to the loss
> > >> of links but also to the loss of sites, drivers, file repositories, etc.
> I've said it before--putting information on the web is like writing in
> sand. Thank heavens for the Wayback machine (which is why I support
> Brewster's efforts).
> However, it's far from perfect--in particular, FTP content has apparently
> never been archived, and many vendors' support pages have had robots.txt
> files preventing them from being archived.
> Still, it's better than nothing and I appreciate it. Were it more
> complete, I might not have to spend so much time reverse-engineering
> Try searching for some of the older, say, HP support pages. I'm pretty
> sure that some "executive" made the decision to pull all of the support
> material for old systems, as that doesn't contribute to the bottom line.
> The New HP Way.
> A nasty trend is adware sites simply quoting text from a large
> number of now-defunct pages; go to the link and you get the
> "CONGRATULATIONS! YOU ARE THE ONE BILLIONTH VISITOR!" page. Run, do not
> walk, away.
> An even more disturbing trend is information being placed in long-ish
> YouTube videos that could have been summarized concisely in a page of text.