It is with deep sorrow that we announce the end of robots.txt, the humble text file that served as the silent guardian of digital civility for thirty years. Born on February 1, 1994, out of necessity after a faulty crawler named “Websnarf” brought Martijn Koster’s server to its knees, robots.txt passed away in July 2025 – not by Cloudflare’s hand, but from the consequences of systematic disregard by AI corporations. Cloudflare’s decision to block AI crawlers by default merely marked the moment when even the last major infrastructure provider abandoned faith in voluntary compliance and moved to technical enforcement – a final act of desperation that signaled the end of an era. As with all significant losses, it took time for the full extent of this digital tragedy to sink in.
Henning Fries is a UI/UX engineer with a passion for sustainable web design, digital accessibility, and the psychology of good user experiences. For over fifteen years, he has been working as a designer, developer, and consultant at the interface between people, technology, and design—in Germany, France, and Luxembourg. As a full-stack developer with a focus on design and a green front-end enthusiast, he combines technical expertise with a clear awareness of resource conservation and user experience. His goal: digital products that are meaningful, accessible, and human.
A Life of Silent Service
robots.txt was born in a time when the internet resembled a small, quiet neighborhood – manageable, personal, and characterized by mutual trust. One knew the bots that came by and maintained digital etiquette with each other. robots.txt, originally named “RobotsNotWanted.txt,” was never designed to fight complex legal battles or confront billion-dollar companies – it was simply a polite, yet firm, hint: “Please do not go this way.”
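That hint was, and remained, disarmingly simple. A minimal robots.txt might look like the following sketch – the path is purely illustrative, and everything after a hash sign is a comment:

    # robots.txt – a polite request, not an access control
    User-agent: *          # addressed to every crawler
    Disallow: /private/    # “please do not go this way”
    Allow: /               # everything else is welcome

Four lines, no authentication, no enforcement: the entire mechanism rested on the reader’s goodwill.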
In its golden years, robots.txt lived in perfect harmony with the major search engines. Google respected it, Yahoo honored it, and even AltaVista – rest in peace – and Lycos followed its instructions. It was a give and take, a friendship on equal footing built on a simple bargain: search engines received content for indexing, while websites, in return, got traffic. This crawl-to-referral ratio – the ratio of bot accesses to returning users – stood at a fair 14:1 for Google. For every 14 pages accessed by bots, one user on average found their way back to the website. Today, that contract is broken: AI crawlers generate thousands or millions of accesses, while hardly any traffic returns through links or mentions.
"Anthropics ClaudeBot showed by far the highest crawl-to-referral ratio in June 2025 – about 70,900 crawls per one referral, far more than any other AI crawler."
robots.txt was so fundamental to the functioning of the internet that in 2022 it was finally given formal recognition as RFC 9309. Yet even this late knighthood could not avert its fate.
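What compliance meant in practice was never complicated. A well-behaved crawler could honor the protocol in a few lines – sketched here with Python’s standard-library parser; the domain and user-agent string are placeholders, not real services:

    from urllib import robotparser

    # Fetch and parse the site's robots.txt (example.com is illustrative)
    rp = robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()

    # The entire social contract, condensed into one question:
    if rp.can_fetch("ExampleBot/1.0", "https://example.com/private/page.html"):
        print("allowed – crawl politely")
    else:
        print("disallowed – turn around")

That any bot could do this, and that so many chose not to, is the heart of the tragedy.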
Chronicle of a Slow Death
The first signs of change appeared in 2017 when the Internet Archive announced it would no longer consider robots.txt when archiving historical content. On April 17, 2017, Mark Graham (Director of the Wayback Machine) stated that robots.txt files – especially those intended for search engines – do not always align with archival goals. The Internet Archive aims to preserve the most complete snapshots of the web possible, including duplicate or large content.
"Over time we have observed that the robots.txt files that are geared toward search engine crawlers do not necessarily serve our archival purposes."
But this was merely a prelude to the progressive, systematic exploitation that was to follow. With the advent of artificial intelligence, the internet transformed from a collaborative space into an extraction zone.
In place of the hoped-for collaborative partnership, new digital barriers went up across the web: Cloudflare’s default blocking, paywalls for API access, and exclusive licensing deals with select publishers. Content creators found themselves facing an industrial extraction machine that profited from their work without compensation. The internet, once conceived as an open network for all, morphed into a centralized data mine for tech giants.
OpenAI led the charge with its GPTBot, ChatGPT-User, and OAI-SearchBot – a trinity of violations that left robots.txt helplessly watching its directives being diligently ignored. The company publicly claimed compliance, while in June 2025, Cloudflare documented a devastating crawl-to-referral ratio of 1,700:1 – industrial extraction without meaningful return.
Anthropic added further to the suffering. ClaudeBot, anthropic-ai, and Claude-Web hammered servers: iFixit logged one million visits in 24 hours, Freelancer.com nearly four million in four hours. With a crawl-to-referral ratio of 73,000:1, Anthropic crossed all boundaries of decency – it was like entrusting a neighbor with your house keys to water the plants, only to find they had carted off all your belongings.
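For the record: refusing these crawlers never required anything exotic. A single robots.txt group naming the user-agent tokens mentioned above would have stated the refusal completely – had anyone been listening:

    # One group, several addressees (RFC 9309 allows stacked User-agent lines)
    User-agent: GPTBot
    User-agent: ChatGPT-User
    User-agent: OAI-SearchBot
    User-agent: ClaudeBot
    User-agent: anthropic-ai
    User-agent: Claude-Web
    Disallow: /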
Perplexity AI was among the most aggressive actors: it used undisclosed IP addresses and third-party services to obscure its crawling activities. When CEO Aravind Srinivas publicly declared that robots.txt was not a legal framework, it was an open affront to the decades-old, fragile protocol.
A Text File's Last Stand
In its final months, robots.txt fought desperately for the relevance of bygone days. Website operators rushed to its aid with increasingly sophisticated defenses: crawler fingerprinting via TLS (Transport Layer Security) analysis, honeypot traps, and behavioral analysis. But it was like trying to treat acute blood poisoning with fever reducers – technically sound, but not equal to the scale of the threat.
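The honeypot idea at least deserves a footnote in the protocol’s honor: a path that is disallowed in robots.txt and linked nowhere visibly, so that only a bot ignoring the rules ever requests it. A minimal sketch using Python’s standard library – the trap path and port are invented for illustration:

    import logging
    from http.server import BaseHTTPRequestHandler, HTTPServer

    # Hypothetical trap: listed as "Disallow: /trap/" in robots.txt and never
    # linked visibly – a compliant crawler has no reason to ever come here.
    TRAP_PATH = "/trap/"

    logging.basicConfig(filename="honeypot.log", level=logging.INFO,
                        format="%(asctime)s %(message)s")

    class Honeypot(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path.startswith(TRAP_PATH):
                # Whoever lands here has ignored robots.txt and outed itself.
                logging.info("trap hit: ip=%s ua=%s", self.client_address[0],
                             self.headers.get("User-Agent", "-"))
                self.send_response(403)
            else:
                self.send_response(404)
            self.end_headers()

        def log_message(self, fmt, *args):
            pass  # keep the console quiet; the log file is what matters

    if __name__ == "__main__":
        HTTPServer(("", 8080), Honeypot).serve_forever()

The logged addresses could then feed firewall rules – technical enforcement stepping in where politeness had failed.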
The European Data Protection Board attempted to give the protocol legal force with its Opinion 28/2024, while Italy’s data protection authority, the Garante, fined OpenAI 15 million euros. But these were desperate attempts to resuscitate a system that had long since collapsed – voluntary respect could no longer be saved.
Alternative protocols – ai.txt, TDM ReP, “No-AI-Training” HTTP headers – were discussed as potential successors. But they all carried the stigma of their birth: they arose not from cooperation, but from confrontation.
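Technically, these successors mostly amount to a handful of HTTP response headers. A sketch of what such a response might look like – the tdm-* fields follow the W3C’s TDM Reservation Protocol draft, the X-Robots-Tag values are an informal community convention rather than a ratified standard, and the policy URL is a placeholder:

    HTTP/1.1 200 OK
    Content-Type: text/html; charset=utf-8
    tdm-reservation: 1
    tdm-policy: https://example.com/tdm-policy.json
    X-Robots-Tag: noai, noimageai

Here, tdm-reservation: 1 reserves text-and-data-mining rights, and tdm-policy points to machine-readable licensing terms. Whether any crawler honors them is, as this obituary attests, another question entirely.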
This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.
