Our next version of Backlight (v5.4.1) will include updates to our distributed robots.txt file to block AI web crawlers from indexing your site. I've already updated our robots.txt documentation with new information about this, along with links to external resources.
As implementing robots.txt on your site is a manual task anyway, there's no need to wait for the update. You can put the new rules in place immediately by adding all of the following to a robots.txt file at the root of your site.
This list is a live document that I will keep up-to-date. It focuses on blocking AI Data Scrapers, and blocks all of those that I am aware of.
User-agent: *
Disallow: /backlight
Disallow: /*/thumbnails/*.jpg$
Disallow: /*/single.php
User-agent: Applebot-Extended
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Diffbot
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: GPTBot
Disallow: /
User-agent: Meta-ExternalAgent
Disallow: /
User-agent: omgili
Disallow: /
User-agent: Timpibot
Disallow: /
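Once the file is in place, you can sanity-check it with a short script. The sketch below uses Python's standard-library urllib.robotparser; "https://example.com" is a placeholder for your own site's address, and the script is only an illustration, not part of Backlight.

from urllib.robotparser import RobotFileParser

# Placeholder: substitute your own site's domain here.
SITE = "https://example.com"

rp = RobotFileParser()
rp.set_url(f"{SITE}/robots.txt")
rp.read()  # fetch and parse the live robots.txt

# The AI data scrapers listed above should be denied everywhere.
for agent in ["GPTBot", "CCBot", "ClaudeBot", "Bytespider"]:
    allowed = rp.can_fetch(agent, f"{SITE}/")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")

# Ordinary crawlers should still reach the site root, while the
# /backlight path disallowed for all agents remains off-limits.
print("Googlebot /:", rp.can_fetch("Googlebot", f"{SITE}/"))
print("Googlebot /backlight:", rp.can_fetch("Googlebot", f"{SITE}/backlight"))

Note that urllib.robotparser does not evaluate the wildcard patterns in the general rules, so test against literal paths, and remember that robots.txt is a request which well-behaved crawlers honor rather than a technical block.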
Additionally, you may or may not wish to block the AI Assistants listed below. For now, these bots will not be included in Backlight's distributed robots.txt file. On whether to block them, Dark Visitors says:
Probably not. AI assistants visit websites directly on behalf of human users, so blocking them will effectively block those users. This could lead to a poor user experience and possible negative sentiment about your website. Not blocking AI assistants will allow more human users to use your website as they choose.
User-agent: ChatGPT-User
Disallow: /
User-agent: Meta-ExternalFetcher
Disallow: /