Could we have a #DNS record for telling AI crawlers to stay away?
-
@anderslund
No. We already have robots.txt for that, which they intentionally ignore. How would a DNS record be any different?
-
@leeloo In case any AI crawler owner with decency exists, a DNS record would save us from A LOT of grief.
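No such record exists as a standard; as a sketch of the idea being wished for here, suppose sites published a hypothetical TXT record like `_ai-policy.example.com. IN TXT "crawl=deny"`. The record name and the key=value syntax below are made up purely for illustration; a decent crawler could check such a record once per host, before making a single HTTP request:

```python
# Hypothetical convention (not a real standard): a site publishes
#   _ai-policy.example.com.  IN TXT  "crawl=deny"
# and a well-behaved AI crawler checks it before fetching anything.

def may_crawl(txt_value: str) -> bool:
    """Parse a hypothetical _ai-policy TXT value; default is allow."""
    policy = {}
    for pair in txt_value.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            policy[key.strip()] = value.strip()
    return policy.get("crawl", "allow") != "deny"

print(may_crawl("crawl=deny"))            # False
print(may_crawl("crawl=allow; note=hi"))  # True
```

The point is not the parsing but the placement: one DNS lookup per host, cached by the record's TTL, instead of the crawler hitting the web server at all.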

-
@cruiser That is why a DNS record would be nice. As is, you have to fight them at the door.
@anderslund @cruiser Most of the biggest crawlers, including LLM crawlers, use a User-Agent string that can be filtered.
There is also the robots.txt path, as you would use for normal search engine crawlers.
However, it turns out that both search engine crawlers and LLM crawlers change their user agent strings from time to time, making them difficult to block. And robots.txt is more of a "recommendation" than something they actually have to comply with. Google indexes all pages, disregarding robots.txt completely; it only flags pages internally with a "hidden" flag.
Cloudflare has a "block AI bots" feature, https://developers.cloudflare.com/bots/additional-configurations/block-ai-bots/, but it's very much hit and miss. Since crawlers change their user agent strings and IP addresses on a regular basis, these tools are not as reliable as one might think.
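The filtering described above amounts to a substring check on the User-Agent header. The tokens below (GPTBot, CCBot, ClaudeBot, Google-Extended) are strings these crawlers have publicly documented, but, as the post notes, they change over time, which is exactly the weakness of this approach:

```python
# Known AI-crawler User-Agent tokens (a moving target, per the post above).
AI_CRAWLER_TOKENS = ("gptbot", "ccbot", "claudebot", "google-extended")

def is_ai_crawler(user_agent: str) -> bool:
    """True if the User-Agent header contains a known AI-crawler token."""
    ua = user_agent.lower()
    return any(token in ua for token in AI_CRAWLER_TOKENS)

print(is_ai_crawler("Mozilla/5.0; compatible; GPTBot/1.2"))    # True
print(is_ai_crawler("Mozilla/5.0 (X11; Linux) Firefox/128.0")) # False
```

A crawler that simply stops sending its token sails straight past this check, which is why the blocklists need constant maintenance.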
-
@nickfrederiksen @cruiser Cloudflare and some other tools delay the request in the hope this will make them go away, if I understand this correctly. That is slowing the internet down for ALL of us.
If there was a formal mechanism to tell them to stay away, it would be easier to place shame on those criminals.
All those unwelcome requests, and the defence mechanisms, cost time, electricity etc., and that is a threat to the environment and costs money: not for the companies, but for those that have brains enough to not want them in.
-
@anderslund @cruiser Agree!
But think of the shareholders. Those poor, poor shareholders.
-
@anderslund
How would it improve anything over using robots.txt? Also, someone writing a crawler knows how to use whatever library they are using to speak HTTP. They may not know how to query a DNS record. As far as I can see, robots.txt is easier for everybody.
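The "easier for everybody" point holds up in code: Python's standard library, for example, ships a robots.txt parser, so honoring the file takes a handful of lines (the GPTBot rules below are illustrative):

```python
from urllib.robotparser import RobotFileParser

# Rules are fed inline for the sketch; a real crawler would call
# rp.set_url("https://example.com/robots.txt") and rp.read() instead.
rp = RobotFileParser()
rp.parse([
    "User-agent: GPTBot",
    "Disallow: /",
])

print(rp.can_fetch("GPTBot", "https://example.com/page"))    # False
print(rp.can_fetch("OtherBot", "https://example.com/page"))  # True
```

So the barrier is not technical; the whole thread is about crawlers that could do this trivially and choose not to.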
-
@nickfrederiksen @cruiser Why is it a common belief that owning shares means you do not need decency? I do NOT understand that.
-
@leeloo Relying on robots.txt means that the undesired traffic will still happen and places the costs on the website owners.
-
@anderslund @cruiser Great question. I guess it's because most investors invest in shares for personal gain, rather than growth in the company.
I hardly see anyone investing in a company because they believe in the philosophy, the people, the product. People invest in shares that will give them a big payout in the end.
-
@nickfrederiksen @cruiser Even if you gain, spending money made with no decency has to feel kinda unnice?