Could we have a #DNS record for telling AI crawlers to stay away?

  • anderslund@expressional.social wrote:

    Could we have a #DNS record for telling AI crawlers to stay away?

    #ai #noai #criminals #thieves

    leeloo@chaosfem.tw wrote (#4):

    @anderslund
    No. We already have robots.txt for that, and they intentionally ignore it. How would a DNS record be any different?

      anderslund@expressional.social wrote (#5):

      @leeloo In case any AI crawler owner with decency exists, a DNS record would save us A LOT of grief 🙂

        anderslund@expressional.social wrote:

        @cruiser That is why a DNS record would be nice. As is, you have to fight them at the door.

        nickfrederiksen@expressional.social wrote (#6):

        @anderslund @cruiser Most of the biggest crawlers, including LLM crawlers, send a User-Agent string that can be filtered.

        There is also the robots.txt route, as you would use for normal search engine crawlers.

        However, it turns out that both search engine crawlers and LLM crawlers change their User-Agent strings from time to time, which makes them difficult to block. And robots.txt is more of a "recommendation" than something they actually have to comply with. Google indexes all pages, disregarding robots.txt completely. It only flags pages, internally, with a "hidden" flag.

        Cloudflare has a "block ai bots" feature, https://developers.cloudflare.com/bots/additional-configurations/block-ai-bots/, but it's very much hit and miss. Since crawlers change their User-Agent strings and IP addresses on a regular basis, these tools are not as reliable as one might think.
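
To make the User-Agent filtering mentioned above concrete, here is a minimal sketch in Python. The token list and the function name `is_ai_crawler` are assumptions chosen for illustration, not a complete or authoritative blocklist, and the check only works for as long as a crawler keeps announcing itself honestly.

```python
from typing import Optional

# Illustrative tokens only (an assumption for this sketch); real crawlers
# may use other strings or change them at any time.
AI_CRAWLER_TOKENS = ("gptbot", "ccbot", "claudebot", "bytespider")


def is_ai_crawler(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header contains a known crawler token."""
    if not user_agent:
        return False
    ua = user_agent.lower()
    return any(token in ua for token in AI_CRAWLER_TOKENS)


if __name__ == "__main__":
    print(is_ai_crawler("Mozilla/5.0 (compatible; GPTBot/1.0)"))
    print(is_ai_crawler("Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0"))
```

A server would call something like this on each request and answer with a 403 when it returns True. A crawler that spoofs a browser User-Agent passes straight through, which is exactly the hit-and-miss problem described in the post.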

          anderslund@expressional.social wrote (#7):

          @nickfrederiksen @cruiser Cloudflare and some other tools delay the requests in the hope that this will make the crawlers go away, if I understand correctly. That slows the internet down for ALL of us.

          If there were a formal mechanism to tell them to stay away, it would be easier to shame those criminals.

          All those unwelcome requests, and the defence mechanisms against them, cost time, electricity, etc. That is a threat to the environment and it costs money, not for the companies but for those who have brains enough not to want them in.

            nickfrederiksen@expressional.social wrote (#8):

            @anderslund @cruiser Agree!

            But think of the shareholders. Those poor, poor, shareholders.

              leeloo@chaosfem.tw wrote (#9):

              @anderslund
              How would it improve anything over using robots.txt?

              Also, someone writing a crawler knows how to use whatever library they are using to speak HTTP. They may not know how to query a DNS record. As far as I can see, robots.txt is easier for everybody.
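
As a rough illustration of how little a well-behaved crawler needs in order to honour robots.txt, here is a sketch using Python's standard `urllib.robotparser`. The robots.txt content, the helper name `may_fetch`, and the example URL are assumptions made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Example policy (an assumption for this sketch): shut out one crawler,
# allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Ask the stdlib parser whether user_agent is allowed to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_fetch(ROBOTS_TXT, "GPTBot", "https://example.org/page"))
    print(may_fetch(ROBOTS_TXT, "ExampleBrowser", "https://example.org/page"))
```

The check is entirely voluntary on the crawler's side: nothing here stops a client that simply never calls it.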

                anderslund@expressional.social wrote (#10):

                @nickfrederiksen @cruiser Why is it a common belief that owning shares means you do not need decency? I do NOT understand that.

                  anderslund@expressional.social wrote (#11):

                  @leeloo Relying on robots.txt means that the undesired traffic still happens, and the costs land on the website owners.
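
To sketch what the DNS idea could look like on the crawler side: no such record type or format exists today, so everything below is hypothetical. The `v=noai1; ai=deny` syntax, the function names, and the "first matching record wins" rule are assumptions invented for illustration. The point is only that the policy check would happen during name resolution, before any HTTP request reaches the website.

```python
from typing import Dict, Iterable


def parse_policy(txt: str) -> Dict[str, str]:
    """Parse a hypothetical 'key=value; key=value' TXT record into a dict."""
    fields: Dict[str, str] = {}
    for part in txt.split(";"):
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip().lower()] = value.strip().lower()
    return fields


def ai_crawling_allowed(txt_records: Iterable[str]) -> bool:
    """Return False if any record carries the hypothetical 'noai1' deny policy."""
    for record in txt_records:
        fields = parse_policy(record)
        if fields.get("v") == "noai1":
            return fields.get("ai") != "deny"
    return True  # no policy published: nothing forbids crawling


if __name__ == "__main__":
    # TXT values as a resolver might return them; the lookup itself would
    # need a DNS library and is left out of this sketch.
    print(ai_crawling_allowed(["v=spf1 -all", "v=noai1; ai=deny"]))
```

Like robots.txt, this would still depend on the crawler choosing to look and to obey. The difference is that a refusal would cost the site owner no web traffic at all.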

                    nickfrederiksen@expressional.social wrote (#12):

                    @anderslund @cruiser Great question. I guess it's because most investors invest in shares for personal gain, rather than growth in the company.

                    I hardly see anyone investing in a company because they believe in the philosophy, the people, the product. People invest in shares that will give them a big payout in the end.

                      anderslund@expressional.social wrote (#13):

                      @nickfrederiksen @cruiser Even if you gain, spending money made with no decency has to feel kind of unpleasant, doesn't it?
