FARVEL BIG TECH

Could we have a #DNS record for telling AI crawlers to stay away?

Tags: dns, noai, criminals, thieves
13 posts, 4 posters, 0 views
  • anderslund@expressional.social

    Could we have a #DNS record for telling AI crawlers to stay away?

    #ai #noai #criminals #thieves

    cruiser@expressional.social wrote:
    #2

    @anderslund It should be opt-in to let them in. Abusing a site's content for your own gain, such as copying data, is stealing, or so we have been told 🫣 #piracy #stealing #ai #llm

      anderslund@expressional.social wrote (last edited by anderslund@expressional.social):
      #3

      @cruiser That is why a DNS record would be nice. As is, you have to fight them at the door.

        leeloo@chaosfem.tw wrote:
        #4

        @anderslund
        No. We already have robots.txt for that, which they intentionally ignore. How would a DNS record be any different?

          anderslund@expressional.social wrote:
          #5

          @leeloo In case any AI crawler owner with decency exists, a DNS record would save us from A LOT of grief 🙂

            nickfrederiksen@expressional.social wrote:
            #6

            @anderslund @cruiser Most of the biggest crawlers, including LLM crawlers, send a User-Agent string that can be filtered.

            There is also the robots.txt route, as you would use for normal search engine crawlers.

            However, it turns out that both search engine crawlers and LLM crawlers change their User-Agent strings from time to time, which makes them difficult to block. And robots.txt is more of a "recommendation" than something they are actually required to comply with. Google indexes all pages, disregarding robots.txt completely. It only flags pages, internally, with a "hidden" flag.

            Cloudflare has a "block AI bots" feature, https://developers.cloudflare.com/bots/additional-configurations/block-ai-bots/, but it's very much hit and miss. Since crawlers change their User-Agent strings and IP addresses on a regular basis, these tools are not as reliable as one might think.
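
For readers who have not seen User-Agent filtering in practice, here is a minimal Python sketch of the idea described in the post above. The token list and both function names are assumptions made up for this illustration; it is not how Cloudflare or any particular server implements it.

```python
from typing import Mapping, Optional

# Illustrative blocklist only: substrings that some AI crawlers have been
# reported to send. Any such list goes stale once a crawler renames itself.
BLOCKED_UA_TOKENS = ("GPTBot", "CCBot", "ClaudeBot")


def is_blocked_user_agent(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent value contains a blocked token."""
    if not user_agent:
        # No header at all: this sketch lets the request through.
        return False
    ua = user_agent.lower()
    return any(token.lower() in ua for token in BLOCKED_UA_TOKENS)


def status_for_request(headers: Mapping[str, str]) -> int:
    """Return the status a server might answer with: 403 or 200."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    if is_blocked_user_agent(normalised.get("user-agent")):
        return 403
    return 200
```

A crawler that sends a User-Agent value not on the list passes straight through, which is the "hit and miss" weakness the post points out.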

              anderslund@expressional.social wrote:
              #7

              @nickfrederiksen @cruiser Cloudflare and some other tools delay the request in the hope that this will make them go away, iiutc. That is slowing the internet down for ALL of us.

              If there were a formal mechanism to tell them to stay away, it would be easier to shame those criminals.

              All those unwelcome requests, and the defence mechanisms, cost time, electricity, etc. That is a threat to the environment and it costs money, not for the companies but for those who have brains enough not to want them in.

                nickfrederiksen@expressional.social wrote:
                #8

                @anderslund @cruiser Agree!

                But think of the shareholders. Those poor, poor, shareholders.

                  leeloo@chaosfem.tw wrote:
                  #9

                  @anderslund
                  How would it improve anything over using robots.txt?

                  Also, someone writing a crawler knows how to use whatever library they are using to speak HTTP. They may not know how to query a DNS record. As far as I can see, robots.txt is easier for everybody.
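
To illustrate the "robots.txt is easier" point from the crawler author's side, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt body, the `may_fetch` helper and the example.com URLs are made up for the example; a real crawler would fetch the file from the site first.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt a site owner might publish (illustrative content).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_fetch(ROBOTS_TXT, "GPTBot", "https://example.com/page"))
    print(may_fetch(ROBOTS_TXT, "ExampleBrowser", "https://example.com/page"))
```

The check is only a few lines for a well-behaved crawler. Nothing in it forces a crawler to run the check at all, which is the objection raised earlier in the thread.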

                    anderslund@expressional.social wrote:
                    #10

                    @nickfrederiksen @cruiser Why is it a common belief that owning shares means you do not need decency? I do NOT understand that.

                      anderslund@expressional.social wrote:
                      #11

                      @leeloo Relying on robots.txt means that the undesired traffic will still happen and place the costs on the website owners.
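
No such DNS record exists today, so the following is purely a sketch of what the proposal could look like from a cooperative crawler's side. Everything in it is hypothetical: the `_crawlers` label, the `v=crawl1; ai=deny` syntax and the function names. The TXT lookup is passed in as a function so the example stays free of third-party dependencies; in practice it would be backed by a DNS library.

```python
from typing import Callable, Dict, Iterable

# Hypothetical convention, NOT an existing standard: a TXT record at
# _crawlers.<domain> with a value such as "v=crawl1; ai=deny".
POLICY_LABEL = "_crawlers"

# A function that returns the TXT strings published at a DNS name.
TxtLookup = Callable[[str], Iterable[str]]


def parse_policy(txt_value: str) -> Dict[str, str]:
    """Split 'key=value; key=value' into a dict of lower-cased strings."""
    policy: Dict[str, str] = {}
    for part in txt_value.split(";"):
        key, sep, value = part.partition("=")
        if sep:  # skip fragments that contain no '='
            policy[key.strip().lower()] = value.strip().lower()
    return policy


def ai_crawling_allowed(domain: str, lookup_txt: TxtLookup) -> bool:
    """Return False only when the domain publishes an explicit ai=deny."""
    for value in lookup_txt(f"{POLICY_LABEL}.{domain}"):
        policy = parse_policy(value)
        if policy.get("v") == "crawl1" and policy.get("ai") == "deny":
            return False
    # No record found: this sketch treats crawling as allowed (opt-out).
    return True


if __name__ == "__main__":
    # Stand-in for real DNS: a dict from record name to TXT values.
    fake_zone = {"_crawlers.example.org": ["v=crawl1; ai=deny"]}
    lookup = lambda name: fake_zone.get(name, [])
    print(ai_crawling_allowed("example.org", lookup))
    print(ai_crawling_allowed("example.net", lookup))
```

The lookup happens before any HTTP connection is opened, so a crawler that honours the record never touches the web server, which is the cost argument made in the post above. Returning `False` when no record is found would turn the same sketch into the opt-in model suggested earlier in the thread.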

                        nickfrederiksen@expressional.social wrote:
                        #12

                        @anderslund @cruiser Great question. I guess it's because most investors invest in shares for personal gain, rather than growth in the company.

                        I hardly see anyone investing in a company because they believe in the philosophy, the people, the product. People invest in shares that will give them a big payout in the end.

                          anderslund@expressional.social wrote:
                          #13

                          @nickfrederiksen @cruiser Even if you gain, spending money made with no decency has to be kinda unpleasant, doesn't it?
