FARVEL BIG TECH

Google Search rests on a social contract: their bots can crawl our sites, they can index our sites, and they can show excerpts of our sites because

103 Posts, 66 Posters, 0 Views
    inthehands@hachyderm.io wrote:

    RE: https://tldr.nettime.org/@tante/116605858023186072

    Google Search rests on a social contract: their bots can crawl our sites, they can index our sites, and they can show excerpts of our sites because

    and •only because•

    they send people to our sites. •Our• sites, our words, with our design, with our links, with our context and our aesthetics, shared the way we want to share them.

    Google is announcing — unambiguously and with great fanfare — that they are now fully breaking that already-ragged contract. We should reciprocate.

    1/2

    datarama@hachyderm.io wrote:
    #2

    @inthehands As I said just a while ago: Every big tech press event these last few years has felt like "Announcing our exciting plans for oligarchs to strip-mine the entire world and immiserate all of humanity! Get on board, and also death to the unbelievers!"

      inthehands@hachyderm.io wrote (in reply to the opening post):
      #3

      Quick strategy discussion, for those who understand Google indexing and SEO:

      If I want to yank a web site out of Google’s now-fully-extractive search, should I (1) disallow googlebot in robots.txt or (2) add `<meta name="googlebot" content="noindex">` to all the page headers?

      The goal here is not just to remove my contributions to the commons from Google’s results, but to •make Google aware• that sites are pulling consent. What will best do that?

      2/2

        inthehands@hachyderm.io wrote (in reply to #3):
        #4

        Same question as the previous post, except for Wikipedia. What would you like to see them do to send a shot across the bow?

        Or…well, it’s Wikipedia. Maybe more like a shot to the hull.

        3/2

          adamshostack@infosec.exchange wrote (in reply to #3):
          #5

          @inthehands (3) sue on the basis that it’s not fair use, and these derivative works clearly have a dramatic impact on the value of the original site

            joe@f.duriansoftware.com wrote (in reply to #3):
            #6

            @inthehands is "serve LLM poison to googlebot user-agents" on the table

              inthehands@hachyderm.io wrote (in reply to #5):
              #7

              @adamshostack

              This is clearly how copyright law as written •should• work. Not sure if it’s how it •does• work, but if anybody’s trying, they have my sword.

                inthehands@hachyderm.io wrote (in reply to #6):
                #8

                @joe
                It is and some of us miiiiight already be doing it.

                  shadsterling@mastodon.social wrote (in reply to #3):
                  #9

                  @inthehands both, and, when the agent matches the heuristics to be recognized as Google (et al.), send a different response that contains only an explanation of the ban (and maybe some poison for their next model)
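
A minimal sketch of the "send a different response" part of that suggestion, in Python. It assumes the heuristic is a plain substring match on the User-Agent header, which is easy to spoof and will miss crawlers that do not identify themselves. The names `BAN_NOTICE`, `looks_like_googlebot`, and `choose_body` are hypothetical, and the sketch only returns an explanation of the ban, nothing else.

```python
# Hypothetical sketch: choose a response body from the User-Agent header.
# Assumption: a substring match on "googlebot" is the whole heuristic.

BAN_NOTICE = (
    "This site has withdrawn consent for crawling and indexing by Google. "
    "See robots.txt for details."
)


def looks_like_googlebot(user_agent: str) -> bool:
    """Return True if the User-Agent string claims to be Googlebot."""
    return "googlebot" in user_agent.lower()


def choose_body(user_agent: str, normal_body: str) -> str:
    """Serve the ban notice to self-identified Googlebot, the page otherwise."""
    if looks_like_googlebot(user_agent):
        return BAN_NOTICE
    return normal_body
```

In a real server this decision would sit in whatever request handler or middleware the site already uses; the sketch deliberately leaves that part out.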

                    mjd@mathstodon.xyz wrote (in reply to the opening post):
                    #10

                    @cceckman The contract I thought I was signing was this: I published my stuff on a worldwide information network, with no controls whatever, specifically so that anyone anywhere could access it. I did that with full understanding that it would enable people I might not like to read, copy, and share it and put it to uses that I couldn't foresee and might not approve of. And if I didn't want to entertain that possibility I should not have installed a program on my computer whose sole purpose was to deliver my stuff to any rando who asked for it.

                    I'm not saying I got a good deal, or that I'm happy with the outcome. But I'm not going to pretend I was tricked or that Google reneged on a bargain. We had no bargain. I served them the stuff anyway, whenever they asked for it.

                    And I'm not sure I believe Paul Cantrell when he says he thought the contract was different from what I said.

                      wronglang@bayes.club wrote (in reply to #10):
                      #11

                      @mjd @cceckman he's talking about a social contract

                        inthehands@hachyderm.io wrote (in reply to #4):
                        #12

                        Going with meta noindex for now. My thinking is that this actively tells Google to yank already-crawled content from their index, whereas they might take a robots.txt entry to mean “do not update, but keep showing last fetched.”
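
A quick way to confirm the tag actually made it into every page is to parse the served HTML and look for it. The sketch below uses only Python's standard `html.parser`; the class and function names are hypothetical, and it treats both `name="googlebot"` and the generic `name="robots"` as covering Googlebot.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the names of robots-style meta tags that carry noindex."""

    def __init__(self):
        super().__init__()
        self.noindex_for = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").lower()
        directives = {d.strip().lower() for d in attr.get("content", "").split(",")}
        if name in ("robots", "googlebot") and "noindex" in directives:
            self.noindex_for.add(name)


def has_googlebot_noindex(html: str) -> bool:
    """Return True if the page asks Googlebot (or all robots) not to index it."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return bool(finder.noindex_for)
```

This only checks that the page asks for removal; whether and when Google acts on it is outside what the script can see.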

                          elexia@twoot.site wrote (in reply to #3):
                          #13

                          @inthehands if they decide that people doing this hurts their business model they will simply stop respecting things like robots.txt. their gamble is that people rely on Google more than they do on other websites and if they have to kill the rest of the web to monopolize access to information they will.

                            maddiem4@raphus.social wrote (in reply to #3):
                            #14

                            @inthehands while this can (and probably should) be done in tandem with other strategies, one of the most unambiguous ways you can express your disdain is in robots.txt. Google has historically respected it mechanically (in the present and future, I'm not sure this will hold), and it supports line comments with # so you can explain in plain English what you think about them.

                            https://developers.google.com/search/docs/crawling-indexing/robots/intro

                            The docs also mention the 'noindex' meta tag and how you probably want to use one or the other but not both. That's worth a little research, probably.
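
A minimal sketch of such a robots.txt, checked with Python's standard `urllib.robotparser`. The wording of the comments and the `example.com` URL are placeholders, and the standard-library parser is only an approximation of how Google's own parser reads the file. On "one or the other but not both": as far as Google's documentation describes it, a crawler that is blocked by robots.txt never fetches the page and so never sees a `noindex` tag on it, which would explain the advice.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the "#" comment lines carry the plain-English message.
ROBOTS_TXT = """\
# Consent withdrawn: this site no longer wishes to be crawled by Google.
User-agent: Googlebot
Disallow: /

# Everyone else may crawl as before.
User-agent: *
Allow: /
"""


def parse_robots(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = parse_robots(ROBOTS_TXT)
googlebot_allowed = parser.can_fetch("Googlebot", "https://example.com/post")
other_allowed = parser.can_fetch("SomeOtherBot", "https://example.com/post")
print("Googlebot allowed:", googlebot_allowed)
print("Other bots allowed:", other_allowed)
```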

                              inthehands@hachyderm.io wrote (in reply to #13):
                              #15

                              @elexia

                              Of course, but it is important to force that fight rather than capitulating in advance.

                                joe@f.duriansoftware.com wrote (in reply to #8):
                                #16

                                @inthehands given how eager their summarizer is to incorporate "facts" from even unintentionally adversarial recent posts like satirical blogs, it seems like it wouldn't take much of a coordinated effort to reduce their result quality this way

                                  shadsterling@mastodon.social wrote (in reply to #11):
                                  #17

                                  @wronglang @mjd @cceckman this sort of discrepancy is why I’ve never liked the term “social contract” - it’s nothing like a “contract”

                                    shadsterling@mastodon.social wrote (in reply to #16):
                                    #18

                                    @joe @inthehands is there a coordinated effort that has a website? And/or server plugins that automate serving coordinated poison?

                                      shadowjonathan@tech.lgbt wrote (in reply to #12):
                                      #19

                                      @inthehands this is a fence-post defense against this, google Will Not Care

                                      just start poisoning the data once you detect that google is the one fetching it, just absolutely fucking destroy their LLM output

                                        lunaphied@provably.online wrote (in reply to #12):
                                        #20

                                        @inthehands also probably worth it to submit a pagemaster/webmaster request to them to directly tell them to deindex your site. Also DMCA takedowns to Google are usually effective. If you're in the jurisdiction of Australia you're potentially able to go after them iirc. (The Australian government went after them for embedding news articles in their output or something)

                                          qurlyjoe@mstdn.social wrote (in reply to #12):
                                          #21

                                          @inthehands
                                          What guarantee does one have that Google will abide by these restrictions?
