FARVEL BIG TECH
👀 … https://sfconservancy.org/blog/2026/apr/15/eternal-november-generative-ai-llm/ …my colleague Denver Gingerich writes: newcomers' extensive reliance on LLM-backed generative AI is comparable to the Eternal September onslaught to USENET in 1993.

Tags: llm, opensource
310 posts, 57 posters, 0 views
  • In reply to cwebber@social.coop:

    @richardfontana @bkuhn @ossguy Glad to hear we agree there!

    richardfontana@mastodon.social wrote (#195):

    @cwebber I mean, as a practical idea worth contemplating. Could imagine it as an experiment by someone with sufficient resources. There were some highly ill-conceived efforts to create anti-copyleft models a few years ago @bkuhn @ossguy

    • In reply to evan@cosocial.ca:

      @cwebber

      Are you concerned that the LLMs generate nontrivial verbatim excerpts of copyrighted works?

      Or that there is a hidden "intellectual property" in the deep patterns that they use?

      Say, when an LLM was trained on a file I made with an interesting loop structure, and it emits code with a similar loop structure, even if the variable names, problem domain, details, or programming language differ.

      What if a court says I can demand royalties for my "IP"?

      @bkuhn @ossguy @richardfontana

      evan@cosocial.ca wrote (#196):

      @cwebber @bkuhn @ossguy @richardfontana

      Like, not copyrightable, not patents, but some secret third thing, kind of what people mean when we say that someone "copied our idea".

        cwebber@social.coop wrote (#197):

        @evan @richardfontana I am saying we don't know the answer to that question, and based on previous posts it seems that @bkuhn and @ossguy agree that we don't know the answer to it. Not knowing what the copyright implications of LLM-based contributions are means that we are creating a schrodingers-licensing-timebomb for our FOSS codebases

          cwebber@social.coop wrote (#198):

          @evan @bkuhn @ossguy @richardfontana I am talking about copyright

            evan@cosocial.ca wrote (#199):

            @cwebber excellent, thanks!

            @bkuhn @ossguy @richardfontana

              cwebber@social.coop wrote (#200):

              @evan @bkuhn @ossguy @richardfontana Say for a moment that we *did* make a model which intentionally pulled in leaked source code from various proprietary codebases.

              What would your opinion be on the legal-hazard state of accepting that code output? Would you consider it relatively safe from a copyright perspective?

              • In reply to cwebber@social.coop:

                @bkuhn @ossguy @richardfontana Except, I actually believe this scenario isn't legally viable. And it's easier to understand if we scale back to the middle case.

                Let's now look at the LLM trained on CC0 and CC BY. Because it's the BY aspect that makes everything complicated.

                There is *NO WAY* for current LLM technology to track provenance, nor, I believe from studying how neural networks work, for any viable, computationally performant LLM to do so. The BY clause cannot be upheld.

                This isn't a theoretical concern for me; someone built another vibecoded Scheme-to-WASM-GC compiler that looks an awful lot like Spritely's own Hoot compiler in places. They didn't attribute us. They probably didn't know. But like many FOSS licenses, Apache v2 does require certain levels of attribution to be upheld. Most FOSS projects do.

                You can't uphold the CC BY requirement, as far as I can tell.

                richardfontana@mastodon.social wrote (#201):

                @cwebber I think adequate compliance might be possible with good enough detection/matching tools but I don't necessarily expect such tools to be developed (let alone available to foss projects) (my assumption is that the few such tools in use today are pretty bad) @bkuhn @ossguy

                  cwebber@social.coop wrote (#202):

                  @richardfontana @bkuhn @ossguy That's a problem so hard it throws the "NP complete" debate out the window in favor of something brand new. Given that these codebases have no trouble "translating" from one language's source code into another, how on *earth* could you possibly hope to build a compliance tool around that?

                  Laughable, to anyone who tries.

                    richardfontana@mastodon.social wrote (#203):

                    @cwebber to be clear, compliance cannot somehow be built into the LLM, for the reasons you stated, but ancillary tools for LLM users to reconstruct provenance exist and conceivably could be made more useful @bkuhn @ossguy

                    • In reply to cwebber@social.coop:

                      @bkuhn @ossguy @richardfontana So the question is: is it safe, from a legal perspective, given the current state of uncertainty of copyright of such contributions, to encourage accepting such contributions into repositories?

                      Now clearly, many projects are: the Linux kernel most famously is, and their recent policy document says effectively, "You can contribute AI generated code, but the onus is on you whether or not you legally could have".

                      Which is not a very helpful handwave, I would say, since few contributors are equipped to assess such a thing. I've left myself and three others addressed in this portion of the thread, and all of us *have* done licensing work, and my suspicion is, *especially* based on what's been written, that none of us could confidently project where things are going to go.

                      zacchiro@mastodon.xyz wrote (#204):

                      @cwebber @bkuhn @ossguy @richardfontana

                      My current answer to your "is it safe" question is to answer a slightly different question. Namely: "is it any less safe than accepting code from a random employee who claims to be submitting under an inbound=outbound regime, whereas in fact they cannot?". The latter we have been doing for decades, with limited damage to the commons.

                      (I *also* think the legal odds are more in our favor with AI-assisted contributions than in the previous case.)

                        cwebber@social.coop wrote (#205):

                        @zacchiro @bkuhn @ossguy @richardfontana While true, there is a big difference in that the previous scenario was someone out of compliance with what the community actually accepted as hygienic and acceptable contributions, and those contributions were relatively rare.

                        Saying that we don't need to worry about the risks from these tools right now from a licensing situation is different: it's advising on a path being acceptable where we *don't know* whether or not it's generally safe practice to recommend! And which most in this thread seem to agree we don't know. Even your post seems to say "it seems like it'll probably be okay and end up in our favor".

                        I guess I feel increasingly like I am maybe the only "oldschool FOSS licensing wonk" who cares about this, and maybe that means I should just give up.

                        But *damn* I can't believe it feels like when people are both saying "we don't know what the implications will be" we're also saying "so go ahead and say those patches are a-ok!"

                          cwebber@social.coop wrote (#206):

                          @richardfontana As said here, given the "translation between languages" aspect, I can't really see that as likely to be true https://social.coop/@cwebber/116426770262334234

                          Which maybe means that all this stuff really is public domain, a position I am *fully willing to accept*! But I don't think it's known (especially internationally), and I don't think @bkuhn or @ossguy are eager to adopt that perspective

                            evan@cosocial.ca wrote (#207):

                            @cwebber

                            This is probably a healthy concern.

                            I think there might be some good ways to hedge one's bets, though.

                            Use LLMs for rubber ducking, code scanning and review, rather than code generation.

                            Keep LLM code contributions minimal and unremarkable, too.

                            Don't make them load-bearing. If the code is central to the program, it's too unique.

                            @richardfontana @bkuhn @ossguy

                              cwebber@social.coop wrote (#208):

                              @evan @richardfontana @bkuhn @ossguy Yeah! I actually already said elsewhere in the thread I don't think we need to worry about using these tools for such scenarios from a *licensing* perspective, only when the genAI is explicitly checked into the codebase

                                triptych@social.lol wrote (#209):

                                @evan @cwebber @richardfontana @bkuhn @ossguy this is wisdom

                                  Guest wrote (#210):

                                  @zacchiro @cwebber @bkuhn @ossguy @richardfontana I would say it's dramatically less safe. First, there's very little incentive to go after some OSS project over an unauthorized inbound=outbound contribution. Second, if someone did, the damage would likely be a small part of a single project. Third, only a small number of parties (the employer, or maybe some other single party whose code was copied) have the ability to sue.

                                  With LLMs, it's different. When the authors sued Anthropic, they all sued. Is a shell script that Claude generated a derivative work of, say, the romantasy novel A Court of Thorns and Roses (to pick a random thing included in Anthropic's training set)? Well, it's hard to show that it's not, in the sense that that novel is one of the zillion things that went into generating the weights that generated the shell script.

                                  Now it happens that the authors sued Anthropic (and settled). But I don't know if their settlement covers users of Claude (and even if it did, there are two other big models). And that's only the book authors -- there's still all of the code authors in the world.

                                  So yes, I think the risk is high. I mean, in some sense -- in another sense, it seems unlikely that Congress would say, "sorry, LLMs as code generators are toast because of some century-old laws". At most, they would set up a statutory licensing scheme for LLM providers which covers LLM outputs. Of course, Europe might go a different way, but I think they would probably do the same. Under this hypothetical scheme, if your code were used to train Claude, you would get a buck or two in the mail every year. Authors got I think $3k per book as a one-time payment, but that was a funny case because of how Anthropic got access to the books.

                                  Still, there's a risk that Congress wouldn't act (due to standard US government dysfunction).

                                  It seems like most people are willing to take this risk, which I think says something interesting about most people's moral intuitions.

                                    fay@lingo.lol wrote (#211):

                                    @evan
                                    @cwebber @richardfontana @bkuhn @ossguy or just... not at all

                                      evan@cosocial.ca wrote (#212):

                                      I think the worst case scenario is that the inserted code matches exactly one snippet in the training data.

                                      So you could try to go for zero matches, by using such idiosyncratic and unrecommended coding conventions that nobody else has code like yours.

                                      Or you could try to go for lots of matches, by using bog standard coding conventions and software patterns.

                                      @cwebber @richardfontana @bkuhn @ossguy

                                        evan@cosocial.ca wrote (#213):

                                        @cwebber the weights themselves?

                                        @richardfontana @bkuhn @ossguy

                                          cwebber@social.coop wrote (#214):

                                          @evan @richardfontana @bkuhn @ossguy Sorry, I missed a word when I edited the sentence, I meant "genAI output"
