FARVEL BIG TECH

👀 … https://sfconservancy.org/blog/2026/apr/15/eternal-november-generative-ai-llm/ …my colleague Denver Gingerich writes: newcomers' extensive reliance on LLM-backed generative AI is comparable to the Eternal September onslaught to USENET in 1993.

llmopensource
310 Posts, 57 Posters, 0 Views
This thread has been deleted. Only users with topic-management privileges can see it.
  • evan@cosocial.ca wrote:

    @sfoskett you can incorporate public domain code into a licensed work.

    @cwebber @bkuhn @ossguy @richardfontana

    sfoskett@techfieldday.net wrote:
    #231

    @evan @cwebber @bkuhn @ossguy @richardfontana OK, I hadn't really heard anyone before you explain that to me, so I was wondering whether it was possible that it couldn't be licensed. Thanks.

    • richardfontana@mastodon.social wrote:

      @evan I feel pretty confident in saying the abstraction-filtration-comparison test cannot possibly be automated @cwebber @bkuhn @ossguy

      evan@cosocial.ca wrote:
      #232

      @richardfontana @cwebber @bkuhn @ossguy Yeah, I thought my job couldn't be automated, either, and yet here we are.

        evan@cosocial.ca wrote:
        #233

        @richardfontana @cwebber @bkuhn @ossguy Seriously, though, a lot of the work seems like it is tractable to LLM automation?

        Like, the abstraction part seems like it's just summarizing components at the function, module, and program level. This is the command-line argument parser, this is the database abstraction layer, this is the logging module. LLMs are pretty good at this!

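The "abstraction" step Evan describes can be partly mechanized even before any LLM is involved: a static pass can inventory the components that then get summarized at the function, class, and module level. A minimal sketch in Python using the standard `ast` module (the sample module below is hypothetical, purely for illustration):

```python
import ast

def abstraction_pass(source: str) -> list[dict]:
    """Crude first step of an AFC abstraction pass: inventory the
    top-level components of a module so each can be summarized."""
    tree = ast.parse(source)
    components = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            components.append({
                "kind": type(node).__name__,
                "name": node.name,
                "summary": ast.get_docstring(node) or "(no docstring)",
            })
    return components

# Hypothetical module under review
sample = '''
def parse_args(argv):
    "Command-line argument parser."
    return dict(arg.split("=") for arg in argv)

class Logger:
    "Logging module."
'''

for c in abstraction_pass(sample):
    print(c["kind"], c["name"], "-", c["summary"])
```

An LLM (or a human reviewer) would then write the "this is the argument parser, this is the logging module" summaries over this inventory, rather than over raw text.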
          richardfontana@mastodon.social wrote:
          #234

          @evan oh I mean of course you could use LLMs to help with the analysis @cwebber @bkuhn @ossguy

          • ossguy@fedi.copyleft.org wrote:

            @cwebber @LordCaramac @bkuhn @richardfontana Sadly it will be years before we have an answer re copyright and we can't wait for that. Outlining usage in the meantime is the best we can do, in case we need to do something with that later.

            We know proprietary software companies are using these tools extensively, so this is in effect a mutually assured destruction situation. While we wait, we should make sure that we are pushing freedom on all other axes, since they won't do that part.

            wwahammy@social.treehouse.systems wrote:
            #235

            @ossguy @cwebber @LordCaramac @bkuhn @richardfontana proprietary software companies extensively use GitHub and yet SFC's position is "don't use GitHub".

            There are so many things we do in free software, and in the interactions with SFC and FSF, that would be simpler if we used proprietary software. How many janky experiences have people been asked to tolerate in order to participate? Why shouldn't we use proprietary software there?

            • Guest wrote:

              @zacchiro @cwebber @bkuhn @ossguy @richardfontana I would say it's dramatically less safe. First, there's very little incentive to go after some OSS project over an unauthorized inbound=outbound contribution. Second, if someone did, the damage would likely be a small part of a single project. Third, only a small number of parties (the employer, or maybe some other single party whose code was copied) have the ability to sue.

              With LLMs, it's different. When the authors sued Anthropic, they all sued. Is a shell script that Claude generated a derivative work of, say, the romantasy novel A Court of Thorns and Roses (to pick a random thing included in Anthropic's training set)? Well, it's hard to show that it's not, in the sense that that novel is one of the zillion things that went into generating the weights that generated the shell script.

              Now it happens that the authors sued Anthropic (and settled). But I don't know if their settlement covers users of Claude (and even if it did, there are two other big models). And that's only the book authors -- there's still all of the code authors in the world.

              So yes, I think the risk is high. I mean, in some sense -- in another sense, it seems unlikely that Congress would say, "sorry, LLMs as code generators are toast because of some century-old laws". At most, they would set up a statutory licensing scheme for LLM providers which covers LLM outputs. Of course, Europe might go a different way, but I think they would probably do the same. Under this hypothetical scheme, if your code were used to train Claude, you would get a buck or two in the mail every year. Authors got I think $3k per book as a one-time payment, but that was a funny case because of how Anthropic got access to the books.

              Still, there's a risk that Congress wouldn't act (due to standard US government dysfunction).

              It seems like most people are willing to take this risk, which I think says something interesting about most people's moral intuitions.

              bkuhn@fedi.copyleft.org wrote:
              #236

              @novalis
              I agree with your supporting arguments but not the conclusion.

              It goes back to the mutually assured destruction idea: no one in the for-profit proprietary software industry is going to bring a lawsuit, because they are so invested in LLM-backed AI succeeding.

              That's where our commons differs widely from other creative works of expression.

              I am worried about compulsory licensing for *training* (it could be a disaster), but that's unrelated to output.

              @zacchiro @cwebber @ossguy @richardfontana

              • sfoskett@techfieldday.net wrote:

                @evan @cwebber @bkuhn @ossguy @richardfontana Another major concern is that works generated by AI are not copyrightable per the US Supreme Court. So code generated by an LLM can not be licensed at all, open or closed. https://www.reuters.com/legal/government/us-supreme-court-declines-hear-dispute-over-copyrights-ai-generated-material-2026-03-02/

                richardfontana@mastodon.social wrote:
                #237

                @sfoskett neither scotus nor afaik any other US court has held this. I would argue that it seems to be the direction the US legal system is going in, but I recently heard a persuasive counterargument from a well regarded US FOSS lawyer @evan @cwebber @bkuhn @ossguy

                  evan@cosocial.ca wrote:
                  #238

                  For filtration, it seems like merger or scènes à faire would also be kind of automatable, maybe with human oversight. Is there a way to make a mailing daemon without a logging module? Maybe, but it's so common that everyone does it that way. Could you have a Person class without a getter and setter for the name? Probably not?

                  @richardfontana @cwebber @bkuhn @ossguy

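As a toy illustration of the filtration idea (emphatically not a legal tool): standard boilerplate like one-line getters and setters, of the kind merger or scènes à faire would exclude, can be flagged mechanically, leaving only the distinctive code for closer review. The `Person` class below is hypothetical:

```python
import ast

def is_trivial_accessor(fn: ast.FunctionDef) -> bool:
    """True for one-line getters/setters -- the kind of boilerplate a
    filtration pass might flag as merger / scenes a faire candidates."""
    body = [n for n in fn.body if not isinstance(n, ast.Expr)]  # ignore bare expressions (docstrings, calls)
    if len(body) != 1:
        return False
    stmt = body[0]
    # getter shape: `return self.<attr>`
    if isinstance(stmt, ast.Return) and isinstance(stmt.value, ast.Attribute):
        return True
    # setter shape: `self.<attr> = <param>`
    if isinstance(stmt, ast.Assign) and isinstance(stmt.targets[0], ast.Attribute):
        return True
    return False

sample = '''
class Person:
    def get_name(self):
        return self._name
    def set_name(self, name):
        self._name = name
    def greet(self):
        print("hi, " + self._name)
'''

cls = ast.parse(sample).body[0]
flagged = [m.name for m in cls.body if is_trivial_accessor(m)]
print(flagged)  # → ['get_name', 'set_name']
```

`greet` survives filtration because its body isn't an accessor shape; the real doctrine is of course far messier than this pattern match.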
                    evan@cosocial.ca wrote:
                    #239

                    The comparison seems tough, but I'd put an LLM to the task. "How similar are the database abstraction layers in activitypub-bot and Fedify?" Again, I'd probably want some human review, but for that code stuff LLMs are pretty good.

                    @richardfontana @cwebber @bkuhn @ossguy

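A crude stand-in for the comparison step: token-set similarity between two filtered components. Real comparison would need structure-aware matching (and, as discussed above, human judgment), but even this catches shared vocabulary. The two snippets are made up:

```python
import keyword
import re

def token_set(code: str) -> set[str]:
    """Identifier tokens, minus language keywords -- a crude proxy for
    the expressive content left after filtration."""
    tokens = set(re.findall(r"[A-Za-z_]\w*", code))
    return tokens - set(keyword.kwlist)

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two token sets."""
    ta, tb = token_set(a), token_set(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

left = "def save(person): db.insert('people', person.to_dict())"
right = "def store(p): database.insert('people', p.as_dict())"
print(round(jaccard(left, right), 2))  # → 0.2
```

These two snippets do the same thing but share only two of ten distinct identifiers, which is exactly why purely lexical comparison under-reports similarity and the "how similar are these two database layers?" question still wants a semantic pass.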
                      sfoskett@techfieldday.net wrote:
                      #240

                      @richardfontana @evan @cwebber @bkuhn @ossguy I feel like there are three questions for the court:
                      1. Can a non-human actor produce a copyrightable work? Likely no.
                      2. Are the human prompt and review enough to apply copyright to LLM content? Maybe?
                      3. Does this have implications for open source? I guess not.

                        evan@cosocial.ca wrote:
                        #241

                        I consider myself an expert on this process since I learned about it 45 minutes ago, but it seems like AFC follows the hierarchical layers of modern programming-in-the-large -- statements, functions, modules, packages, program. That is the stuff that LLMs handle pretty well.

                        @richardfontana @cwebber @bkuhn @ossguy

                          bkuhn@fedi.copyleft.org wrote:
                          #242

                          @evan

                          I actually think that these copyright concepts aren't particularly automatable.
                          Even if we try, it's pure arms race.

                          And the merger doctrine isn't the big problem here; it's the more complex cases where the merger doctrine clearly doesn't apply that need analysis, and I suspect that analysis is difficult to (even partially) automate.

                          But I'm looking into it.

                          Cf: chardet situation https://github.com/chardet/chardet/issues/355#issuecomment-4145369025

                          @richardfontana @cwebber @ossguy

                            Guest wrote:
                            #243

                            @bkuhn @zacchiro @cwebber @ossguy @richardfontana I don't even know if I agree with my supporting arguments. But I don't think it has to be someone in the proprietary world who brings a lawsuit; it could be anyone whose code or text was trained on.

                              cwebber@social.coop wrote:
                              #244

                              @bkuhn @evan @richardfontana @ossguy One thing I worry about is that the chardet rewrite might not generalize. The chardet maintainer used *more* care in the rewrite than most projects which have followed suit for laundering would. https://dan-blanchard.github.io/blog/chardet-rewrite-controversy/

                              Even then, it raises questions, because even the maintainer admits, chardet was part of the training set.

                              It's very similar to how a friend recently sent me, "Claude managed to reverse engineer Bubble Bobble without using any reverse engineering tools, just inspecting the binary!" https://kotrotsos.medium.com/we-pointed-an-ai-at-raw-binary-files-from-1986-662ba30120f3

                              Which like, Claude is enough of a black box already, but Bubble Bobble is also one of the most studied ROMs in history, so it's hard to evaluate whether the claim is true. You'd have to choose a less-studied ROM as a test case, not Bubble Bobble, which the internet has discussed to death.

                                cwebber@social.coop wrote:
                                #245

                                @bkuhn @evan @richardfontana @ossguy Probably a ton of people here think I am anti-AI-output, and that I would be upset to find out that the chardet rewrite were legal.

                                Actually, I'm not! I'd be fine with the ability to copyright launder software to some degree, as long as we could do the same for proprietary software (including in binary form).

                                I'm concerned about whether or not we have an *equitable* situation, though. And I'm *more concerned* that we need to advise people, who are incorporating code *today*.

                                  evan@cosocial.ca wrote:
                                  #246

                                  @bkuhn I just did an abstraction and filtration pass on a medium-sized application framework (~30K LOC), and as an expert on the code I think it did a good job:

                                  https://claude.ai/share/071ccb69-5d22-4673-905a-362d9663e7d0

                                  It missed a few things (e.g. relay specs). Then again, I have no idea how this kind of review is supposed to work. I didn't go down to the function or statement level -- that'd probably be much noisier.

                                  Maybe chardet 2 and 7 would be a better test of the technique?

                                  @richardfontana @cwebber @ossguy

                                  • cwebber@social.coop wrote:

                                    @bkuhn @ossguy @richardfontana I say "good outcome", and I'm not saying it's an outcome I want, because "what I want" is pretty complicated here. I'm saying, it's the only one where there is the possibility of legal output from these tools that can safely be incorporated into FOSS projects *at all* that is *equitable* for both FOSS and proprietary situations.

                                    And yup, unfortunately, that would mean copyright-laundering of FOSS codebases through LLMs would be possible to strip copyleft.

                                    It would also mean the same for proprietary codebases.

                                    Frankly I think it would kind of rule if we stabbed copyright in the gut that badly, but there's so much vested interest from various copyright holding corporations, I don't think we're likely to get that. Do you?

                                    richardjacton@fosstodon.org wrote:
                                    #247

                                    @cwebber @bkuhn @ossguy @richardfontana I don't see a great way out of the copyright-stripping conclusions for them without changes to the law. As I understand their defense for training on copyrighted materials, it's predicated on the models being "transformative" and not competing directly with the original works in the market. The models themselves don't compete with the training material, only their outputs do, and the LLM companies want any liability for that to fall on users, not them.

                                      richardjacton@fosstodon.org wrote:
                                      #248

                                      @cwebber @bkuhn @ossguy @richardfontana Under this view it doesn't matter how the training data was licensed, as it's a fair use defense. The outputs being uncopyrightable / effectively public domain lets people claim they wrote something when that's convenient and they want to copyright it, since it's hard to prove whether it was AI-generated or human-authored. And simultaneously to claim it was the output of an LLM when they want to strip inconvenient licensing terms.

                                        evan@cosocial.ca wrote:
                                        #249

                                        If I were going to productize this, I'd do AF passes on a huge training dataset like The Stack and generate some kind of fingerprint for each program. (Estimated cost: billions!)

                                        https://huggingface.co/datasets/bigcode/the-stack

                                        Then, I'd have a tool to let you fingerprint your own code and compare it against the big database -- maybe give you a list of high-similarity codebases.

                                        And you could re-run the comparison each time you push to Git -- maybe only comparing what changed.

                                        @bkuhn @richardfontana @cwebber @ossguy

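The fingerprinting idea Evan sketches is close to how MOSS-style plagiarism detectors already work: hash overlapping k-grams of normalized text and keep a sparse, deterministic subset. A toy version (the parameters `k` and `mod` are arbitrary illustrative choices, not anything The Stack tooling actually uses):

```python
import hashlib

def fingerprint(code: str, k: int = 5, mod: int = 4) -> set[int]:
    """Keep a sparse, position-independent subset of k-gram hashes."""
    text = "".join(code.split())  # normalize away all whitespace
    selected = set()
    for i in range(len(text) - k + 1):
        digest = hashlib.blake2b(text[i:i + k].encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        if h % mod == 0:  # deterministically keep ~1/mod of the hashes
            selected.add(h)
    return selected

def overlap(a: set[int], b: set[int]) -> float:
    """Jaccard overlap of two fingerprints."""
    return len(a & b) / max(len(a | b), 1)

# Whitespace-only changes don't alter the fingerprint at all:
a = fingerprint("total = total + offset")
b = fingerprint("total=total+offset")
print(a == b)  # → True
```

Winnowing proper keeps the minimum hash in each sliding window rather than hashes below a threshold, which guarantees matches of a minimum length are always detected; either way, fingerprints are small enough to index at dataset scale and compare incrementally on each push.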
                                        • bkuhn@fedi.copyleft.org wrote:

                                          (2/5) … In https://sfconservancy.org/blog/2026/apr/15/eternal-november-generative-ai-llm/ ,
                                          Denver's key points are: we *have* to (a) be open to *listening* to people who want to contribute #FOSS with #LLM-backed generative #AI systems, & (b) work collaboratively on a *plan* of how we can solve the current crisis.

                                          Nothing ever got done politically that was good when both sides become more entrenched, refuse to even concede the other side has some valid points, & each say the other is the Enemy. …

                                          Cc: @wwahammy @silverwizard @cwebber

                                          #OpenSource

                                          mu@mastodon.nz wrote:
                                          #250

                                          @bkuhn @wwahammy @silverwizard @cwebber "Nothing ever got done politically that was good when both sides become more entrenched, refuse to even concede the other side has some valid points, & each say the other is the Enemy. … "

                                          Now that is a really strange thing to hear from someone who is representing a FOSS community, because that's basically what FOSS *is*

Powered by NodeBB. Graciously hosted by data.coop.