👀 … https://sfconservancy.org/blog/2026/apr/15/eternal-november-generative-ai-llm/ …my colleague Denver Gingerich writes: newcomers' extensive reliance on LLM-backed generative AI is comparable to the Eternal September onslaught to USENET in 1993.
-
I agree with @ossguy in particular because if *we* are copylefting our code (even if assisted by #LLM-backed gen-#AI), we won't face a copyleft claim later.
Furthermore, it is highly unlikely that (a) these LLMs are trained on proprietary software, and (b) any proprietary software company that so trained would later claim infringement.
#Microsoft has all but admitted they refuse to train Copilot on their own code anyway.
@bkuhn @ossguy @LordCaramac @richardfontana
- There are plenty of FOSS projects we care about which are not under copyleft. What terms should they consider received code under? Should SDL now consider all LLM based output under the GPL? The AGPL? Which? Do you expect such a project to switch its license to copyleft now?
- Microsoft's proprietary code may not be, but plenty of proprietary code is available under extremely non-FOSS and restrictive licenses which are within datasets we are getting contributions from *today*
- The mutually assured destruction "safe option" isn't that things are under copyleft for proprietary companies though; that's still a losing scenario for them. So that doesn't help the case for copyleft; the only "safe option" is accepting that LLM output is in the public domain (which we don't know)
-
@bkuhn @ossguy @LordCaramac @richardfontana It's somewhat of an aside, but my point regarding Microsoft's codebase is not that Windows' code is in the inputs (this is true); my point was that a more interesting test for licence laundering is to launder a *leaked* proprietary codebase. If it's possible to copyright launder GPL'ed code, the equitable thing is that we should be able to copyright launder proprietary code. But again, that's somewhat of a tangent from the main points.
-
@cwebber I think maybe you missed https://sfconservancy.org/blog/2026/mar/04/scotus-deny-cert-dc-circuit-thaler-appeal-llm-ai/ where #SFC analyzed that situation?
Also, follow @ai_cases & see the *firehose* of litigation on this & remember the “Work Based on the Program” issue under GPLv2 has still never been litigated directly but lots of cases about 100% proprietary software have bolstered GPL's strength.
Big Content has legal battles with Big Tech on 100s of fronts rn. Yes, we're adrift on their sea, but the situation is not as dire as you imagine.
-
@cwebber @bkuhn @ossguy @richardfontana how do you launder proprietary codebases if the source isn't available? i just see this as 2 negatives since it would incentivize trade secrets
@trwnh @bkuhn @ossguy @richardfontana Plenty of Microsoft code has been released under "shared source" licenses and also leaks
-
@bkuhn @ossguy @richardfontana So let me summarize:
- Without knowing the legal status of accepting LLM contributions, we're potentially polluting our codebases with stuff that we are going to have a HELL of a time cleaning up later
- The idea of a copyleft-only LLM is a joke and we should not rely on it
- We really only have two realistic scenarios: either FOSS projects cannot accept LLM based contributions legally from an international perspective, or everything is effectively in the public domain as outputted from these machines, but at least in the latter scenario we get to weaken copyright for everyone.
That's leaving out a lot of other considerations about LLMs and the ethics of using them, which I think most of the other replies were focused on; I largely focused on the copyright implications in this subthread. Because yes, I agree, it can be important to focus a conversation.
But we can't ignore this right now.
We're putting FOSS codebases at risk.
@cwebber @bkuhn @ossguy @richardfontana Worse IMHO is that we're putting FOSS as a movement at risk if we deskill everyone to the point where you either pay money to have code generated for you, or there is no code.
-
@jens @bkuhn @ossguy @richardfontana This is indeed a serious risk, though tangential to this subthread. But it's a concern I also have.
-
@cwebber @bkuhn @ossguy @richardfontana sure, but my point is this would happen less often
-
@cwebber @bkuhn @ossguy @richardfontana Fully tangential, agreed.
-
@richardfontana @bkuhn @ossguy Glad to hear we agree there!
-
@cwebber @bkuhn @ossguy @richardfontana
Based on my following of current legal cases, I think it's entirely possible that in a year or two we'll suddenly be rolling large OSS codebases back to 2023. And won't that be fun!
-
However, it's not actually the laundering angle I am concerned with here entirely, it's whether we're turning FOSS codebases into potential legal toxic waste dumps that we will have a hell of a time cleaning up later.
The previous Conservancy post, which @bkuhn linked upthread, indicates that Conservancy does indeed consider the matter unsettled.
Current LLMs wouldn't "default to copyleft", since they also have all-rights-reserved material mixed in there. If the output of these systems is a slurry of inputs which somehow carry their licensing, the default licensing situation of that output is a hazard.
I note that @bkuhn and @ossguy seem to be hinting at a hope that a "copyleft based LLM" with all-copyleft output is a winning scenario. I'm going to state plainly: I believe that's an impossible outcome.
Are you concerned that the LLMs generate nontrivial verbatim excerpts of copyrighted works?
Or that there is a hidden "intellectual property" in the deep patterns that they use?
Say, when an LLM was trained on a file I made with an interesting loop structure, and it emits code with a similar loop structure, even if the variable names, problem domain, details, or programming language differ.
What if a court says I can demand royalties for my "IP"?
-
@cwebber @bkuhn @ossguy @richardfontana
Like, not copyrightable, not patents, but some secret third thing, kind of what people mean when we say that someone "copied our idea".
-
@evan @richardfontana I am saying we don't know the answer to that question, and it seems that @bkuhn and @ossguy agree that we don't know the answer to it, based on previous posts. The lack of knowledge about the copyright implications of LLM based contributions means that we are creating a schrodingers-licensing-timebomb for our FOSS codebases.
-
@evan @bkuhn @ossguy @richardfontana I am talking about copyright
-
@cwebber excellent, thanks!
-
@evan @bkuhn @ossguy @richardfontana Say for a moment that we *did* make a model which intentionally pulled in leaked source code from various proprietary codebases.
What would your opinion be on the legal-hazard state of accepting that code output? Would you consider it relatively safe from a copyright perspective?
-
@bkuhn @ossguy @richardfontana Except, I actually believe this scenario isn't legally viable. And it's easier to understand if we scale back to the middle case.
Let's now look at the LLM trained on CC0 and CC BY. Because it's the BY aspect that makes everything complicated.
There is *NO WAY* that current LLM technology can track provenance, nor, I believe from studying how neural networks work, can any viable, computationally performant LLM. The BY clause cannot be upheld.
This isn't a theoretical concern for me; someone built another vibecoded Scheme-to-WASM-GC compiler that looks an awful lot like Spritely's own Hoot compiler in places. They didn't attribute us. They probably didn't know. But like many FOSS licenses, Apache v2 does require certain levels of attribution to be upheld. Most FOSS licenses do.
You can't uphold the CC BY requirement, as far as I can tell.