👀 … https://sfconservancy.org/blog/2026/apr/15/eternal-november-generative-ai-llm/ … my colleague Denver Gingerich writes: newcomers' extensive reliance on LLM-backed generative AI is comparable to the Eternal September onslaught on USENET in 1993.
-
@bkuhn @kees @glitzersachen @josh @silverwizard @ossguy @xgranade
This is not a remotely accurate analogy. The level of rage in this country over AI is uncontrollable and it's accelerating. Two people tried to kill Sam Altman in the last week. An Indiana planning official's house was shot at after they approved a new data center.
In the political realm, the shift is unimaginably swift. For example, 6 months ago no Democratic candidate for WI governor had a policy on data centers, because building unions wanted them. Now every one of them is fighting over how strict their ban on data centers is.
The best analogy I can think of is the opioid crisis. When people were ready to kill the Sacklers and everyone at Purdue Pharma, you couldn't come in and say anything that made people think you were tolerant of the damage. You couldn't even argue "we can punish these people, but we have to protect access to opioids". Everyone KNOWS there are legitimate uses, but you can't build a policy around that because the public doesn't care. At all.
The only time you could have this discussion was years ago, or will be years in the future, after the public has taken its pound of flesh. Right now, it's an immensely dangerous idea for SFC.
@wwahammy @bkuhn @kees @glitzersachen @josh @silverwizard @ossguy @xgranade the people wanting to kill Sam Altman are doing so because they are afraid of the AI Doomer stories. This discussion about including slop in software is very different.
-
@firefly_lightning
You're not overstepping, and these are very good perspectives. I hope you'll come to the real-time discussion sessions and talk about this.
I am concerned that maintainers are already overwhelmed with #AI #slop right now, but yelling at the problem has not helped. We're close to an arms race here, & I'd rather be the voice of reason finding a compromise that advances FOSS & doesn't complicate maintainers' jobs than take a side in the arms race.
Cc: @josh @kees @ossguy
@bkuhn @firefly_lightning @josh @kees @ossguy ok Neville Chamberlain
-
@bkuhn @ossguy @richardfontana So let me summarize:
- Without knowing the legal status of accepting LLM contributions, we're potentially polluting our codebases with stuff that we are going to have a HELL of a time cleaning up later
- The idea of a copyleft-only LLM is a joke and we should not rely on it
- We really only have two realistic scenarios: either FOSS projects cannot legally accept LLM-based contributions from an international perspective, or everything these machines output is effectively in the public domain, but at least in the latter scenario we get to weaken copyright for everyone.
That's leaving out a lot of other considerations about LLMs and the ethics of using them, which I think most of the other replies were focused on; I largely focused on the copyright implications in this subthread. Because yes, I agree, it can be important to focus a conversation.
But we can't ignore this right now.
We're putting FOSS codebases at risk.
@cwebber
I agree about the hazard. LLM outputs should be considered derivative of all their inputs unless established otherwise. LLMs manipulate expression, not ideas, and the propensity for verbatim reproduction (up to and including entire books) is evidence of that process. Note that the purpose of the "substantial similarity" test is to serve as circumstantial evidence of process.
I think the counterpoints are "mutually assured destruction" and/or "a yolo denial-of-service attack on copyright will win because power likes it". "AI" companies are still delaying cases from 2022 (like Doe v GitHub) because they want a jury who believes it is inevitable. Plaintiffs seek to win their cases, not to establish broad precedent. OpenAI has already lost (in a German court) on copyright infringement in their outputs, having argued unsuccessfully that the infringement is the sole responsibility of their customers for prompting. The political reality of public sentiment is changing, and the collapse of the financial bubble will greatly alter the power held by "AI" companies.
Meanwhile, I think the words of the DCO ought to mean something, even for those who are certain they are a smol bean.
-
@davidgerard @wwahammy @silverwizard @firefly_lightning @cwebber Yes, which is why it's important to allow people to identify when they have used LLM/AI assistants to help. New contributors will see this is the norm, and then it will be easier to help them, because we'll know a bit about where any potential knowledge gaps might be coming from.
If we "ban" LLM/AI-assisted contributions, people will use them anyway but hide their use, which is a trickier problem to solve.
-
@cwebber
I agree about the hazard. LLM outputs should be considered derivative of all their inputs unless established otherwise. LLMs manipulate expression, not ideas, and the propensity for verbatim reproduction (up to and including entire books) is evidence of that process. Note that the purpose of the "substantial similarity" test is to serve as circumstantial evidence of process.
I think the counterpoints are "mutually assured destruction" and/or "a yolo denial-of-service attack on copyright will win because power likes it". "AI" companies are still delaying cases from 2022 (like Doe v GitHub) because they want a jury who believes it is inevitable. Plaintiffs seek to win their cases, not to establish broad precedent. OpenAI has already lost (in a German court) on copyright infringement in their outputs, having argued unsuccessfully that the infringement is the sole responsibility of their customers for prompting. The political reality of public sentiment is changing, and the collapse of the financial bubble will greatly alter the power held by "AI" companies.
Meanwhile, I think the words of the DCO ought to mean something, even for those who are certain they are a smol bean.
A case from 2022 still not a trial in 2026 doesn't indicate unreasonable or manipulative delay by Defendants. Such cases really do take that long.
Also, Doe vs. Microsoft's GitHub is a terribly constructed case that actually pushes us toward compulsory licensing of #FOSS works for #LLM-backed gen-#AI training, since the Plaintiffs' lawyers in that case are clearly chasing their own avarice, not software freedom.
Background:
https://sfconservancy.org/news/2022/nov/04/class-action-lawsuit-filing-copilot/
@cwebber @ossguy @richardfontana
-
@bkuhn @ossguy @richardfontana So let me summarize:
- Without knowing the legal status of accepting LLM contributions, we're potentially polluting our codebases with stuff that we are going to have a HELL of a time cleaning up later
- The idea of a copyleft-only LLM is a joke and we should not rely on it
- We really only have two realistic scenarios: either FOSS projects cannot legally accept LLM-based contributions from an international perspective, or everything these machines output is effectively in the public domain, but at least in the latter scenario we get to weaken copyright for everyone.
That's leaving out a lot of other considerations about LLMs and the ethics of using them, which I think most of the other replies were focused on; I largely focused on the copyright implications in this subthread. Because yes, I agree, it can be important to focus a conversation.
But we can't ignore this right now.
We're putting FOSS codebases at risk.
Re: “polluting”, my reply is: https://fedi.copyleft.org/@bkuhn/116426437134023846 (elsewhere in thread).
Re: “copyleft-only #LLM”: I didn't propose that. I proposed copylefting the human-modified output of LLMs.
Re: “two scenarios”: IMO you propose a false dichotomy.
I hope you come to one of #SFC's public sessions on this, as I'd be glad to talk more about it, & this discussion doesn't lend itself to online debate because it's so complex.
-
@bkuhn @firefly_lightning @josh @kees @ossguy ok Neville Chamberlain
A WWII reference is never helpful in a discussion unless the topic *is actually* WWII.
I'd be glad to have a serious discussion with you, but if you invoke Godwin's law again, I will probably block you.
I know emotions are frayed and the FOSS community is frightened and worried, so I forgive you. But there is no reason to claim the situation with LLM-backed AI is tantamount to Hitler's violent invasion of Europe.
-
I might ask ChatGPT to give it a try, and give it some extra incentive to dig deeper because if it digs up some dirt on Claude it'd be good for business.
@evan
… but I know you're only half-joking. Frankly, part of the problem here is that people are either taking this situation *too* seriously or not seriously enough. I'm guessing you're right at the happy medium, but your comment made me think of that point.
-
I think you could make the case that Claude is not a disinterested party in this discussion, since Blanchard used Claude to generate the code, so maybe it's lying to cover its tracks.
@evan
I have a speculative suspicion that the “leak” of Claude's front-end code was a false-flag operation, *hoping* someone would do a so-called “clean-room-with-Claude” rewrite of their own UI.
I have this theory b/c the UI code is not what Claude needs to IPO (it's all the server-side stuff that matters), and it behooves them & their investors to themselves take a “fair's fair” position on the leak of their own code.
I'm meanwhile working on the chardet situation.
-
I gave it a try. It's quite wordy! Claude thought that a lot of Pilgrim's work would be filtered since it was a direct port from the Mozilla C++ codebase. I pushed back that they shared the same license, and it loosened up that constraint.
https://claude.ai/share/e4aae73c-14d1-462e-9773-4381adde54f7
Warning: if you read this document, it will get AI in you, and it will make you AI and you will become an AI-booster like me and Sam Altman. It will also burn down the rainforest.
I don't mind that you tried (and I even clicked on the link, so I guess I burnt down a rainforest?), but this reads like LLM-backed gen-AI slop to me. It's full of truthiness but seems to lack a deep understanding of the AFC (Abstraction-Filtration-Comparison) test.
I hope you can make it to one of SFC's chats on this topic.
-
@bkuhn @wwahammy @silverwizard @cwebber "Nothing ever got done politically that was good when both sides become more entrenched, refuse to even concede the other side has some valid points, & each say the other is the Enemy. … "
Now that is a really strange thing to hear from someone who is representing a FOSS community, because that's basically what FOSS *is*.
I saw this comment after I saw you elsewhere in the thread comparing the LLM-backed genAI situation to WWII, so I have a lot of trouble taking this seriously.
Plus your comment is snarky, sarcastic, mean, and slightly ad hominem. There is no reason for all that in civil debate.
-
@cwebber @bkuhn @ossguy @richardfontana Under this view, it doesn't matter how the training data was licensed, as it's a fair-use defense. The outputs being uncopyrightable / effectively public domain allows people to claim they wrote it when that's convenient and they want to be able to copyright it, since it's hard to prove whether it was AI-generated or human-authored. And, simultaneously, to claim that it was the output of an LLM when they want to strip inconvenient licensing terms.
@RichardJActon
The copyleft-ish hack I propose is *we* (FOSS community) assume that any output of an LLM-backed genAI system *is* copylefted (since we are pretty sure all such systems — at least those designed for software development assist — have been trained on copylefted codebases).
Then, we copyleft any work that comes out of the system.
The only threat is proprietary software in the training set, & the industry can't abide enforcing *that*!
@cwebber @ossguy @richardfontana
@evan
@kees
-
@bkuhn @evan @richardfontana @ossguy Probably a ton of people here think I am anti-AI-output, and that I would be upset to find out that the chardet rewrite were legal.
Actually, I'm not! I'd be fine with the ability to copyright launder software to some degree, as long as we could do the same for proprietary software (including in binary form).
I'm concerned about whether or not we have an *equitable* situation, though. And I'm *more concerned* that we need to advise people who are incorporating code *today*.
We already know the situation isn't equitable & probably won't become so in our lifetimes. Microsoft already all-but-admitted they will never train Copilot on their code. No proprietary software company is going to offer training data back to other vendors.
The goal here obviously was to LLM-wash away copyleft. *That* we must resist, and use their own tools against them: which is the very spirit that made copyleft in the first place!
-
I consider myself an expert on this process since I learned about it 45 minutes ago, but it seems like AFC follows the hierarchical layers of modern programming-in-the-large: statements, functions, modules, packages, program. That is the stuff that LLMs handle pretty well.
@evan wrote:
> “I consider myself an expert on this process since I learned about it 45 minutes ago”
This is the second time you've made me laugh in this thread. Thanks for being comic relief (and I know that's not *all* you're doing, but that part is particularly helpful). Thank you!
-
@richardfontana @evan @cwebber @bkuhn @ossguy I feel like it’s 3 questions for the court:
1. Can a non-human actor produce a copyrightable work? Likely no.
2. Are the human prompt and review enough to apply copyright to LLM content? Maybe?
3. Does this have implications for open source? I guess not.
*Thaler* is limited to the DC Circuit & very narrow. It's a registration question, & even *its* dicta hints that there is no way we can know the answer to (1).
I think (2) is a strong argument.
As for (3), there is huge value to be extracted by applying copyleft-ish principles (and copyleft licenses themselves) to LLM-backed genAI output.
In the worst case, we get a big, complex mix of public-domain + copylefted human-authored stuff that can't easily be separated.
-
@ossguy @cwebber @LordCaramac @bkuhn @richardfontana proprietary software companies extensively use GitHub and yet SFC's position is "don't use GitHub".
There are so many things we do in free software and in the interactions with SFC and FSF that would be simpler if we used proprietary software. How many janky experiences have people been asked to tolerate in order to participate? Why shouldn't we use proprietary software there?
Indeed, SFC's position is #GiveUpGithub, but N.B. the https://giveupgithub.com/ site itself admits most people will use it & suggests a “using Github under protest” README.md.
I use proprietary software every day. I've been convinced for ≥ 10yrs: one can't succeed in an industrialized nation at *anything* w/out sometimes doing so.
The difficulty is figuring out when to compromise. I remain open-minded.
Few of us will be FOSS monks.
-
@richardfontana wrote:
> “oh I mean of course you could use LLMs to help with the analysis”
I'm catching up backwards on this thread, but do you see now the monster you created by telling @evan that?
-
@richardfontana @cwebber @bkuhn @ossguy Yeah, I thought my job couldn't be automated, either, and yet here we are.
-
@evan @cwebber @bkuhn @ossguy @richardfontana Another major concern is that works generated by AI are not copyrightable per the US Supreme Court. So code generated by an LLM cannot be licensed at all, open or closed. https://www.reuters.com/legal/government/us-supreme-court-declines-hear-dispute-over-copyrights-ai-generated-material-2026-03-02/
@sfoskett
I responded in detail in another post to your later conclusions, but the assumption is wrong too. It's just pure FUD to say: “works generated by AI are not copyrightable per the US Supreme Court”.
https://sfconservancy.org/blog/2026/mar/04/scotus-deny-cert-dc-circuit-thaler-appeal-llm-ai/
TL;DR: *DC Circuit* held that a specific copyright registration *for a digital painting* that lists a computer program as the sole author is not eligible *at this time* for copyright *registration*. SCOTUS decided not to hear the case.
-