👀 … https://sfconservancy.org/blog/2026/apr/15/eternal-november-generative-ai-llm/ …my colleague Denver Gingerich writes: newcomers' extensive reliance on LLM-backed generative AI is comparable to the Eternal September onslaught to USENET in 1993.
-
@cwebber @bkuhn @ossguy @richardfontana
My current answer to your "is it safe" question is to answer a slightly different question. Namely: "is it any less safe than accepting code from a random employee who claims to be submitting under an inbound=outbound regime, when in fact they cannot?". The latter we have been doing for decades, with limited damage to the commons.
(I *also* think the legal odds are more in our favor with AI-assisted contributions than in the previous case.)
@zacchiro @bkuhn @ossguy @richardfontana While true, there is a big difference in that the previous scenario was someone out of compliance with what the community actually accepted as hygienic and acceptable contributions, and those contributions were relatively rare.
Saying that we don't need to worry about the risks from these tools right now from a licensing situation is different: it's advising on a path being acceptable where we *don't know* whether or not it's generally safe practice to recommend! And which most in this thread seem to agree we don't know. Even your post seems to say "it seems like it'll probably be okay and end up in our favor".
I guess I feel increasingly like I am maybe the only "oldschool FOSS licensing wonk" who cares about this, and maybe that means I should just give up.
But *damn* I can't believe it feels like when people are both saying "we don't know what the implications will be" we're also saying "so go ahead and say those patches are a-ok!"
-
@richardfontana As said here, given the "translation between languages" aspect, I can't really see that as likely to be true https://social.coop/@cwebber/116426770262334234
Which maybe means that all this stuff really is public domain, a position I am *fully willing to accept*! But I don't think it's known (especially internationally), and I don't think @bkuhn or @ossguy are eager to adopt that perspective
-
@evan @richardfontana I am saying we don't know the answer to that question, and based on previous posts it seems that @bkuhn and @ossguy agree that we don't know the answer to it. That lack of knowledge about the copyright implications of LLM-based contributions means that we are creating a schrodingers-licensing-timebomb for our FOSS codebases
This is probably a healthy concern.
I think there might be some good ways to hedge one's bets, though.
Use LLMs for rubber ducking, code scanning and review, rather than code generation.
Keep LLM code contributions minimal and unremarkable, too.
Don't make them load-bearing. If the code is central to the program, it's too unique.
-
@evan @richardfontana @bkuhn @ossguy Yeah! I actually already said elsewhere in the thread I don't think we need to worry about using these tools for such scenarios from a *licensing* perspective, only when the genAI is explicitly checked into the codebase
-
@evan @cwebber @richardfontana @bkuhn @ossguy this is wisdom
-
@cwebber @bkuhn @ossguy @richardfontana
My current answer to your "is it safe" question is to answer a slightly different question. Namely: "is it any less safe than accepting code from a random employee who claims to be submitting under an inbound=outbound regime, when in fact they cannot?". The latter we have been doing for decades, with limited damage to the commons.
(I *also* think the legal odds are more in our favor with AI-assisted contributions than in the previous case.)
@zacchiro @cwebber @bkuhn @ossguy @richardfontana I would say it's dramatically less safe. First, there's very little incentive to go after some OSS project over an unauthorized inbound=outbound contribution. Second, if someone did, the damage would likely be a small part of a single project. Third, only a small number of parties (the employer, or maybe some other single party whose code was copied) have the ability to sue.
With LLMs, it's different. When the authors sued Anthropic, they all sued. Is a shell script that Claude generated a derivative work of, say, the romantasy novel A Court of Thorns and Roses (to pick a random thing included in Anthropic's training set)? Well, it's hard to show that it's not, in the sense that that novel is one of the zillion things that went into generating the weights that generated the shell script.
Now it happens that the authors sued Anthropic (and settled). But I don't know if their settlement covers users of Claude (and even if it did, there are two other big models). And that's only the book authors -- there are still all of the code authors in the world.
So yes, I think the risk is high. I mean, in some sense -- in another sense, it seems unlikely that Congress would say, "sorry, LLMs as code generators are toast because of some century-old laws". At most, they would set up a statutory licensing scheme for LLM providers which covers LLM outputs. Of course, Europe might go a different way, but I think they would probably do the same. Under this hypothetical scheme, if your code were used to train Claude, you would get a buck or two in the mail every year. Authors got I think $3k per book as a one-time payment, but that was a funny case because of how Anthropic got access to the books.
Still, there's a risk that Congress wouldn't act (due to standard US government dysfunction).
It seems like most people are willing to take this risk, which I think says something interesting about most people's moral intuitions.
-
This is probably a healthy concern.
I think there might be some good ways to hedge one's bets, though.
Use LLMs for rubber ducking, code scanning and review, rather than code generation.
Keep LLM code contributions minimal and unremarkable, too.
Don't make them load-bearing. If the code is central to the program, it's too unique.
@evan @cwebber @richardfontana @bkuhn @ossguy or just... not at all
-
I think the worst case scenario is that the inserted code matches exactly one snippet in the training data.
So you could try to go for zero matches, by using such idiosyncratic and unrecommended coding conventions that nobody else has code like yours.
Or you could try to go for lots of matches, by using bog standard coding conventions and software patterns.
-
@evan @richardfontana @bkuhn @ossguy Yeah! I actually already said elsewhere in the thread I don't think we need to worry about using these tools for such scenarios from a *licensing* perspective, only when the genAI is explicitly checked into the codebase
@cwebber the weights themselves?
-
@evan @richardfontana @bkuhn @ossguy Sorry, I missed a word when I edited the sentence, I meant "genAI output"
-
I think the worst case scenario is that the inserted code matches exactly one snippet in the training data.
So you could try to go for zero matches, by using such idiosyncratic and unrecommended coding conventions that nobody else has code like yours.
Or you could try to go for lots of matches, by using bog standard coding conventions and software patterns.
But maybe that's wrong; I don't know. Maybe if I wrote a Person.setName() method that was in the training set, and the LLM generated an identical Person.setName() code snippet for someone else, I could claim that the code is a copyright violation, even if there were thousands of other identical and independent Person.setName() methods in the training set.
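The zero / one / many distinction in these posts can be made concrete with a small sketch. It is only an illustration under loud assumptions: the `corpus` list is a hypothetical stand-in for training data (which in practice nobody outside the vendor can inspect), the function names are invented here, and "identical after whitespace normalization" is a far cruder notion of a match than anything a court would apply. As the post itself says, a high match count may not settle the legal question.

```python
from collections import Counter


def normalize(snippet: str) -> str:
    """Collapse every run of whitespace so formatting differences are ignored."""
    return " ".join(snippet.split())


def count_matches(candidate: str, corpus: list[str]) -> int:
    """Count corpus entries identical to the candidate after normalization.

    `corpus` is a hypothetical list of snippets standing in for training data.
    """
    counts = Counter(normalize(s) for s in corpus)
    return counts[normalize(candidate)]


def classify(candidate: str, corpus: list[str]) -> str:
    """Bucket a snippet into the three cases discussed in the thread."""
    n = count_matches(candidate, corpus)
    if n == 0:
        return "zero"   # idiosyncratic code that nobody else has
    if n == 1:
        return "one"    # the "worst case": a single identifiable source
    return "many"       # bog-standard code with many independent copies


if __name__ == "__main__":
    setter = "def set_name(self, name):\n    self.name = name"
    corpus = [
        setter,
        "def set_name(self, name):\n        self.name = name",  # same code, other indent
        "def rare(): return 0x5F3759DF",
    ]
    print(classify(setter, corpus))
    print(classify("def rare():   return 0x5F3759DF", corpus))
    print(classify("x = 1", corpus))
```

A trivial setter lands in the "many" bucket here, while the unusual one-liner lands in "one", which is the case the thread singles out as riskiest.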
-
@evan @richardfontana @bkuhn @ossguy Sorry, I missed a word when I edited the sentence, I meant "genAI output"
@cwebber it's sometimes a distinction that people blur!
-
Are you concerned that the LLMs generate nontrivial verbatim excerpts of copyrighted works?
Or that there is a hidden "intellectual property" in the deep patterns that they use?
Say, when an LLM was trained on a file I made with an interesting loop structure, and it emits code with a similar loop structure, even if the variable names, problem domain, details, or programming language differ.
What if a court says I can demand royalties for my "IP"?
@evan @cwebber @bkuhn @ossguy @richardfontana Another major concern is that works generated by AI are not copyrightable per the US Supreme Court. So code generated by an LLM cannot be licensed at all, open or closed. https://www.reuters.com/legal/government/us-supreme-court-declines-hear-dispute-over-copyrights-ai-generated-material-2026-03-02/
-
But maybe that's wrong; I don't know. Maybe if I wrote a Person.setName() method that was in the training set, and the LLM generated an identical Person.setName() code snippet for someone else, I could claim that the code is a copyright violation, even if there were thousands of other identical and independent Person.setName() methods in the training set.
@evan That’s not enough code for copyright enforcement. People have been finding identical code in the output - you just need something “rare”. It’s similar for subjects with little text in the corpus - I’ve been seeing listings that *can only have one source* (retro datasheets by AMD, in my case).
-
@richardfontana @bkuhn @ossguy That's a problem so hard it throws the "NP complete" debate out the window in favor of something brand new. Given that these codebases have no trouble "translating" from one language's source code into another, how on *earth* could you possibly hope to build a compliance tool around that?
Laughable, to anyone who tries.
This is a really interesting question! TIL about CA vs. Altai and the abstraction-filtration-comparison test.
I'm not sure how automatable it is. Interesting to try though!
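As a rough illustration of why this is hard to automate, here is a sketch of what is probably the easiest slice of the problem: making a comparison blind to identifier names within a single language. It uses Python's standard `tokenize` and `keyword` modules. The function name `abstract_identifiers` is made up for this example, and this is not the abstraction-filtration-comparison test itself, only a toy analogue of an "abstraction" step. It does nothing about the cross-language "translation" case raised above.

```python
import io
import keyword
import tokenize

# Token types that carry layout or comments rather than program structure.
_SKIPPED = {
    tokenize.COMMENT,
    tokenize.NL,
    tokenize.NEWLINE,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def abstract_identifiers(source: str) -> list[str]:
    """Return the token stream with every non-keyword name replaced by 'ID'.

    Hypothetical helper for illustration only: two snippets that differ
    just in variable names and comments produce the same list.
    """
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIPPED:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")
        else:
            out.append(tok.string)
    return out


if __name__ == "__main__":
    a = "total = 0\nfor item in items:\n    total += item\n"
    b = "acc = 0\nfor x in xs:\n    acc += x  # running sum\n"
    c = "acc = sum(xs)\n"
    print(abstract_identifiers(a) == abstract_identifiers(b))
    print(abstract_identifiers(a) == abstract_identifiers(c))
```

Renamed copies of the same loop compare equal under this scheme, but a functionally equivalent rewrite (the `sum` call) does not, and a port to another language would not even tokenize. That gap is roughly the objection made earlier in the thread.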
-
@evan @cwebber @bkuhn @ossguy @richardfontana Another major concern is that works generated by AI are not copyrightable per the US Supreme Court. So code generated by an LLM cannot be licensed at all, open or closed. https://www.reuters.com/legal/government/us-supreme-court-declines-hear-dispute-over-copyrights-ai-generated-material-2026-03-02/
@sfoskett @evan @bkuhn @ossguy @richardfontana That outcome I am not worried about; code that's not copyrightable is considered in the public domain within the US, which means there aren't any real risks to incorporating it into FOSS projects. But the Supreme Court punted on it; they didn't rule that way.
-
@sfoskett you can incorporate public domain code into a licensed work.
-
… https://sfconservancy.org/blog/2026/apr/15/eternal-november-generative-ai-llm/ …my colleague Denver Gingerich writes: newcomers' extensive reliance on LLM-backed generative AI is comparable to the Eternal September onslaught to USENET in 1993. I was on USENET extensively then; I confirm the disruption was indeed similar. I urge you to read his essay, think about it, & join Denver, me, & others at the following datetimes…
$ date -d '2026-04-21 15:00 UTC'
$ date -d '2026-04-28 23:00 UTC'
…in https://bbb-new.sfconservancy.org/rooms/welcome-llm-gen-ai-users-to-foss/join
#AI #LLM #OpenSource
-
Sorry for interfering in the discussion out of the blue, but the topic is really interesting. I really hope that the conclusion of this will not be engineers saying they are not lawyers, and lawyers saying that it's for the courts to decide.
To chip in, a situation where AI-generated code is completely unacceptable is hard, if not impossible, to implement. This also puts a lot of pressure on reviewers, who have the difficult job of determining whether a piece of code is AI-generated. Sometimes it's easy; sometimes it's impossible under the conditions they operate in. If the code is good, it should be accepted.