👀 … https://sfconservancy.org/blog/2026/apr/15/eternal-november-generative-ai-llm/ …my colleague Denver Gingerich writes: newcomers' extensive reliance on LLM-backed generative AI is comparable to the Eternal September onslaught to USENET in 1993.

evan@cosocial.ca

I think the worst case scenario is that the inserted code matches exactly one snippet in the training data.

So you could try to go for zero matches, by using such idiosyncratic and unrecommended coding conventions that nobody else has code like yours.

Or you could try to go for lots of matches, by using bog standard coding conventions and software patterns.

@cwebber @richardfontana @bkuhn @ossguy

evan@cosocial.ca

@cwebber the weights themselves?

@richardfontana @bkuhn @ossguy

cwebber@social.coop

@evan @richardfontana @bkuhn @ossguy Sorry, I missed a word when I edited the sentence, I meant "genAI output"

evan@cosocial.ca

But maybe that's wrong; I don't know. Maybe if I wrote a Person.setName() method that was in the training set, and the LLM generated an identical Person.setName() code snippet for someone else, I could claim that the code is a copyright violation, even if there were thousands of other identical and independent Person.setName() methods in the training set.

@cwebber @richardfontana @bkuhn @ossguy

evan@cosocial.ca

@cwebber it's sometimes a distinction that people blur!

@richardfontana @bkuhn @ossguy

sfoskett@techfieldday.net

@evan @cwebber @bkuhn @ossguy @richardfontana Another major concern is that works generated by AI are not copyrightable per the US Supreme Court. So code generated by an LLM can not be licensed at all, open or closed. https://www.reuters.com/legal/government/us-supreme-court-declines-hear-dispute-over-copyrights-ai-generated-material-2026-03-02/

promovicz@chaos.social

@evan That’s not enough code for copyright enforcement. People have been finding identical code in the output - you just need something “rare”. It’s similar for subjects with little text in the corpus - I’ve been seeing listings that *can only have one source* (retro datasheets by AMD, in my case).

evan@cosocial.ca

@cwebber

This is a really interesting question! TIL about CA vs. Altai and the abstraction-filtration-comparison test.

https://w.wiki/LbiW

I'm not sure how automatable it is. Interesting to try though!

@richardfontana @bkuhn @ossguy

cwebber@social.coop

@sfoskett @evan @bkuhn @ossguy @richardfontana That outcome I am not worried about; code that's not copyrightable is considered in the public domain within the US, which means there aren't any real risks to incorporating into FOSS projects. But the Supreme Court punted on it, they didn't rule that way.

evan@cosocial.ca

@sfoskett you can incorporate public domain code into a licensed work.

@cwebber @bkuhn @ossguy @richardfontana

richardfontana@mastodon.social

@evan I feel pretty confident in saying the abstraction-filtration-comparison test cannot possibly be automated @cwebber @bkuhn @ossguy

redchrision@mastodon.social

Sorry for interfering in the discussion out of the blue, but the topic is really interesting. I really hope that the conclusion of this will not be engineers saying they are not lawyers, and lawyers saying that it's for the courts to decide.

redchrision@mastodon.social

To chip in, a situation where AI-generated code is completely unacceptable is hard, if not impossible, to implement. This also puts a lot of pressure on reviewers, who have the difficult job of determining whether a piece of code is AI-generated. Sometimes it's easy; sometimes it's impossible under the conditions they operate in. If the code is good, it should be accepted.

redchrision@mastodon.social

The practical issue is the fact that reviewers risk facing large amounts of PRs done completely with LLMs, and even by LLMs, under the name of a human who uses them. That generates enormous risk for code quality. Probably a practical way of handling this is to accept PRs only from community members who are validated with a status of “valid contributor”, making it more official, basically.

redchrision@mastodon.social

This would mean putting responsibility on those who are contributing: if they make PRs, those should be carefully analyzed by them beforehand, even if they use AI, since it's hard to control that.

redchrision@mastodon.social

This will avoid a situation where a reviewer ends up reviewing what an LLM has written and then becomes, in some sense, an author without their consent of whatever the LLM outputs next, through comments on the PR and suggestions for improvement that will turn into future prompts.

redchrision@mastodon.social

The sanction for not respecting that should be related to reputation within that community and decided locally: whether they further allow that person to contribute, completely ban contributions, close PRs from the beginning, etc.

redchrision@mastodon.social

On the topic of proprietary code generated by LLMs and then accepted in OSS, the responsibility should be on the LLM company; the code should naturally inherit the OS license it is associated with. On the topic of LLM companies using OSS code inappropriately, the responsibility should again be on the LLM company. In both situations, courts will probably have opinions in the future, and LLM companies might consider adapting their use policies further.

redchrision@mastodon.social

Something in their policies like: you can use LLM-generated code however you please, but consider that it is trained on X, Y, Z, and it might not follow the policies of where you use it, and you are using it at your own responsibility, might help them out. But it is still on them if they train models on things they should not, and the LLMs further generate questionable code from a policy perspective.

sfoskett@techfieldday.net

@evan @cwebber @bkuhn @ossguy @richardfontana Ok I haven’t really heard people before you guys explain that to me. So I was wondering if it was possible that it couldn’t be licensed. Thanks.