I've seen people claiming - with a straight face - that mechanical refactoring is a good use-case for LLM-based tools.

adingbatponder@fosstodon.org

@gabrielesvelto For fun I tried writing rust code with claude code. The code took an age to compile when it worked (do we call it build?). The project took months and so the code got large & was slow to build. Claude was able to refactor it (after it worked) to build 10 times faster. That is not mechanical as you mention... but was really challenging. Mechanical refactors it does 100 times better still of course, because it seds too yes, but it can check the new syntax & test build each change.

keithpjolley@discuss.systems

@csepp @gabrielesvelto tbf, in all likelihood it wouldn't be `sed` that fails. It would be the inputs to `sed` that failed: garbage in, garbage out.

patricus@gts.posix.live

@gabrielesvelto not really, it is not on my computer.

crazyeddie@mastodon.social

@fourlastor @gabrielesvelto It's not a use sed or use LLM scenario here.

Sed isn't a refactoring tool. There are plenty of actual refactoring tools that don't use LLMs. I was using them before LLMs were invented and no, fucking sed isn't the same thing. I'm rather hoping that wasn't actually a serious comparison.

Mechanical refactors are deterministic algorithms. If the conversation is about sticking AI in that, it's probably nonsense and you can leave without fearing you'll miss anything.

christopherkunz@chaos.social

@gabrielesvelto It's also Turing complete.

crazyeddie@mastodon.social

@gabrielesvelto @csepp I bet if you look at the C++ part of the tools there's not many refactors they can do

crazyeddie@mastodon.social

@csepp @gabrielesvelto Doesn't look like Lua really has a good binding to libclang, but if you used Python you could use the same libraries that clang-format/tidy do. They're using the actual LLVM parser and give you an API to manipulate the AST.
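
To make "an API to manipulate the AST" concrete, here is a minimal sketch of a deterministic rename done on a syntax tree. It is an illustration under assumptions, not the libclang workflow described above: it uses Python's standard-library `ast` module as a stand-in for the libclang bindings, and the names `RenameFunction`, `rename_function` and `SOURCE` are made up for the example. It is also not scope-aware, so any variable that happens to share the old name would be renamed too.

```python
import ast

# Hypothetical input: a function `load`, a call to it, and an unrelated
# method and string literal that merely share the same spelling.
SOURCE = '''
def load(path):
    return open(path).read()

data = load("cfg.toml")
other = parser.load("load")
'''


class RenameFunction(ast.NodeTransformer):
    """Rename a function definition and the bare names that refer to it."""

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # also rewrite names inside the body
        if node.name == self.old:
            node.name = self.new
        return node

    def visit_Name(self, node):
        # Only bare identifiers are Name nodes. `parser.load` is an
        # Attribute and "load" is a Constant, so neither is touched.
        if node.id == self.old:
            node.id = self.new
        return node


def rename_function(source, old, new):
    tree = RenameFunction(old, new).visit(ast.parse(source))
    # ast.unparse needs Python 3.9+ and discards comments and formatting.
    return ast.unparse(tree)


if __name__ == "__main__":
    print(rename_function(SOURCE, "load", "load_config"))
```

The same input always produces the same output, which is the sense in which a mechanical refactor is a deterministic algorithm. Because `ast.unparse` throws away comments and layout, this is only suitable as a demonstration of the idea.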

pepperthevixen@meow.social

@gabrielesvelto "Yeah but Sed is old and shitty and you gotta get with the times" -some techbro somewhere

pepperthevixen@meow.social

@gabrielesvelto NGL when I read "mechanical refactoring", I first imagined a bunch of robot arms on an Aperture-esque assembly line rearranging letters on printing press-style blocks

gabrielesvelto@mas.to

@adingbatponder why did the project take so long to build?

gabrielesvelto@mas.to

@fourlastor you don't need to do anything special to be a target of state-sponsored actors if you rely on an LLM for your coding tasks. State-sponsored actors have almost certainly poisoned the training data of major commercial LLMs, you don't need to add anything yourself. Remember, these things are trained on anything that's dredged from the internet. *Anything*. Do you really trust what happens within the model? Remember the xz compromise? It can now be done automatically *at scale*.

gabrielesvelto@mas.to

I think there's an important clarification to be made about LLM usage in coding tasks: do you trust the training data? Not your inputs, those are irrelevant, I mean the junk that the major vendors have dredged from the internet. Because I'm 100% positive that any self-respecting state-sponsored actor is poisoning training data as we speak by... simply publishing stuff on the internet.

adingbatponder@fosstodon.org

@gabrielesvelto Well that is what rust seems to be like. I used a lot of packages incl. browser and screen grabbing tools which took ages to build. Like 20 mins. (It was inside a nixos flake though.)

buermann@mastodon.social

@gabrielesvelto

Any blogger can poison the LLMs.

https://www.bbc.com/future/article/20260218-i-hacked-chatgpt-and-googles-ai-and-it-only-took-20-minutes

gabrielesvelto@mas.to

And it's crucial to remember what happened during the xz compromise: a chain of seemingly innocuous commits where malicious behavior was hidden, then triggered by changing a single character in a generated file. A SINGLE CHARACTER. If you truly believe you can catch that by manually reviewing thousands upon thousands of machine-generated commits obtained via black-box training data I'm sorry, but you're being extremely naive.

gabrielesvelto@mas.to

@adingbatponder yes, but why? Which packages were taking so long? Firefox has almost 4 million lines of Rust and it takes only a few minutes to build them.

a@852260996.91268476.xyz

@gabrielesvelto@mas.to it is also worth remembering that the xz incident happened WITHOUT LLMs involved, so your comparison is not a very good one

piegames@flausch.social

@gabrielesvelto "people are using this inadequate and problematic tool for a job, so let me suggest they use this different completely inadequate tool instead."
Speaking from unfortunate, painful experience, using grep and sed at scale for mechanical refactoring very much randomly introduces mistakes into a codebase. I beg developers to use *at least* syntax-aware tools for mechanical refactoring jobs
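
A small sketch of the failure mode described here, assuming a Python codebase for the sake of illustration. `rename_with_regex` approximates what a blind `sed 's/total/count/g'` does, while `rename_with_tokens` uses the standard-library `tokenize` module to touch only identifier tokens. Both function names and the `SOURCE` sample are hypothetical. The token-based version is only lexically aware, not scope-aware, so it is the bare minimum rather than a real refactoring tool.

```python
import io
import re
import tokenize

# Hypothetical input: `total` appears as an identifier, inside a longer
# identifier, inside a string literal, and inside a comment.
SOURCE = '''total = 0
subtotal = total + 1
print("total is", total)  # total so far
'''


def rename_with_regex(source, old, new):
    # Blind textual substitution, roughly `sed 's/old/new/g'`.
    return re.sub(re.escape(old), new, source)


def rename_with_tokens(source, old, new):
    # Find NAME tokens that are exactly `old`; strings, comments and
    # longer identifiers such as `subtotal` are different tokens.
    hits = [
        tok.start
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.NAME and tok.string == old
    ]
    lines = source.splitlines(keepends=True)
    # Splice right to left so earlier column offsets stay valid.
    for row, col in sorted(hits, reverse=True):
        line = lines[row - 1]
        lines[row - 1] = line[:col] + new + line[col + len(old):]
    return "".join(lines)


if __name__ == "__main__":
    print(rename_with_regex(SOURCE, "total", "count"))
    print(rename_with_tokens(SOURCE, "total", "count"))
```

The regex version silently turns `subtotal` into `subcount` and rewrites the string literal and the comment. The token version changes only the three standalone uses of `total`.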

gabrielesvelto@mas.to

@a how so? Now you don't need a person to run that particular exploit for years, you can just poison an LLM so that whenever someone generates a sufficiently large sequence of commits the exploit can be injected in them directly. No user intervention and it can be done at scale. And it can be done in closed-source codebases too, it's just a matter of someone using a bot on them.

a@852260996.91268476.xyz

@gabrielesvelto@mas.to you didn't need an LLM for xz, that is how