I've seen people claiming - with a straight face - that mechanical refactoring is a good use-case for LLM-based tools.

gabrielesvelto@mas.to

@fourlastor what about the time spent setting up the LLM, sandboxing it and then reviewing all the changes? What about the risk of the code containing prompt-injections that might be designed to introduce vulnerabilities or simply take over your machine or credentials for a state-sponsored attacker to use? What about the reliance on a single closed-source paid-for commercial tool? Those are a lot of disadvantages to make up for.

dain@hachyderm.io

@RandamuMaki @gabrielesvelto oh, that expression builder in the second step is really nice! wish it would then do match testing on more lines in further steps like how regex101 does

fourlastor@androiddev.social

@gabrielesvelto answering in order:

>what about the time spent setting up the LLM, sandboxing it and then reviewing all the changes?

For what I'm working on, this is usually between 30 and 40 minutes, start to end (minus the time the LLM takes to do its own work in its own git subtree, while I do other stuff). For context, claude doesn't commit; I review the changes locally (git is blacklisted). In my case this has been pretty stable across 100-150 tasks where I did the same kind of migration

fourlastor@androiddev.social

@gabrielesvelto prompt-injections

The project is closed source, and we don't have places where we randomly include text files. If someone IN THE COMPANY manages to introduce malicious code, imho they'd just infect gradle instead of hoping for someone running an LLM to trigger something (other than devs having access to only what they need). State-sponsored hackers specifically are really not on my list of things I can defend against, be it from LLMs or whatever introduced the attack

fourlastor@androiddev.social

@gabrielesvelto What about the reliance on a single closed-source paid-for commercial tool

On this I 100% agree, you shouldn't RELY on it. I am confident that I can make the same changes myself (in some cases I did because it was clearly less time consuming than making an LLM do that), if tomorrow these tools disappear I am sure I will be comfortable working without them (as I do for example for my OSS/hobby work, where I can't really justify paying for the subscription)

gundersen@mastodon.social

@gabrielesvelto the developers of TypeScript have decided not to implement refactoring tools because the refactoring can be done by LLMs...

hipsterelectron@circumstances.run

@gabrielesvelto i spent the last week using sed to produce an entire module system for a prototype. lovely piece of software that expands the meaning of structured data. not at all perfect but if we're comparing it to statistical approaches it at least has the benefit of determinism

fourlastor@androiddev.social

@gabrielesvelto a counterexample: one migration I needed to make was to migrate Java Serializable to Parcelable. That was a GREAT candidate to be worked on by modifying the syntax tree. I created a small throwaway plugin in IntelliJ which did the work: it removed the extension, added the annotation, and ran on thousands of files in a few seconds. Imho trying to find the most appropriate tool for the task at hand is important, and having an all-or-nothing mentality (on either side) isn't constructive
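
For readers who haven't done a syntax-tree migration like this, here is a rough sketch of the idea. It is not the IntelliJ plugin from the post. It is an analogy in Python using the standard `ast` module, and the names `Serializable`, `parcelize` and `migrate` are placeholders made up for illustration. It drops a base class and adds a decorator, which mirrors "removed the extension, added the annotation".

```python
import ast


class SerializableToParcelable(ast.NodeTransformer):
    """Remove a hypothetical `Serializable` base and add a hypothetical `@parcelize` decorator."""

    def visit_ClassDef(self, node: ast.ClassDef) -> ast.ClassDef:
        # Visit nested classes first so they are migrated too.
        self.generic_visit(node)
        kept = [
            base for base in node.bases
            if not (isinstance(base, ast.Name) and base.id == "Serializable")
        ]
        # Only touch classes that actually had the base class.
        if len(kept) != len(node.bases):
            node.bases = kept
            node.decorator_list.insert(0, ast.Name(id="parcelize", ctx=ast.Load()))
        return node


def migrate(source: str) -> str:
    """Parse the source, rewrite the tree, and print it back as code (Python 3.9+)."""
    tree = SerializableToParcelable().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


if __name__ == "__main__":
    print(migrate("class User(Serializable):\n    x = 1\n"))
```

The same input always gives the same output, and classes that don't match are left alone. One caveat of this particular sketch: `ast.unparse` discards comments and original formatting, so a real tool would use a parser that preserves them.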

adingbatponder@fosstodon.org

@gabrielesvelto For fun I tried writing rust code with claude code. The code took an age to compile when it worked (do we call it build?). The project took months and so the code got large & was slow to build. Claude was able to refactor it (after it worked) to build 10 times faster. That is not mechanical as you mention... but was really challenging. Mechanical refactors it does 100 times better still of course, because it seds too yes, but it can check the new syntax & test build each change.

keithpjolley@discuss.systems

@csepp @gabrielesvelto tbf, in all likelihood it wouldn't be `sed` that fails. it would be the inputs to `sed` that failed - garbage in, garbage out.

patricus@gts.posix.live

@gabrielesvelto not really, it is not on my computer.

crazyeddie@mastodon.social

@fourlastor @gabrielesvelto It's not a use sed or use LLM scenario here.

Sed isn't a refactoring tool. There are plenty of actual refactoring tools that don't use LLMs. I was using them before LLMs were invented and no, fucking sed isn't the same thing. I'm rather hoping that wasn't actually a serious comparison

Mechanical refactors are deterministic algorithms. If the conversation is about sticking AI in that, it's probably nonsense and you can leave without fearing you'll miss anything

christopherkunz@chaos.social

@gabrielesvelto It's also Turing complete.

crazyeddie@mastodon.social

@gabrielesvelto @csepp I bet if you look at the C++ part of the tools there's not many refactors they can do

crazyeddie@mastodon.social

@csepp @gabrielesvelto Doesn't look like lua really has a good binding to libclang but if you used Python you could use the same libraries that clang-format/tidy do. They're using the actual llvm parser and give you an API to manipulate the AST.

pepperthevixen@meow.social

@gabrielesvelto "Yeah but Sed is old and shitty and you gotta get with the times" -some techbro somewhere

pepperthevixen@meow.social

@gabrielesvelto NGL when I read "mechanical refactoring", I first imagined a bunch of robot arms on an Aperture-esque assembly line rearranging letters on printing press-style blocks

gabrielesvelto@mas.to

@adingbatponder why did the project take so long to build?

gabrielesvelto@mas.to

@fourlastor you don't need to do anything special to be a target of state-sponsored actors if you rely on an LLM for your coding tasks. State-sponsored actors have almost certainly poisoned the training data of major commercial LLMs, you don't need to add anything yourself. Remember, these things are trained on anything that's dredged from the internet. *Anything*. Do you really trust what happens within the model? Remember the xz compromise? It can now be done automatically *at scale*.

gabrielesvelto@mas.to

I think there's an important clarification to be made about LLM usage in coding tasks: do you trust the training data? Not your inputs, those are irrelevant, I mean the junk that the major vendors have dredged from the internet. Because I'm 100% positive that any self-respecting state-sponsored actor is poisoning training data as we speak by... simply publishing stuff on the internet.