FARVEL BIG TECH
all the criticism has been said, all the takes been had.

110 posts · 60 posters · 386 views
  • poleguy@mastodon.social

    @jonny I just lost my beer league hockey championship as the last shooter on a 14 round shoot out. I'm sitting in my driveway reading your thread. I'll need to read it again in the morning.

    I don't remember why I followed you originally. But I love this thread.

    This whole rsync thing is the most interesting thing that has come out of the AI bubble.

    I've had a negative feeling about rsync ever since reading a blog post, years ago, that criticized its sloppy design.

    Yet I rely on it daily. I have so many questions.

    bms48@mastodon.social wrote (#97):

    @poleguy @jonny rsync is the fallback method for zxfer, a FreeBSD script-based tool for replicating ZFS datasets over the network; normally it calls out to zfs send/recv to do the hard work of shuffling deltas. I don't use it myself; I'm currently using a 3rd party backup agent with SFTP backing for Win11 clients but not rsync itself.

    • jonny@neuromatch.social

      But it's not as simple as "OK, if I read the tests I should be fine", because LLM code is often untestable. It writes code with function and class names that make it seem like something does something, but they might just be flat wrong. Or there is some invisible fallback condition that the LLM encountered while generating code and added just to make tests pass, but which has entirely different behavior.

      If you've watched an LLM generate a project over time, you see it generating its own private language, and I've even seen it reinvent language features like function definitions themselves. Its names form part of an increasingly inaccessible web of meaning that no human can penetrate.

      Writing tests requires a kind of "information gap" where you have enough intuition about what something does, but not how it does it, so you can a) know what it should do, b) make a strong assertion about that expectation, c) without mirroring the internal implementation's limits. That's hard! And it's really only possible when the foundation, (a), is true. Code must have an articulable purpose in order to be testable. That's tautological: the purpose is what defines failure. But since LLM code increasingly detaches from any kind of stable description or expectation, even if the tests look very rigorous, you can't know if they are just tailored to the specific internal details of its function to eke out a pass, because it's hard to know what it should do anyway.

      So really you have to read the test code, the code under test, and also all the other code that might call the code under test. Aka you have to read everything. And rather than reading something that was written to be read, you're wading through a slop swamp. So you can't. It takes more time than just writing it. The erosion of testing is just an intrinsic part of the loop that you can't escape without breaking the spell of the slot machine, and it is what drives the loop.
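A minimal sketch of the "information gap" described above, under stated assumptions: `dedupe_keep_order` and both tests are hypothetical names invented for illustration and come from no project discussed in this thread. The first test computes its expectation with the same algorithm as the code under test, so a shared bug would pass unnoticed. The second asserts only things that follow from the stated purpose.

```python
def dedupe_keep_order(items):
    """Hypothetical function under test: drop repeats, keep first-seen order."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


def test_mirrors_implementation():
    # Weak: the expected value is rebuilt with the same algorithm,
    # so any bug shared by both copies goes undetected.
    data = [3, 1, 3, 2, 1]
    seen, expected = set(), []
    for item in data:
        if item not in seen:
            seen.add(item)
            expected.append(item)
    assert dedupe_keep_order(data) == expected


def test_states_the_purpose():
    # Stronger: every assertion comes from the stated purpose, not the code.
    data = [3, 1, 3, 2, 1]
    result = dedupe_keep_order(data)
    assert result == [3, 1, 2]               # hand-written expectation
    assert len(result) == len(set(result))   # no repeats remain
    assert set(result) == set(data)          # nothing lost, nothing invented
```

Both tests pass here, which is the point: a green result alone does not tell you which kind of test you are looking at.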

      bms48@mastodon.social wrote (#98):

      @jonny I suspect the "information gap" you mention here corresponds to known LLM kryptonite: they can't emulate human reasoning in the abductive mode (to best outcome or expectation, usually inherently non-boolean, and the domain of GOFAI expert systems). Human decisions about refactoring legacy code bases (Chesterton's Fence) might constitute an example of problem solving requiring abductive reasoning. I published https://burdentennis.com yesterday but didn't put my name to it directly.

        bms48@mastodon.social wrote (#99):

        @jonny And I found a solid refutation of Cartesian dualism just now from the Wikipedia article for category error: "The Concept of Mind" by Gilbert Ryle. That's aiming at Dawkins' contention Claude is conscious (very unlikely). @mattsheffield was driving at this. Humans can be really stupid as Carlo Cipolla points out. Even when LLM boosters ignoring Searle get shut down by Hitchens' razor the burden tennis recurs. But empirical refutation is starting to emerge as the con is about to be closed.

        • jonny@neuromatch.social

          @elebertus
          I've read so much LLM code at this point. There are still patterns that are present but elude my understanding, but one thing that's clear is that there are foundational flaw categories that are not improved upon by model version and that appear in wildly different projects using wildly different models and harnesses. Testing is a big nexus of those flaws. I am not close to what would be a satisfying explanation of the dynamics, but every project suffers fucked testing problems.

          kris@todon.eu wrote (#100):

          @jonny @elebertus

          I have seen specifically test versions of different objects/structs that are slightly modified copies of other structs, and only (or mostly!) the testing version is used in the tests.
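A hedged illustration of the pattern described above. `Transfer` and `FakeTransfer` are hypothetical names made up for this sketch, not code from any project in this thread. The near-copy has drifted slightly from the production type, and a test that only touches the copy says nothing about the code that ships.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    """Hypothetical production type."""
    path: str
    size: int

    def is_empty(self) -> bool:
        return self.size == 0


@dataclass
class FakeTransfer:
    """Hypothetical test-only near-copy whose logic has drifted."""
    path: str
    size: int

    def is_empty(self) -> bool:
        return self.size <= 0


def test_exercises_only_the_copy():
    # Passes, yet never touches Transfer.
    assert FakeTransfer("a", -1).is_empty()


def test_exercises_production_type():
    # Exercises the type that actually ships.
    assert Transfer("a", 0).is_empty()
    assert not Transfer("a", -1).is_empty()
```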

          • jonny@neuromatch.social

            So rsync rewriting all the tests puts the entire project in play. Now the entire protective surface has been sloshed through a layer of probability, so the loop must accelerate. Followup PRs add more carveouts with lengthy LLM justifications that sound perfectly plausible but amount to an erosion of the protective surface. We go from cumulative improvement to a random walk.

            ra@mstdn.social wrote (#101):

            Sorry to walk into this thread without bringing anything, but what does 'PR' mean; Production....? I keep seeing it in other convos about code still and can't work it out. TIA.

            • fluffy@plush.city

              @jonny ... and why the everloving FUCK do these tests run as root

              d_rift@beige.party wrote (#102):

              @fluffy @jonny .. they WHAT?!

                lightbeaminsight@mastodon.social wrote (#103):

                @Ra “pull request”. The new code sits outside of the main code, and in order to add it you raise a request to pull it in. This gives a reviewer the chance to go over the changes, with deletions and additions highlighted.

                Usually code isn’t brought into the main codebase until someone has reviewed the changes, and verified things work as intended.

                (LLMs typically generate so many changes, and so much code, that this becomes very difficult)

                  d_rift@beige.party wrote (#104):

                  @fluffy @jonny omfgs. They did.

                    jonny@neuromatch.social wrote (#105):

                    @kris
                    @elebertus
                    "Code only used in the tests" is another LLM favorite

                    • fluffy@plush.city

                      @jonny also why the hell would they write tests for a C program/library in Python? It makes no sense.

                      0x2ba22e11@unstable.systems wrote (#106):

                      @fluffy @jonny I think in the abstract Python tests for a C project could be fine because tests usually contain a lot of setup and assertion code that is run only once, so running it with a slow interpreter is cheaper than compiling it then running it. You can bind to C libraries with cffi. You get to use the hypothesis library, which is really nice. Automatic memory management makes the tests shorter.

                      The one big obvious downside is that Python always used to throw valgrind diagnostics from its GC doing a "clever" trick with uninitialised memory (I'm not sure if they have fixed this since), which necessitates adding a suppression to the valgrind options.

                      Edit to add: but in this specific context I would be surprised if any of that was the reason, lol. 🙃
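A rough sketch of what testing C code from Python can look like. To stay dependency-free it swaps in the standard library: `ctypes` instead of cffi, and a seeded `random` loop instead of hypothesis, so it shows the general idea rather than the tools named above. It assumes a POSIX system where the C library can be loaded this way; `c_strlen` and the test name are made up for illustration.

```python
import ctypes
import ctypes.util
import random

# Assumption: POSIX. find_library("c") may return None, in which case
# CDLL(None) falls back to the symbols already loaded into the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Call the C library's strlen on a Python bytes object."""
    return libc.strlen(data)


def test_strlen_matches_python_len(rounds: int = 200) -> None:
    # Poor man's property test: random NUL-free byte strings, with
    # Python's len() as an oracle independent of the C implementation.
    rng = random.Random(0)  # fixed seed keeps failures reproducible
    for _ in range(rounds):
        length = rng.randrange(64)
        data = bytes(rng.randrange(1, 256) for _ in range(length))
        assert c_strlen(data) == len(data)
```

The setup and assertion code is short and needs no compile step, which is the trade-off the post describes.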

                        bms48@mastodon.social wrote (#107):

                        @0x2ba22e11 @fluffy @jonny "hypothesis" made my hitlist of Python software testing tools on 2026-05-01 for future work, along with pytest, unittest, and nose, plus ruff for linting and bandit for static analysis. Much of the bashism could still be replaced with pexpect. CPython now supports DTrace/SystemTap USDT instrumentation of the interpreter itself. My preference for ffi is Cython for cffi for "the reasons"... but capturing e.g. cBPF in libpcap correctly took some figuring out.

                        • jonny@neuromatch.social

                          Here's an example from some code that was thrust at me this week. The rest of the tests try a bit harder to look like tests, but this one is perplexing.

                          What does it test? The function name suggests it's a smoke test. LLMs love to call things smoke tests. That would suggest an early-run test that fails loudly if some basic precondition, like having ffmpeg, is not met. Or, I guess, we are smoke testing the ensure_ffmpeg function? Anyway, who knows. However, we first check whether ffmpeg or ffprobe is present, which is exactly what ensure_ffmpeg does. If they aren't present, a warning tells us that ffmpeg/ffprobe are required for the video tests, which makes it seem like this should be a parameterizing test that controls which tests are run, which of course it does not do.

                          So the test literally does nothing and cannot possibly fail, but says it does at least two things, because to an LLM something saying it does something is the same thing as it actually doing that thing.
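The code being described is not reproduced in this thread, so the following is only a hypothetical reconstruction of the shape the post describes, not the actual code. `ensure_ffmpeg` is named in the post, but its body here is assumed, and `test_ffmpeg_smoke` and `VideoTests` are invented names. For contrast, the second half shows one way to gate video tests for real, using the standard library's `unittest.skipUnless`.

```python
import shutil
import subprocess
import unittest
import warnings


def ensure_ffmpeg() -> bool:
    """Assumed behavior: True only if ffmpeg and ffprobe are both on PATH."""
    return all(shutil.which(tool) is not None for tool in ("ffmpeg", "ffprobe"))


def test_ffmpeg_smoke():
    # The shape described above: re-check what the helper already checks,
    # warn if the tools are missing, and never assert anything.
    if shutil.which("ffmpeg") is None or shutil.which("ffprobe") is None:
        warnings.warn("ffmpeg/ffprobe are required for the video tests")
        return
    ensure_ffmpeg()  # return value ignored, so no outcome can fail the test


# A version that does control which tests run: the whole class is
# reported as skipped, visibly, when the tools are missing.
@unittest.skipUnless(ensure_ffmpeg(), "ffmpeg/ffprobe not found on PATH")
class VideoTests(unittest.TestCase):
    def test_ffprobe_runs(self):
        result = subprocess.run(
            ["ffprobe", "-version"], capture_output=True, text=True
        )
        self.assertEqual(result.returncode, 0)
```

The first function passes with or without ffmpeg installed. The second shows up as a skip or as a real pass/fail, so a reader of the test report can tell which happened.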

                          philsalkie@mindly.social wrote (#108):

                          @jonny

                          Item 1 in my lecture on Securing Industrial Controls Communications:

                          "Just because it works doesn't mean it's right."

                          "Works" is a multi-axis problem space, and any complex system will never have all the axes of "works" at 100%

                          For example:

                          Does it do what I want with good input?
                          Does it do something safe with bad input?
                          Does it run on my system?
                          Does it run on someone else's system?
                          Will it keep working if things change?

                          And so on and so on...

                            paul@notnull.space wrote (#109):

                            @d_rift @fluffy @jonny no fucking way 🤣🤣🤣

                              d_rift@beige.party wrote (#110):

                              @paul @fluffy @jonny On every platform. Individually. And like, if you're reading the PR like a reviewer, it happens fairly early... well before "you modified how many files" fatigue sets in. So I can only assume either someone thought this was OK or they never read it. Given human nature, I can definitely assume the human operator ("author" would be a stretch) _ran_ it before having read it.
