FARVEL BIG TECH

all the criticism has been said, all the takes been had.

110 posts, 60 posters, 384 views
  • glassresistor@sfba.social

    @jonny so this thread could be a case study

    https://social.losno.co/@chris/116655930139554496

    jonny@neuromatch.social wrote (#40):

    @glassresistor
    Yeah I saw that and got sad

    • jonny@neuromatch.social

      i love gambling. i have used "AI" extensively. it feels the same.

      jonny@neuromatch.social wrote (#41):

      So, look. One-shot rewriting the whole test suite in another language is probably not a great idea, but what happened here is so much worse than you are expecting.

      https://github.com/RsyncProject/rsync/pull/903/

      This does not "translate tests into pytest" or any unit testing framework. It writes its own testing framework where tests are whole Python scripts that redefine basic test functions in every script. Surely there would be a single way to "run rsync and get the results"? Nope. Well, there is, but then every test file will randomly redefine its own _run_and_capture function. So now rsync needs a test suite for its test suite.

      If instead of telling an LLM to "rewrite the tests in python" you just searched "python testing", you would find the pytest docs. And then you would find examples. And then you could write fixtures to deduplicate all the prior shell script setup and teardown, and so on. But since it was just "rewrite the tests in python", it's now worse than before, and the odds of the rewrite being a 100% faithful translation are close to 0.
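As a rough illustration of the deduplication described above, here is a minimal sketch of one shared "run and capture" helper plus one shared setup/teardown helper that every test imports instead of redefining. Everything in it is an assumption made for illustration: `run_and_capture`, `RunResult`, and `scratch_dirs` are hypothetical names, nothing here is taken from the rsync PR, and the Python interpreter stands in for the rsync binary so the sketch runs anywhere. Under pytest, `scratch_dirs` would more naturally be a fixture in `conftest.py` built on the built-in `tmp_path` fixture.

```python
import subprocess
import sys
import tempfile
from contextlib import contextmanager
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RunResult:
    returncode: int
    stdout: str
    stderr: str


def run_and_capture(argv, cwd=None):
    """The single place in the suite that runs a command and captures output."""
    proc = subprocess.run(argv, cwd=cwd, capture_output=True, text=True, check=False)
    return RunResult(proc.returncode, proc.stdout, proc.stderr)


@contextmanager
def scratch_dirs():
    """Shared setup/teardown: a source and a destination dir, removed on exit."""
    with tempfile.TemporaryDirectory() as root:
        src, dst = Path(root, "src"), Path(root, "dst")
        src.mkdir()
        dst.mkdir()
        yield src, dst


def test_copy_preserves_content():
    # Assumption: a tiny Python one-liner stands in for the real rsync call.
    copy_script = "import shutil, sys; shutil.copy(sys.argv[1], sys.argv[2])"
    with scratch_dirs() as (src, dst):
        (src / "a.txt").write_text("hello")
        result = run_and_capture(
            [sys.executable, "-c", copy_script, str(src / "a.txt"), str(dst / "a.txt")]
        )
        assert result.returncode == 0, result.stderr
        assert (dst / "a.txt").read_text() == "hello"
```

The point of the design is only that each test body contains what is specific to that test; how commands are run and how directories are created and cleaned up live in exactly one place.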

        arrjay@tacobelllabs.net wrote (#42):

        @jonny *screaming intensifies*

          ricci@discuss.systems wrote (#43):

          @jonny 🎰

            jonny@neuromatch.social wrote (#44):

            I think the modal situation here is that people are reading none or very little of what the LLM generates, so the tests have a special role: they function as the pull arm on the slot machine. You just generate until the tests pass, and that's a jackpot. Obviously that's meaningless when the tests are meaningless, so tests take on a very different meaning and role in slot machine coding.

            Previously we would write careful test conditions that were based on some real problem or an understanding of what the code under test did, and that had a specific thing they were intended to protect against. Tests move slowly and are designed to protect us against the things we know can go wrong. When we learn of a new wrong thing, we add a test.

            LLM tests have the form of tests but don't do the same thing. They often test nothing, and are just expressions of truisms that the probabilistic text space explored while generating. They have strongly worded names but end up asserting that basic language features work as expected. Because it is not us writing tests for ourselves, where we only harm ourselves by making them weak, they function instead as a passively obfuscated justification for the code that the LLM generates. The user wants the tests to pass. The LLM provides.

            The tests are theater: they are the play field for the slot machine. They are mild, surmountable, need to fail a few times to be plausible, but must eventually pass within the expected generation loop window to deliver the payout.
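To make the "strongly worded name, truism body" pattern concrete, here is a small hypothetical sketch. The function `normalize_path` and both tests are invented for illustration and do not come from any project mentioned in the thread. The first test never calls the code it is named after, so it can only fail if Python's own string handling breaks; the second one pins down one behavior and fails if that behavior regresses.

```python
def normalize_path(path):
    """Hypothetical code under test: collapse runs of slashes into one."""
    while "//" in path:
        path = path.replace("//", "/")
    return path


def test_normalize_path_is_robust_against_all_malformed_input():
    # Truism: exercises str.join and isinstance, never normalize_path itself.
    parts = ["a", "b"]
    assert "/".join(parts) == "a/b"
    assert isinstance("/".join(parts), str)


def test_normalize_path_collapses_repeated_slashes():
    # Actual protection: one concrete input, one hand-written expected output.
    assert normalize_path("a//b///c") == "a/b/c"
```

If `normalize_path` were replaced by a function that returns its input unchanged, the first test would still pass and only the second would fail.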

              aud@fire.asta.lgbt wrote (#45):

              @jonny@neuromatch.social oh holy FUCK

              it is so bad

                peterrenshaw@ioc.exchange wrote (#46):

                @jonny “tests have a special role” ☺️

                  jonny@neuromatch.social wrote (#47):

                  Here's an example from some code that was thrust at me this week. The rest of the tests try a bit harder to look like tests, but this one is perplexing.

                  What does it test? The function name suggests it's a smoke test. LLMs love to call things smoke tests. That would suggest an early-run test that fails loudly if some basic precondition, like having ffmpeg, is not met. Or, I guess we are smoke testing the ensure_ffmpeg function? Anyway, who knows. However, we first check whether ffmpeg or ffprobe is present, which is exactly what ensure_ffmpeg does. If they aren't present, a warning tells us that ffmpeg/ffprobe are required for the video tests, which makes it seem like this should be a parameterizing test that controls which tests are run, which of course it does not do.

                  So the test literally does nothing and cannot possibly fail, but says it does at least two things, because to an LLM something saying it does something is the same thing as it actually doing that thing.
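The code being discussed is not included in the thread text, so what follows is only a hypothetical reconstruction from the prose description, not the original. `ensure_ffmpeg` is given an assumed body (a PATH lookup), and `test_ffmpeg_smoke` is an invented name. The second half sketches one way the stated intent, "video tests need ffmpeg/ffprobe", could actually control which tests run, using the standard library's `unittest.skipUnless`.

```python
import shutil
import subprocess
import unittest
import warnings


def ensure_ffmpeg():
    """Assumed behavior: True when both ffmpeg and ffprobe are on PATH."""
    return all(shutil.which(tool) is not None for tool in ("ffmpeg", "ffprobe"))


def test_ffmpeg_smoke():
    # Shape described in the post: check, warn, and assert nothing.
    if shutil.which("ffmpeg") is None or shutil.which("ffprobe") is None:
        warnings.warn("ffmpeg/ffprobe are required for the video tests")
        return
    ensure_ffmpeg()  # repeats the same check; the result is discarded


# A version where the precondition does something: it gates the video tests.
HAVE_FFMPEG = ensure_ffmpeg()


@unittest.skipUnless(HAVE_FFMPEG, "ffmpeg/ffprobe not found on PATH")
class VideoTests(unittest.TestCase):
    def test_ffmpeg_reports_version(self):
        proc = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True)
        self.assertEqual(proc.returncode, 0)
```

In the first function neither branch contains an assertion, so it passes whether or not the tools are installed. In the second, a missing tool shows up as an explicit skip with a reason rather than as a green test.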

                  • jonny@neuromatch.social

                    RE: https://hails.org/@hailey/116657391001259044

                    all the criticism has been said, all the takes been had. the only metaphor i have been finding consistently useful for understanding what is happening with people and "AI" is addiction, and specifically gambling addiction.

                    knutson_brain@sfba.social wrote (#48):

                    @jonny
                    The model is metastatic…

                      bms48@mastodon.social wrote (#49):

                      @jonny I have seen the pattern you describe when attempting to use LLMs to compare TCP Delayed-ACK implementations between BSD-derived code bases. They generated output suggesting semantics that just weren't there, presumably based on how similarly named things were between each fork, but this was not obvious without reading the source for oneself in context. This went doubly for FreeBSD, where there are multiple TCP functional blocks ("stacks").

                        jonny@neuromatch.social wrote (#50):

                        To a person, the whole purpose of a test is for it to fail when it should. That's an elemental part of writing good tests: they must fail before the patch, or else they provide no protection. We want protection from failure; that is good for us. We need tests to protect us because we can't possibly evaluate all the other parts of a complex system when we try to fix one part of it.

                        LLM slot machines change what tests mean. Of course we still want the code to work well, but if we're not evaluating the code or the tests, then what the slot machine turns them into is just a high score and the jackpot condition. 130 new tests added, that means it's good. They pass, that means I win.

                        The bugfix loop with LLMs defeats the purpose of automated tests and makes the process no better than manual testing: you notice a bug, you yell at the LLM to fix it, you keep looking at the specific thing that's broken until it's fixed, good robot, ship it. The changes don't have meaningful tests, and nothing else does either, so the slot machine loop repeats: bug->fix->win. Very velocity. Rocket fuel even.
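The "must fail before the patch" property can be checked mechanically: run the new regression test against the pre-patch code and confirm it goes red. The sketch below assumes a made-up bug (a port parser that accepts out-of-range values); `parse_port_before_patch`, `parse_port_after_patch`, `regression_test`, and `fails_on` are all hypothetical names chosen for illustration.

```python
def parse_port_before_patch(text):
    """Pre-patch behavior (hypothetical bug): any integer is accepted."""
    return int(text)


def parse_port_after_patch(text):
    """Post-patch behavior: reject ports outside 1..65535."""
    port = int(text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


def regression_test(parse_port):
    """Written for the bug report: an out-of-range port must be rejected."""
    try:
        parse_port("70000")
    except ValueError:
        return
    raise AssertionError("out-of-range port was accepted")


def fails_on(test, implementation):
    """Return True if the test fails against the given implementation."""
    try:
        test(implementation)
    except AssertionError:
        return True
    return False


if __name__ == "__main__":
    # A test that protects anything is red before the patch and green after.
    print("fails before patch:", fails_on(regression_test, parse_port_before_patch))
    print("fails after patch: ", fails_on(regression_test, parse_port_after_patch))
```

A test that is green against both versions tells you nothing about the patch, however many of them are added.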

                          murodegrizeco@toad.social wrote (#51):

                          @jonny

                          Huh, yeah, gambling works as a metaphor!

                          I was going more with ...

                          "This person came back from the alien ship, they had a grabby thing on their face but it fell off, and now the person is weirdly hungry, and maybe we should worry about what's growing inside them."

                            jonny@neuromatch.social wrote (#52):

                            But it's not as simple as "OK, if I read the tests I should be fine," because LLM code is often untestable. It writes code with function and class names that make it seem like something does something, but the names might just be flat wrong. Or there is some invisible fallback condition that the LLM added while generating code just to make tests pass, and that has entirely different behavior.

                            If you've watched an LLM generate a project over time, you see it generating its own private language, and I've even seen it reinvent language features like function definitions themselves. Its names form part of an increasingly inaccessible web of meaning that no human can penetrate.

                            Writing tests requires a kind of "information gap" where you have enough intuition about what something does, but not how it does it, so you can a) know what it should do, b) make a strong assertion about that expectation, c) without mirroring the internal implementation's limits. That's hard! And it's really only possible when the foundation, (a), is true. Code must have an articulable purpose in order to be testable; that's tautological, since the purpose defines what failure is. But since LLM code increasingly detaches from any kind of stable description or expectation, even if the tests look very rigorous, you can't know if they are just tailored to the specific internal details of the function to eke out a pass, because it's hard to know what it should do anyway.

                            So really you have to read the test code, the code under test, and also all the other code that might call the code under test. Aka you have to read everything. And rather than reading something that was written to be read, you're wading through a slop swamp. So you can't. It takes more time than just writing it. The erosion of testing is just an intrinsic part of the loop that you can't escape without breaking the spell of the slot machine, and it is what drives the loop.
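Point (c), a test that mirrors the implementation, can be shown in a few lines. This is a hypothetical sketch: `slugify` is an invented function with a deliberate bug (it leaves doubled and trailing dashes), and the two checks are invented too. The mirrored check recomputes its expected value with the same expression as the code under test, so it agrees with the bug; the independent check states the expected result from the purpose of the function and catches it.

```python
import re


def slugify(title):
    """Hypothetical code under test. Bug: runs of dashes are never collapsed."""
    return re.sub(r"[^a-z0-9]", "-", title.lower())


def check_mirrors_implementation():
    # The expected value is derived the same way the implementation derives it,
    # so this passes for any input, including inputs that expose the bug.
    title = "Hello, World!"
    expected = re.sub(r"[^a-z0-9]", "-", title.lower())
    assert slugify(title) == expected


def check_known_answer():
    # The expected value comes from what a slug should look like, not from
    # how slugify happens to compute it.
    assert slugify("Hello, World!") == "hello-world"


if __name__ == "__main__":
    for check in (check_mirrors_implementation, check_known_answer):
        try:
            check()
            print(check.__name__, "passed")
        except AssertionError:
            print(check.__name__, "failed")
```

The second check is only writable because "a slug has single dashes and no trailing punctuation" can be stated without looking at the code, which is the information gap the post describes.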

                                jonny@neuromatch.social wrote (#53):

                              @elebertus
                              I think bun's rust rewrite is the single largest high profile yeet I have ever seen, if you haven't seen that yet

                                jonny@neuromatch.social wrote (#54):

                                So rsync rewriting all the tests puts the entire project in play. Now the entire protective surface has been sloshed through a layer of probability, so the loop must accelerate. Followup PRs add more carveouts with lengthy LLM justifications that sound perfectly plausible but amount to an erosion of the protective surface. We go from cumulative improvement to a random walk.

                                    jonny@neuromatch.social wrote (#55):

                                  @elebertus
                                  I've read so much LLM code at this point. There are still patterns that are present but elude my understanding, but one thing that's clear is that there are foundational flaw categories that do not improve with model version and that appear in wildly different projects using wildly different models and harnesses. Testing is a big nexus of those flaws. I am not close to what would be a satisfying explanation of the dynamics, but every project suffers fucked testing problems.

                                    eliocamp@mastodon.social wrote (#56):

                                    @jonny One thing I love about this and other posts linking to slop on GitHub is that more often than not I flat out can't follow the link because GitHub is not working.

                                    • jonny@neuromatch.social wrote:

                                      Here's an example from some code that was thrust at me this week. The rest of the tests try a bit harder to look like tests, but this one is perplexing.

                                      What does it test? The function name suggests it's a smoke test. LLMs love to call things smoke tests. That would suggest this would be an early-run test that fails loudly if some basic precondition - like having ffmpeg - fails. Or, I guess we are smoke testing the ensure_ffmpeg function? Anyway, who knows. However, we first check if ffmpeg or ffprobe are present, which is exactly what ensure_ffmpeg does. If they aren't present, a warning tells us that ffmpeg/ffprobe are required for the video tests, which makes it seem like this should be a parameterizing test that controls which tests are run, which of course it does not do.

                                      So the test literally does nothing and cannot possibly fail, but says it does at least two things, because to an LLM something saying it does something is the same thing as it actually doing that thing.
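For contrast, a hedged sketch of the pattern described above next to a version that actually gates and asserts. The original code is not shown in the thread, so every name here (`missing_tools`, `requires_ffmpeg`, `VideoTests`) is hypothetical. The skip uses the standard library's `unittest.skipUnless`, which pytest also honours; `pytest.mark.skipif` would be the pytest-native spelling.

```python
import shutil
import subprocess
import sys
import unittest

REQUIRED_TOOLS = ("ffmpeg", "ffprobe")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the required executables that are not on the PATH."""
    return [tool for tool in tools if which(tool) is None]


def test_ffmpeg_smoke_vacuous():
    # The shape described above: it checks, it warns, and it asserts
    # nothing, so it passes whether or not ffmpeg is installed.
    if missing_tools():
        print("ffmpeg/ffprobe are required for the video tests", file=sys.stderr)


# A real gate: tests carrying this decorator are reported as skipped,
# with a reason, when the tools are absent.
requires_ffmpeg = unittest.skipUnless(
    not missing_tools(),
    "ffmpeg/ffprobe are required for the video tests",
)


class VideoTests(unittest.TestCase):
    @requires_ffmpeg
    def test_ffprobe_runs(self):
        # An assertion that can actually fail if the tool is broken.
        result = subprocess.run(
            ["ffprobe", "-version"], capture_output=True, text=True
        )
        self.assertEqual(result.returncode, 0)
```

The difference shows up in the test report: the gated test is either run or listed as skipped, while the vacuous one is always green and says nothing about the environment.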

                                      bstacey@icosahedron.website wrote (#57):

                                      @jonny I struggle to express how bleak this is.

                                        bstacey@icosahedron.website wrote (#58):

                                        @jonny It's like everyone decided to take a bath in mercury and leaded gasoline.

                                          bipolaron@scholar.social wrote (#59):

                                          @bstacey @jonny with a plugged in datacenter

                                          Powered by NodeBB Contributors
                                          Graciously hosted by data.coop