The Rust rewrite of coreutils tried to demonstrate steady progress by counting how many of the test cases originating from GNU coreutils it could pass.
I very much suspect that there's a whole host of race condition tests that made it into the test corpus late in the game.
A test-driven rewrite has its limits.
Note the uptick in failures at the very right edge of the graph: the pass rate is currently under 90%.