Running tally — null envelope vs. observed slope (signed z-score)
What you just did has a name
Five times, you predicted the width of the null distribution of slopes
under the permutation hypothesis: "the y-values are exchangeable, the
x-values are not." Your guesses should have improved as you went,
because the envelope width depends on three things you can read directly
from the scatter: the residual scatter σ (under the null, simply the
standard deviation of y), the spread of x (its standard deviation, which
grows with the range), and the number of observations n. The formal
relationship is SE(β̂) ≈ σ / (SD(x) · √(n−1)); the 95% half-width is
roughly 2 · SE(β̂).
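The formula above can be checked against a brute-force shuffle. The following is a minimal sketch, not the page's own simulator: the toy data, the function names `analytic_null_se` and `permuted_slopes`, and the choice of 5000 shuffles are all illustrative assumptions. It takes σ to be the sample standard deviation of y, which is what the residual scatter reduces to once the y-values are shuffled.

```python
import numpy as np

def analytic_null_se(x, y):
    """SE of the slope under the permutation null: SD(y) / (SD(x) * sqrt(n - 1))."""
    n = len(x)
    return np.std(y, ddof=1) / (np.std(x, ddof=1) * np.sqrt(n - 1))

def permuted_slopes(x, y, n_perm=5000, seed=0):
    """Least-squares slopes after shuffling y against a fixed x."""
    rng = np.random.default_rng(seed)
    xc = x - x.mean()                 # centered x, so slope = sum(xc * y) / sum(xc^2)
    sxx = np.sum(xc ** 2)
    slopes = np.empty(n_perm)
    for i in range(n_perm):
        slopes[i] = np.dot(xc, rng.permutation(y)) / sxx
    return slopes

# Hypothetical toy data: a weak trend plus noise (not from the page).
rng = np.random.default_rng(42)
x = np.linspace(0.0, 10.0, 30)
y = 0.3 * x + rng.normal(0.0, 2.0, size=x.size)

se_formula = analytic_null_se(x, y)
null_slopes = permuted_slopes(x, y)
se_simulated = null_slopes.std(ddof=1)

print(f"formula SE:   {se_formula:.4f}")
print(f"simulated SE: {se_simulated:.4f}")
print(f"approx. 95% half-width: {2 * se_formula:.4f}")
```

With enough shuffles the two SE values should agree closely, which is why the envelope width can be guessed from the scatter alone.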
What you just did is a permutation test. It is among the most defensible null models you can build: it makes no distributional assumptions beyond exchangeability of the y-values, and it directly answers "would the pattern be surprising under random relabeling?" If the observed slope is far outside the permutation envelope, something non-random is going on. If it is inside, whatever trend is there might just be sampling noise.
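As a rough illustration of the test just described, the sketch below shuffles y, builds a 95% envelope from the shuffled slopes, and reports the observed slope as a signed z-score and a two-sided p-value. The function name `permutation_slope_test`, the toy data, and the add-one p-value correction are assumptions made for this example; the page's own rounds may be computed differently.

```python
import numpy as np

def permutation_slope_test(x, y, n_perm=5000, seed=0):
    """Compare the observed slope with slopes obtained by shuffling y."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    sxx = np.sum(xc ** 2)
    observed = np.dot(xc, y) / sxx
    null = np.array([np.dot(xc, rng.permutation(y)) / sxx
                     for _ in range(n_perm)])
    # Signed z-score: how many null SDs the observed slope sits from the null mean.
    z = (observed - null.mean()) / null.std(ddof=1)
    # Two-sided p-value; the +1 terms keep it from ever being exactly zero.
    p = (1 + np.sum(np.abs(null) >= abs(observed))) / (n_perm + 1)
    lo, hi = np.percentile(null, [2.5, 97.5])   # 95% null envelope
    return {"slope": observed, "z": z, "p": p, "envelope": (lo, hi)}

# Hypothetical data: one series with a strong trend, one with none.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 40)
strong = permutation_slope_test(x, 1.0 * x + rng.normal(0.0, 1.0, x.size))
flat = permutation_slope_test(x, rng.normal(0.0, 1.0, x.size))

for name, res in [("strong trend", strong), ("no trend", flat)]:
    lo, hi = res["envelope"]
    print(f"{name}: slope={res['slope']:.3f}, z={res['z']:.2f}, "
          f"p={res['p']:.4f}, envelope=({lo:.3f}, {hi:.3f})")
```

A slope outside the envelope corresponds to a large absolute z-score and a small p-value; the sign of z shows whether the trend runs up or down.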
Notice that rounds 3 (LTEE early) and 4 (LTEE plateau) use the same simulator and the same population, but different windows in time. The early window has a slope far outside the null envelope; the plateau window has a slope just barely outside. That is the famous "diminishing-returns" debate recast as a null-model result.