Running tally — naïve slope vs. order-mean-centered slope
What you just did has a name
Species are not independent data points. A regression that treats them as independent overstates its degrees of freedom and can load most of its apparent signal onto a single deep split in the tree. The order-mean-centering trick you just used is a lightweight version of phylogenetic comparative methods (PCMs) such as phylogenetic independent contrasts (PIC) and phylogenetic generalized least squares (PGLS). Real PCMs use the full tree, not just order membership: they downweight comparisons between closely related species and retain the power of comparisons between distantly related ones.
Round 1 (body mass vs. wing length) had nearly identical naïve and centered slopes: wing allometry is preserved within every clade, so removing clade means barely changes the fit. In round 2 (mass vs. hand-wing index) and round 3 (beak length vs. depth), clade means were doing some of the work, so the centered slopes were shallower. In round 4 (mass vs. range size) the slope halved: most of the "big species have big ranges" effect is clade-level, not within-clade. Round 5 (toy Simpson's paradox) flipped sign entirely: the clade means fall along a negative trend, so the naïve pooled slope is negative, even though the relationship within each clade is positive.
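The round 5 flip can be reproduced in a few lines of Python. This is a minimal sketch on made-up numbers, not the data behind the rounds: the three clades, their trait values, and the helper names `ols_slope` and `center_by_group` are all assumptions chosen for illustration.

```python
import numpy as np

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    return float(np.sum(dx * (y - y.mean())) / np.sum(dx ** 2))

def center_by_group(values, groups):
    """Subtract each group's mean from that group's values."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    centered = values.copy()
    for g in np.unique(groups):
        mask = groups == g
        centered[mask] -= values[mask].mean()
    return centered

# Hypothetical toy data: three clades of three species each.
# Within every clade y rises with x (slope +1), but the clade
# means fall along a line with negative slope.
clade = np.array(["A"] * 3 + ["B"] * 3 + ["C"] * 3)
x = np.array([-1.0, 0.0, 1.0, 4.0, 5.0, 6.0, 9.0, 10.0, 11.0])
y = np.array([9.0, 10.0, 11.0, 4.0, 5.0, 6.0, -1.0, 0.0, 1.0])

naive_slope = ols_slope(x, y)
centered_slope = ols_slope(center_by_group(x, clade),
                           center_by_group(y, clade))

print(f"naive slope:    {naive_slope:+.3f}")
print(f"centered slope: {centered_slope:+.3f}")
```

On this toy data the pooled slope is negative (about -0.92) while the centered slope is exactly +1: the same nine points give opposite answers depending on whether clade means are left in or removed.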
PCMs do not discard data. They weight the comparisons correctly. The tree is part of the data; ignoring it overstates the effective sample size and can produce confident-looking slopes with the wrong sign.
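To make "weighting the comparisons" concrete, here is a rough sketch of the generalized least squares estimate that PGLS is built on. It assumes a Brownian-motion model, under which the covariance between two species is proportional to the branch length they share from the root. The four-species tree, the trait values, and the function name `gls_fit` are hypothetical; a real analysis would build the covariance matrix from an estimated tree, usually with a dedicated package.

```python
import numpy as np

def gls_fit(x, y, cov):
    """Return [intercept, slope] from generalized least squares.

    Solves beta = (X' C^-1 X)^-1 X' C^-1 y, where C (cov) is the
    assumed covariance among the species' residuals.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # design matrix
    Cinv_X = np.linalg.solve(cov, X)           # C^-1 X
    Cinv_y = np.linalg.solve(cov, y)           # C^-1 y
    return np.linalg.solve(X.T @ Cinv_X, X.T @ Cinv_y)

# Hypothetical tree of depth 1.0 with two sister pairs. Sisters
# share 0.9 of their history; species in different pairs share none.
C = np.array([
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.9],
    [0.0, 0.0, 0.9, 1.0],
])

# Made-up trait values for the four species.
x = np.array([1.0, 1.2, 3.0, 3.1])
y = np.array([2.0, 2.1, 4.0, 4.3])

ols_beta = gls_fit(x, y, np.eye(len(x)))  # identity covariance = plain OLS
pgls_beta = gls_fit(x, y, C)              # tree-aware weighting

print("OLS  [intercept, slope]:", ols_beta)
print("PGLS [intercept, slope]:", pgls_beta)
```

All four species stay in the fit in both cases. The only change is the covariance matrix, which tells the estimator that each sister pair carries less independent information than two unrelated species would.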