
Scaffold S15 — Phylogenetic non-independence

Five rounds. Each plots the same 240 bird species (AVONET, 30 species from each of 8 orders) on a pair of log-transformed traits. You predict the naïve cross-species regression slope, then the slope after order-mean centering, a lightweight proxy for phylogenetic correction. The centered slope can agree with the naïve one, shrink toward zero, or flip sign, depending on how much of the correlation is clade-level versus within-clade.


Running tally — naïve slope vs. order-mean-centered slope

What you just did has a name

Species are not independent data points. A regression that treats them as independent overstates its degrees of freedom and can load most of its apparent signal onto a single deep split in the tree. The order-mean-centering trick you just used is a lightweight version of phylogenetic comparative methods (PCMs) such as phylogenetic independent contrasts (PIC) and phylogenetic generalized least squares (PGLS). Real PCMs use the full tree, not just order membership; they downweight closely related species and retain power from distantly related ones.
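The centering step can be sketched in a few lines. This is a minimal illustration, not the lesson's own code: the data are synthetic (8 hypothetical orders of 30 species each, generated with a strong clade-level trend and a weaker within-clade slope of 0.2), and the names `ols_slope` and `center_by_group` are made up for this example. It uses only order labels, so it is the lightweight proxy described above, not PIC or PGLS.

```python
import numpy as np

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    return float(np.dot(xc, y - y.mean()) / np.dot(xc, xc))

def center_by_group(values, groups):
    """Subtract each group's mean from the values of its members."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    centered = np.empty_like(values)
    for g in np.unique(groups):
        mask = groups == g
        centered[mask] = values[mask] - values[mask].mean()
    return centered

# Synthetic stand-in for the lesson's data (assumption: not real trait values).
rng = np.random.default_rng(0)
n_orders, n_per_order = 8, 30
order = np.repeat(np.arange(n_orders), n_per_order)

# Clade-level means are strongly correlated (slope near 1 between orders).
order_x = rng.normal(0.0, 1.0, n_orders)
order_y = 1.0 * order_x + rng.normal(0.0, 0.2, n_orders)

# Within each order the true slope is only 0.2.
x = order_x[order] + rng.normal(0.0, 0.3, order.size)
y = order_y[order] + 0.2 * (x - order_x[order]) + rng.normal(0.0, 0.1, order.size)

naive = ols_slope(x, y)
centered = ols_slope(center_by_group(x, order), center_by_group(y, order))

print(f"naive slope:    {naive:.2f}")
print(f"centered slope: {centered:.2f}")
```

With data built this way, the naïve slope is pulled toward the between-order trend, while the centered slope should land near the within-order value of 0.2. Regressing group-centered variables on each other gives the same slope as a regression with one intercept per order, which is why it strips out clade-level differences but cannot account for structure below the order level.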

Round 1 (body mass vs. wing length) had nearly identical naïve and centered slopes: wing allometry is preserved within every clade, so removing clade means barely changes the fit. Round 2 (mass vs. hand-wing index) and round 3 (beak length vs. depth) had clade means doing some of the work: the centered slopes were shallower. Round 4 (mass vs. range size) saw the slope halve, because most of the "big species have big ranges" effect is clade-level, not within-clade. Round 5 (a toy Simpson's paradox) flipped sign entirely: the clade means fall along a negative trend, which dominates the naïve pooled slope, while the relationship within each clade is positive.
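The sign flip in round 5 is easy to reproduce by hand. The sketch below is a made-up, deterministic toy (three hypothetical clades of three species, not the lesson's data): each clade has a within-clade slope of exactly +1, while the clade means sit on a line of slope -2.

```python
import numpy as np

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    xc = x - x.mean()
    return float(np.dot(xc, y - y.mean()) / np.dot(xc, xc))

# Hypothetical toy data: 3 clades, 3 species each.
offsets = np.array([-1.0, 0.0, 1.0])        # within-clade deviations in x
clade_mean_x = np.array([0.0, 5.0, 10.0])
clade_mean_y = np.array([20.0, 10.0, 0.0])  # clade means fall on a slope of -2

clade = np.repeat(np.arange(3), offsets.size)
x = (clade_mean_x[:, None] + offsets).ravel()
y = (clade_mean_y[:, None] + 1.0 * offsets).ravel()  # within-clade slope +1

# Naive pooled regression across all nine species.
naive = ols_slope(x, y)

# Clade-mean centering: subtract each clade's own mean from its members.
counts = np.bincount(clade)
x_centered = x - (np.bincount(clade, weights=x) / counts)[clade]
y_centered = y - (np.bincount(clade, weights=y) / counts)[clade]
centered = ols_slope(x_centered, y_centered)

print(f"naive slope:    {naive:.3f}")     # -1.885
print(f"centered slope: {centered:.3f}")  # 1.000
```

The pooled slope is negative because the spread between clade means (sums of squares dominated by the 5-unit gaps) swamps the small within-clade spread; once the means are removed, only the positive within-clade relationship is left.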

PCMs do not discard data. They weight the comparisons correctly. The tree is part of the data; ignoring it inflates effective sample size and can produce confident-looking slopes with the wrong sign.