BIO 202, Spring 2026, draft v1. PGLS. Species aren't independent observations — they share ancestors. Subtract the shared part before you ask what's left.
Simulate trait evolution on a tree. Compare naive OLS to phylogenetic independent contrasts. Fit Pagel's λ. End at AVONET birds.
Simulate two traits evolving by Brownian motion on a tree. Run OLS on the tip values. The slope you get can be misleading — because the species aren't independent.
Generate a tree of N species. Simulate two traits x and y, each by Brownian motion. The traits evolve independently — the "true" slope is 0. But the naive OLS regression of y on x will often show a non-zero slope, because species in the same clade have correlated trait values.
library(ape)set.seed(42); N <- 30tr <- rcoal(N) # random coalescent treex <- rTraitCont(tr); y <- rTraitCont(tr) # independent BMsummary(lm(y ~ x)) # naive slope
Compute contrasts at each internal node. Contrasts are independent under Brownian motion. Regress contrasts of y on contrasts of x — the slope is unbiased.
For each internal node of the tree, compute the contrast: (x_left − x_right) / sqrt(t_left + t_right). Do the same for y. Then regress y-contrasts on x-contrasts through the origin. For independent BM traits, this slope is centered on 0.
library(ape)px <- pic(x, tr); py <- pic(y, tr)summary(lm(py ~ 0 + px)) # PIC slope through origin
λ = 0 means no phylogenetic signal (species are effectively independent). λ = 1 means full Brownian. Fit λ to your traits.
Pagel's λ scales the off-diagonal elements of the phylogenetic covariance matrix. At λ=0, off-diagonals are 0 (no shared ancestry effect). At λ=1, full BM. The maximum-likelihood λ tells you how much your trait is phylogenetically conserved.
library(phytools)phylosig(tr, x, method = "lambda", test = TRUE)
Two species' traits look correlated. Three causal stories could produce that. Build each one in the DAG below and see what the trait-vs-trait scatter looks like.
The DAG widget treats "clade" as a categorical confounder — the same role phylogeny plays in Pagel's λ. Build each story, color by clade, watch the within-clade slopes vs the pooled slope.
Pagel's λ measures how much of the across-species covariance is shared-ancestry covariance. λ = 1 means the entire signal is the "phylogeny → both" story. λ = 0 means the entire signal is the within-lineage story. PGLS conditions on the tree, which is the same move as conditioning on "clade" in the DAG.
From data/clean/avonet_birds.csv. Regress migration distance on hand-wing index, with and without phylogenetic correction.
Hand-wing index (a wing-shape ratio) is known to correlate with dispersal ability. Naive cross-species correlation may overstate the relationship because closely related species share both. Compare naive OLS vs phylogenetic correction.
library(ape); library(nlme)birds <- read.csv("data/clean/avonet_birds.csv")# Match to BirdTree, then PGLSm <- gls(migration ~ hand_wing_index, data = birds, correlation = corBrownian(phy = bird_tree))summary(m)
The lecture's earthworm vignette, drawn as data. Pooled across all species, body size and soil pH show a clear negative relationship. Within each clade — the sister-species pairs — the relationship is positive. The phylogeny is doing all of the work in the pooled slope.
Synthetic earthworm data faithful to the structure in the lecture quote: 30 species in 6 clades of 5 species each. Within each clade, larger worms prefer slightly more alkaline soils (positive slope). Across clades, the bigger-bodied clades happen to occupy more acidic soils (so pooled across clades, the slope flips negative). The categorical variable is clade membership — and once it's drawn into the model, the within-clade slope is what's actually happening biologically.
worms <- read.csv("data/clean/earthworm_clades.csv")pooled <- lm(soil_pH ~ body_size, data = worms)within <- lm(soil_pH ~ body_size + clade, data = worms)summary(pooled)$coefficients["body_size", ]summary(within)$coefficients["body_size", ]