BIO 202, Spring 2026, draft v2. The same regression machinery you have been using since Lesson 3, applied to parents and their children. By Stage C, the slope has a familiar identity.
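Four stages: check what the summary statistics do for two independent traits, build a parent–offspring simulator, fit the regression and compare its slope with h², then run the same regression on 934 real children from Galton's 1885 paper.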
Stage A. Simulate a population where two traits are generated independently. Read three summary numbers, reseed, and read them again.
Two simulated traits, each drawn independently from Normal(0, 1), for n individuals. Read the three summary numbers (covariance, Pearson r, Spearman ρ) and watch what they do as you change the seed: because the traits are independent, all three land near zero, and their sign can change from one seed to the next.
```r
set.seed(42)
n <- 200
x <- rnorm(n); y <- rnorm(n)       # two independent Normal(0, 1) traits
cov(x, y)
cor(x, y)                          # Pearson
cor(x, y, method = "spearman")     # Spearman rank correlation
```
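One seed shows only one draw. A minimal sketch, assuming the same n = 200 and Normal(0, 1) traits as above, repeats the draw over 1,000 seeds (seeds 1 to 1000, an arbitrary choice) to show the spread behind those numbers: both correlations center on zero, with a standard deviation of roughly 1/√n.

```r
# Repeat the Stage A draw over many seeds and collect Pearson r and Spearman rho.
# Assumes independent Normal(0, 1) traits and n = 200, as in the block above.
n <- 200
n_seeds <- 1000
stats <- t(sapply(seq_len(n_seeds), function(s) {
  set.seed(s)                                  # one seed per replicate
  x <- rnorm(n); y <- rnorm(n)
  c(pearson  = cor(x, y),
    spearman = cor(x, y, method = "spearman"))
}))
colMeans(stats)        # both averages sit close to 0
apply(stats, 2, sd)    # both spreads sit close to 1 / sqrt(n)
1 / sqrt(n)            # about 0.0707 for n = 200
```

Any single seed can easily give |r| near 0.1 by chance; that is sampling noise, not a relationship.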
Stage B. Now the offspring trait depends on the parents' trait. h² is a slider. Move it.
Simulated parent–offspring pairs. Midparent m = (father + mother) / 2. Offspring trait generated as:
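y = μ + h²·(m − μ) + Normal(0, σ·√(1 − h²))
where μ is the population mean, σ is each parent's standard deviation, and the second argument of Normal is a standard deviation, as in rnorm().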
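A worked example with the simulator's default values (μ = 68, σ = 3, h² = 0.5) and a hypothetical midparent of 72 shows the pull toward the mean: the expected offspring value sits halfway between the midparent and μ.

```latex
\mathbb{E}[\,y \mid m = 72\,] = 68 + 0.5\,(72 - 68) = 70,
\qquad
\mathrm{SD}(\,y \mid m\,) = 3\sqrt{1 - 0.5} \approx 2.12
```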
Move h². Watch the cloud.
```r
set.seed(42)
n <- 300
h2 <- 0.50
mu <- 68; sigma <- 3
father <- rnorm(n, mu, sigma); mother <- rnorm(n, mu, sigma)
midparent <- (father + mother) / 2
# Offspring: mean plus h2 times the midparent deviation, plus noise
offspring <- mu + h2 * (midparent - mu) + rnorm(n, 0, sigma * sqrt(1 - h2))
cov(midparent, offspring); cor(midparent, offspring)
```
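Stage B prints the covariance and the correlation, but neither one equals h² in general. Under the simulator's own assumptions (independent parents, each with SD σ, so Var(m) = σ²/2), the implied correlation is r = h²·sd(m)/sd(y), which is smaller than h² for any h² below 1. A minimal check, using a much larger n than the lesson's 300 so that sampling noise is small:

```r
# Compare the simulated correlation with the value implied by the Stage B model.
# Assumes independent parents with SD sigma, so Var(midparent) = sigma^2 / 2.
set.seed(42)
n <- 1e5                                         # large n keeps sampling noise small
h2 <- 0.50
mu <- 68; sigma <- 3
father <- rnorm(n, mu, sigma); mother <- rnorm(n, mu, sigma)
midparent <- (father + mother) / 2
offspring <- mu + h2 * (midparent - mu) + rnorm(n, 0, sigma * sqrt(1 - h2))

var_m <- sigma^2 / 2                             # variance of the midparent
var_y <- h2^2 * var_m + sigma^2 * (1 - h2)       # variance of the offspring
r_theory <- h2 * sqrt(var_m / var_y)             # implied Pearson r
c(empirical = cor(midparent, offspring), theory = r_theory, h2 = h2)
```

With h² = 0.5 the implied correlation is about 0.45, not 0.5, which is why the next stage reads h² from the regression slope rather than from r.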
Stage C. Same simulator. Fit lm(offspring ~ midparent). Compare β̂ to the true h² set on the slider.
Same simulator as Stage B. Now we explicitly fit lm(offspring ~ midparent) and display the slope β̂ next to the true h² slider.
Drag h². Watch β̂.
The predictor is the midparent average of both parents, so this slope reads h² directly. A regression on one parent alone would give ½h² (you'd double it to recover h²) — keep track of which slope you're reading.
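Both slopes follow from the simulator's equation, assuming the noise term ε is independent of the parents, and writing f for the father and g for the mother (labels introduced here), each with variance σ² and independent, so that m = (f + g)/2 and Var(m) = σ²/2:

```latex
\begin{aligned}
\beta_{\text{mid}}
  &= \frac{\operatorname{Cov}(m, y)}{\operatorname{Var}(m)}
   = \frac{h^2 \operatorname{Var}(m)}{\operatorname{Var}(m)} = h^2, \\[4pt]
\beta_{\text{father}}
  &= \frac{\operatorname{Cov}(f, y)}{\operatorname{Var}(f)}
   = \frac{h^2 \operatorname{Cov}\!\left(f, \tfrac{f + g}{2}\right)}{\sigma^2}
   = \frac{h^2\,\sigma^2 / 2}{\sigma^2} = \tfrac{1}{2}\,h^2 .
\end{aligned}
```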
```r
set.seed(42)
n <- 500
h2 <- 0.50
mu <- 68; sigma <- 3
father <- rnorm(n, mu, sigma); mother <- rnorm(n, mu, sigma)
midparent <- (father + mother) / 2
offspring <- mu + h2 * (midparent - mu) + rnorm(n, 0, sigma * sqrt(1 - h2))
fit <- lm(offspring ~ midparent)
coef(fit)[2]                       # the slope IS h-hat-squared
summary(fit)$coefficients[2, 2]    # standard error of h-hat-squared
```
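Dragging the slider can be mimicked in code. A minimal sketch, assuming the Stage C simulator with a larger n of 20,000 and a hypothetical helper `simulate_slopes()`, fits both the midparent regression and a father-only regression at three h² values, so you can see β̂ track h² and the single-parent slope track ½h²:

```r
# Sweep h2 and fit two regressions per value: offspring on midparent,
# and offspring on the father alone. simulate_slopes() is a hypothetical helper.
simulate_slopes <- function(h2, n = 20000, mu = 68, sigma = 3) {
  father <- rnorm(n, mu, sigma); mother <- rnorm(n, mu, sigma)
  midparent <- (father + mother) / 2
  offspring <- mu + h2 * (midparent - mu) + rnorm(n, 0, sigma * sqrt(1 - h2))
  c(h2          = h2,
    beta_mid    = unname(coef(lm(offspring ~ midparent))[2]),  # tracks h2
    beta_father = unname(coef(lm(offspring ~ father))[2]))     # tracks h2 / 2
}

set.seed(42)
slopes <- t(sapply(c(0.2, 0.5, 0.8), simulate_slopes))
slopes <- cbind(slopes, twice_father = 2 * slopes[, "beta_father"])
round(slopes, 3)
```

Doubling the father-only slope recovers h², matching the derivation for a single parent.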
Stage D. Real father–mother–child records. Same regression you just built. Read the slope, then bootstrap it.
934 child records from 197 Victorian-era families, heights in inches. Midparent height = (father + 1.08·mother) / 2, where the 1.08 factor is Galton's sex correction for mothers' heights.
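For a hypothetical family with a 70-inch father and a 64-inch mother, the corrected midparent works out as:

```latex
\text{midparent} = \frac{70 + 1.08 \times 64}{2} = \frac{70 + 69.12}{2} = 69.56 \text{ in}
```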
Fit lm(childHeight ~ midparent), then bootstrap the slope. The slope is what you came here for.
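```r
g <- read.csv("data/clean/galton_families.csv")     # one row per child
g$midparent <- (g$father + 1.08 * g$mother) / 2     # Galton's correction for mothers
fit <- lm(childHeight ~ midparent, data = g)
coef(fit)[2]                                        # the slope IS h-hat-squared

# Percentile bootstrap: resample child rows, refit, keep the slope
set.seed(42)
B <- 200
replicate(B, {
  k <- sample(nrow(g), nrow(g), replace = TRUE)
  coef(lm(childHeight ~ midparent, data = g[k, ]))[2]
}) |> quantile(c(.025, .975))                       # 95% interval for the slope
```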
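The row-level bootstrap treats the 934 children as independent, but siblings share the same midparent value. One common alternative, sketched here as an option rather than the lesson's method, resamples whole families. The `family` column name and the helper `family_boot()` are hypothetical (check the cleaned CSV's actual column names), and the demo runs on simulated families with a slope of 0.65 chosen only for illustration, so it does not depend on the CSV.

```r
# Family-level (cluster) bootstrap: resample whole families, not single children.
# Assumes a column named `family` that identifies each household (hypothetical).
family_boot <- function(d, B = 200) {
  fams <- unique(d$family)
  by_fam <- split(seq_len(nrow(d)), d$family)            # row indices per family
  replicate(B, {
    pick <- sample(fams, length(fams), replace = TRUE)    # draw families with replacement
    rows <- unlist(by_fam[as.character(pick)], use.names = FALSE)
    coef(lm(childHeight ~ midparent, data = d[rows, ]))[2]
  })
}

# Simulated demo data: 200 families with 1 to 8 children each
set.seed(42)
n_fam <- 200
kids <- sample(1:8, n_fam, replace = TRUE)              # children per family
fam_mid <- rnorm(n_fam, 68, 1.8)                        # one midparent per family
fam_eff <- rnorm(n_fam, 0, 1)                           # effect shared by siblings
d <- data.frame(family = rep(seq_len(n_fam), kids),
                midparent = rep(fam_mid, kids))
d$childHeight <- 68 + 0.65 * (d$midparent - 68) +
  rep(fam_eff, kids) + rnorm(nrow(d), 0, 2)

boot_slopes <- family_boot(d, B = 200)
quantile(boot_slopes, c(.025, .975))                    # 95% percentile interval
```

If the real data carry a family identifier, calling family_boot(g) in place of the row-level replicate() would give an interval that respects sibling clustering; expect it to be somewhat wider.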
Galton's mother coefficient of 1.08 is rough. Refit using an uncorrected mother coefficient of 1.0 and report how the slope changes. Then run per-sex fits: male offspring on midparent (no correction), then female offspring on midparent. Which approach gives the cleanest ĥ²? Hit "I tried it" after you have an answer.