Lesson 7 — Tracing how much of a parent ends up in their child

BIO 202, Spring 2026, draft v2. The same regression machinery you have been using since Lesson 3, applied to parents and their children. By Stage C, the slope has a familiar identity.

What you'll do

Four stages. Build a parent-offspring simulator, then run the regression on 934 real children from Galton's 1885 paper.

A — Two independent traits, no relationship

Simulate a population where two traits are generated independently. Read three summary numbers; reseed and read them again.

Scenario

Two simulated traits, drawn independently from Normal(0, 1). n individuals. Read the three summary numbers (covariance, Pearson r, Spearman ρ) and watch what they do as you change the seed.

Figure: scatter of two independent traits. Readouts: n (default 200), cov(x, y), Pearson r, Spearman ρ.

Prediction (required before sliders unlock)

  Q1. You simulate two traits that are truly independent (no real connection) and measure 200 individuals. What do you predict the sample Pearson r will be?
  Q2. You keep simulating the same two independent traits but increase n from 200 → 2,000 → 20,000. What happens to the typical sample r?

Try at least 5 seed/n combinations to unlock Stage B.

Controls

n: 200  |  seed: 42

R code — independent traits, three summaries

set.seed(42)
n <- 200
x <- rnorm(n); y <- rnorm(n)
cov(x, y)
cor(x, y)                        # Pearson
cor(x, y, method = "spearman")   # Spearman rank correlation
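To see what reseeding does at different sample sizes, the Stage A draw can be repeated many times and the spread of Pearson r summarised. The sketch below is not part of the lesson's simulator: the helper name `r_for_seed`, the 200 reseeds, and the three n values are assumptions chosen for illustration. The last column uses the standard result that, for independent normal traits, the sampling SD of r is 1/√(n − 1).

```r
# Sketch: how much does the sample Pearson r wander for independent traits?
# Assumptions (not from the lesson): 200 reseeds per n, helper name r_for_seed.
r_for_seed <- function(seed, n) {
  set.seed(seed)
  x <- rnorm(n); y <- rnorm(n)   # independent Normal(0, 1) traits, as in Stage A
  cor(x, y)                      # Pearson r for this one sample
}

ns <- c(200, 2000, 20000)
spread <- sapply(ns, function(n) {
  r <- sapply(1:200, r_for_seed, n = n)
  c(n = n, mean_r = mean(r), sd_r = sd(r), approx_sd = 1 / sqrt(n - 1))
})
print(t(spread))   # one row per n
```

Compare the `sd_r` column with `approx_sd` as n grows, and check both against your Q2 prediction.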

B — Couple the traits: introduce heritability h²

Now the offspring trait depends on the parents' trait. h² is a slider. Move it.

Scenario

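Simulated parent–offspring pairs. Midparent m = (father + mother) / 2. With population mean μ and parental standard deviation σ, the offspring trait is generated as follows (the second argument of Normal is a standard deviation, as in R's rnorm):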

y = μ + h²·(m − μ) + Normal(0, σ·√(1 − h²))

Move h². Watch the cloud.

Figure: midparent vs. offspring, with cloud tilt. Readouts: true h² (default 0.50), cov, Pearson r.

Prediction (required before sliders unlock)

  Q1. As h² moves from 0 to 1, the covariance between midparent and offspring will:
  Q2. At h² = 1, the cloud of (midparent, offspring) points looks like:

Try at least 6 h² values to unlock Stage C.

Controls

h²: 0.50  |  n: 300  |  seed: 42

R code — couple two traits via h²

set.seed(42)
n <- 300
h2 <- 0.50
mu <- 68; sigma <- 3
father <- rnorm(n, mu, sigma); mother <- rnorm(n, mu, sigma)
midparent <- (father + mother) / 2
offspring <- mu + h2 * (midparent - mu) +
             rnorm(n, 0, sigma * sqrt(1 - h2))
cov(midparent, offspring); cor(midparent, offspring)
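Under the Stage B generating equation, cov(midparent, offspring) should equal h² · Var(midparent), and Var(midparent) is σ²/2 when father and mother are drawn independently with the same σ. The sketch below checks this on a grid of h² values. The helper name `simulate_pairs`, the grid, and the larger n = 5000 (used only to make the numbers steadier) are assumptions, not part of the lesson's simulator.

```r
# Sketch: covariance between midparent and offspring across an h2 grid.
# Assumptions (not from the lesson): helper name simulate_pairs, n = 5000.
simulate_pairs <- function(n, h2, mu = 68, sigma = 3) {
  father <- rnorm(n, mu, sigma); mother <- rnorm(n, mu, sigma)
  midparent <- (father + mother) / 2
  offspring <- mu + h2 * (midparent - mu) +
               rnorm(n, 0, sigma * sqrt(1 - h2))
  data.frame(midparent, offspring)
}

set.seed(42)
h2_grid <- c(0, 0.25, 0.5, 0.75, 1)
res <- t(sapply(h2_grid, function(h2) {
  d <- simulate_pairs(5000, h2)
  c(h2 = h2,
    cov_sample = cov(d$midparent, d$offspring),
    cov_theory = h2 * 3^2 / 2,          # h2 * Var(midparent), with sigma = 3
    r_sample   = cor(d$midparent, d$offspring))
}))
print(round(res, 3))
```

The covariance grows in proportion to h², while Pearson r also depends on how much noise is left in the offspring, so the two columns do not rise in lockstep.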

C — Fit offspring on midparent

Same simulator. Fit lm(offspring ~ midparent). Compare β̂ to the true h² slider.

Scenario

Same simulator as Stage B. Now we explicitly fit lm(offspring ~ midparent) and display the slope β̂ next to the true h² slider.

Drag h². Watch β̂.

Figure: midparent × offspring with fitted line. Readouts: true h² (default 0.50), fitted slope β̂, |β̂ − h²|, R², SE(β̂).

Prediction (required before slider unlocks)

  Q1. β̂ is the slope of the regression of offspring on midparent. Pick the description that captures what β̂ does for you:
  Q2. Galton observed slope ≈ 0.65 for child-on-midparent height. What is the biological reading?
  Q3. (Reflection, not scored.) Suppose β̂ came out as 1.0 in some other population instead of 0.65. In one sentence, what would that say about the heritability of height in that population? Write your answer below.

Drag h² through at least 8 distinct values to unlock Stage D.
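The link between the fitted slope and the slider follows from the Stage B generating equation. A short derivation, writing ε for the noise term and assuming (as in the simulator) that it is drawn independently of the midparent value m:

```latex
\begin{align*}
y &= \mu + h^2 (m - \mu) + \varepsilon, \qquad \varepsilon \text{ independent of } m \\
\operatorname{Cov}(m, y) &= h^2 \operatorname{Var}(m) + \operatorname{Cov}(m, \varepsilon)
                          = h^2 \operatorname{Var}(m) \\
\beta &= \frac{\operatorname{Cov}(m, y)}{\operatorname{Var}(m)} = h^2
\end{align*}
```

The population slope β is h² exactly; the fitted β̂ matches it only up to sampling error, which is what SE(β̂) reports.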

Controls

h²: 0.50  |  n: 500  |  seed: 42

R code — read h² off the regression slope

set.seed(42)
n <- 500
h2 <- 0.50
mu <- 68; sigma <- 3
f <- rnorm(n, mu, sigma); m <- rnorm(n, mu, sigma)
midparent <- (f + m) / 2
offspring <- mu + h2 * (midparent - mu) + rnorm(n, 0, sigma * sqrt(1 - h2))
fit <- lm(offspring ~ midparent)
coef(fit)[2]                      # the slope IS h-hat-squared
summary(fit)$coefficients[2, 2]   # SE on h²-hat
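Dragging the slider by hand can be mimicked in code by sweeping h² over a grid and recording the fitted slope with a 95% interval. This is a minimal sketch, not the lesson's widget: the helper name `slope_for` and the grid are assumptions, and the interval is the ordinary t-based one from `confint()`.

```r
# Sketch: does the fitted slope recover the true h2 across the slider range?
# Assumptions (not from the lesson): helper name slope_for, the h2 grid below.
slope_for <- function(h2, n = 500, mu = 68, sigma = 3) {
  f <- rnorm(n, mu, sigma); m <- rnorm(n, mu, sigma)
  midparent <- (f + m) / 2
  offspring <- mu + h2 * (midparent - mu) + rnorm(n, 0, sigma * sqrt(1 - h2))
  fit <- lm(offspring ~ midparent)
  ci <- confint(fit, "midparent", level = 0.95)   # t-based interval for the slope
  c(h2 = h2, slope = unname(coef(fit)[2]), lower = ci[1], upper = ci[2])
}

set.seed(42)
sweep <- t(sapply(seq(0.1, 0.9, by = 0.2), slope_for))
print(round(sweep, 3))
```

Each row is one simulated population. The interval should cover the true h² in most rows, though not necessarily in every one.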

D — Galton's 1885 data: 934 children, 197 families

Real father–mother–child records. Same regression you just built. Read the slope, then bootstrap it.

Scenario

934 child-records, 197 Victorian-era families. Midparent height = (father + 1.08·mother) / 2 (Galton's sex correction).

Fit lm(childHeight ~ midparentHeight). Bootstrap it. The slope is what you came here for.

Figure: Galton family data, midparent height vs. child height. Readouts: N children, families, slope (= ĥ²), intercept, R², 95% bootstrap interval on ĥ².

Prediction (required before bootstrap unlocks)

  Q1. The slope of childHeight on midparentHeight in Galton's data will be:
  Q2. Split the data by child sex. The two sex-specific slopes will:

Run the bootstrap and use the split-by-sex toggle to wrap up.

Controls

bootstrap replicates B: 200  |  seed: 42

R code — Galton in 5 lines

g <- read.csv("data/clean/galton_families.csv")
g$midparent <- (g$father + 1.08 * g$mother) / 2
fit <- lm(childHeight ~ midparent, data = g)
coef(fit)[2]   # h-hat-squared ≈ 0.65
set.seed(42)   # makes the bootstrap interval reproducible
B <- 200
replicate(B, {
  k <- sample(nrow(g), nrow(g), replace = TRUE)   # resample child records with replacement
  coef(lm(childHeight ~ midparent, data = g[k, ]))[2]
}) |> quantile(c(.025, .975))   # 95% percentile interval for the slope
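Siblings share one midparent value, so a variant worth comparing resamples whole families rather than individual child records. The sketch below is an alternative to the row-level bootstrap above, not the lesson's method. It assumes a `family` identifier column (the lesson does not name one), uses a hypothetical helper `boot_by_family`, and runs on simulated stand-in data so that it executes without the CSV. The 0.65 slope in the stand-in is simply the value quoted in the lesson.

```r
# Sketch: resample whole families instead of individual children.
# Assumptions (not from the lesson): a `family` id column, the helper name
# boot_by_family, and the simulated stand-in data used to make this runnable.
boot_by_family <- function(d, B = 200) {
  rows_by_family <- split(seq_len(nrow(d)), d$family)
  replicate(B, {
    picked <- sample(length(rows_by_family), replace = TRUE)   # draw families
    k <- unlist(rows_by_family[picked], use.names = FALSE)     # keep all their children
    unname(coef(lm(childHeight ~ midparent, data = d[k, ]))[2])
  })
}

# Simulated stand-in, NOT Galton's records: 197 families, 1 to 8 children each.
set.seed(42)
n_fam  <- 197
kids   <- sample(1:8, n_fam, replace = TRUE)
family <- rep(seq_len(n_fam), times = kids)
mid_f  <- rnorm(n_fam, 69, 1.8)            # one midparent value per family
shared <- rnorm(n_fam, 0, 1)               # family effect shared by siblings
sim <- data.frame(
  family      = family,
  midparent   = mid_f[family],
  childHeight = 69 + 0.65 * (mid_f[family] - 69) + shared[family] +
                rnorm(length(family), 0, 2)
)

slopes <- boot_by_family(sim, B = 200)
quantile(slopes, c(.025, .975))
```

With the real file loaded as `g` and a family column present under that name, `boot_by_family(g)` would give an interval to set beside the row-level one. When siblings resemble each other beyond what midparent explains, the family-level interval tends to be wider.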

Stretch challenge (optional)

Galton's mother coefficient of 1.08 is rough. Refit using an uncorrected mother coefficient of 1.0 and report how the slope changes. Then run a per-sex fit: male offspring on midparent (no correction), then female offspring on midparent (no correction). Which approach gives the cleanest ĥ²? Hit "I tried it" after you have an answer.
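One way to organise the comparison is a small helper that rebuilds midparent for a given mother coefficient and, optionally, restricts the fit to one child sex. This is a starting sketch under stated assumptions: the helper name `slope_with` is hypothetical, the child-sex column is assumed to be called `gender` and coded "male"/"female" (the lesson does not name it), and the data are a simulated stand-in so the block runs without the CSV.

```r
# Sketch for the stretch challenge: slope as a function of the mother coefficient,
# overall and per child sex.
# Assumptions (not from the lesson): a `gender` column coded "male"/"female",
# the helper name slope_with, and the simulated stand-in data below.
slope_with <- function(d, mother_coef = 1.08, sex = NULL) {
  if (!is.null(sex)) d <- d[d$gender == sex, ]
  d$midparent <- (d$father + mother_coef * d$mother) / 2
  unname(coef(lm(childHeight ~ midparent, data = d))[2])
}

# Simulated stand-in, NOT Galton's records.
set.seed(42)
n <- 900
sim <- data.frame(
  father = rnorm(n, 69, 2.5),
  mother = rnorm(n, 64, 2.3),
  gender = sample(c("male", "female"), n, replace = TRUE)
)
mid <- (sim$father + 1.08 * sim$mother) / 2
sim$childHeight <- 69 + 0.65 * (mid - mean(mid)) +
                   ifelse(sim$gender == "female", -5, 0) + rnorm(n, 0, 2.2)

out <- c(corrected   = slope_with(sim, 1.08),
         uncorrected = slope_with(sim, 1.00),
         sons        = slope_with(sim, 1.00, sex = "male"),
         daughters   = slope_with(sim, 1.00, sex = "female"))
print(round(out, 3))
```

Swapping `sim` for the real data frame `g` (and adjusting the sex column name and coding to match the file) gives the four slopes the challenge asks you to compare.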
