BIO 202, Spring 2026, draft v1. Measurements arriving one at a time. You will not always be told when (or whether) the population doing the generating has changed.
Stage A replays the end of Lesson 1 as a movie. Stages B, C, and D break it. Predict before you play each stage.
Draws stream in one at a time. Watch the running mean. Watch the interval around it.
Adults walk out one at a time. You measure each one's height. Running mean ȳ (blue) and a 95% bootstrap interval (ribbon) update after every draw. The dashed red line is the true μ.
Click "Step the stream" and watch.
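# Stage A: draws stream from N(168, 10). Running mean + bootstrap CI.
set.seed(42)
mu_pop <- 168
sigma <- 10
y <- rnorm(400, mu_pop, sigma)
running <- cumsum(y) / seq_along(y)

# bootstrap CI on the running mean at each step
# (resample indices: sample(y[1:k], ...) misbehaves at k = 1, where
#  a single number x is treated as the range 1:x)
ci <- t(sapply(seq_along(y), function(k) {
  draws <- replicate(200, mean(y[sample.int(k, k, replace = TRUE)]))
  quantile(draws, c(0.025, 0.975))
}))

plot(running, type = "l", col = "#2f6b8f", lwd = 2, ylim = range(ci, running),
     xlab = "draw index", ylab = "running mean")
lines(ci[, 1], col = "#2f6b8f", lty = 3)  # lower edge of the 95% interval
lines(ci[, 2], col = "#2f6b8f", lty = 3)  # upper edge of the 95% interval
abline(h = mu_pop, col = "#b23a48", lwd = 2, lty = 2)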
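As a rough check on the ribbon: once k is moderately large, the half-width of the bootstrap interval should sit close to the textbook value 1.96·σ/√k. The sketch below compares the two at a few values of k, assuming the same N(168, 10) stream. `boot_halfwidth` is a helper name introduced here for illustration, not part of the lesson code.

```r
set.seed(42)
sigma <- 10
y <- rnorm(400, 168, sigma)

# Hypothetical helper: half-width of the 95% bootstrap interval on mean(y[1:k]).
boot_halfwidth <- function(y, k, B = 500) {
  draws <- replicate(B, mean(y[sample.int(k, k, replace = TRUE)]))
  q <- quantile(draws, c(0.025, 0.975))
  unname(diff(q)) / 2
}

ks <- c(25, 100, 400)
comparison <- data.frame(
  k = ks,
  bootstrap = sapply(ks, function(k) boot_halfwidth(y, k)),
  normal_theory = 1.96 * sigma / sqrt(ks)
)
print(comparison)
```

Quadrupling k should roughly halve both columns. That is why the ribbon narrows quickly at first and slowly later.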
Same kind of stream. Same running mean. Somewhere in the middle, the building changes.
The stream starts the same way Stage A did. At some draw index you set, the door switches. The next adult comes from a different population (NBA players, μ ≈ 199 cm).
Your job: spot the draw at which the switch happens, before you click "Reveal switch".
# Stage B: stream with a hidden switch from N(168, 10) to N(199, 9).
set.seed(42)
switch_at <- 80  # the analyst is not told this
N <- 300
y <- c(rnorm(switch_at, 168, 10), rnorm(N - switch_at, 199, 9))
running <- cumsum(y) / seq_along(y)

# at each step, ask: does my bootstrap CI still cover mu_old = 168?
# start at k = 10: at k = 1 the bootstrap CI collapses to a single point,
# and sample(y[1:1], ...) would draw from 1:floor(y[1]) instead of from y
ks <- 10:N
covers <- sapply(ks, function(k) {
  bs <- replicate(200, mean(y[sample.int(k, k, replace = TRUE)]))
  q <- quantile(bs, c(0.025, 0.975))
  168 >= q[1] & 168 <= q[2]
})
ks[which(!covers)[1]]  # first draw at which CI excludes mu_old
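Why the running mean is slow to react: after the switch it is a weighted average of old and new draws. For switch index s, its expected value at draw k is (s·μ_old + (k − s)·μ_new)/k. A small sketch of that arithmetic, using Stage B's values; `expected_running_mean` is a name introduced here for illustration.

```r
# Expected value of the running mean at draw k, given a switch after draw `switch_at`.
expected_running_mean <- function(k, switch_at = 80, mu_old = 168, mu_new = 199) {
  n_old <- pmin(k, switch_at)      # draws so far from the old population
  n_new <- pmax(k - switch_at, 0)  # draws so far from the new population
  (n_old * mu_old + n_new * mu_new) / k
}

expected <- expected_running_mean(c(80, 100, 160, 300))
print(expected)
# 168.0, 174.2, 183.5, 190.73 (last value rounded)
```

Even 220 draws after the switch, the expected running mean is still about 8 cm short of 199, because the first 80 draws never leave the average.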
Same setup as Stage B, run 100 times. Sliders for shift size and CI level. Predict before you run.
100 replicates. Each runs 300 draws: first 100 from N(168, 10), then a switch to N(168 + Δ, 10). Δ and the CI level are sliders. We record the first draw at which the CI excludes μ_old.
Gray bars: true alarms (after the switch). Orange bars: false alarms (CI broke before the switch).
# Stage C: 100 replicates, each with a switch after draw 100.
set.seed(42)
delta <- 10.0
ci_level <- 95 / 100
alpha <- 1 - ci_level

first_alarm <- replicate(100, {
  y <- c(rnorm(100, 168, 10), rnorm(200, 168 + delta, 10))
  covers <- sapply(10:length(y), function(k) {
    bs <- replicate(100, mean(sample(y[1:k], k, replace = TRUE)))
    q <- quantile(bs, c(alpha / 2, 1 - alpha / 2))
    168 >= q[1] & 168 <= q[2]
  })
  # covers[1] corresponds to draw 10; the result is NA if the CI never excludes mu_old
  which(!covers)[1] + 9
})
hist(first_alarm, breaks = 30, col = "gray70")
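The gray/orange split described above can be sketched as a simple labelling step on `first_alarm`. The helper name `classify_alarms` and the example indices are made up for illustration; they are not output from the simulation. An alarm at draw 100 counts as false here, because the first 100 draws all come from the old population.

```r
# Hypothetical helper: label each replicate's first alarm relative to the switch.
classify_alarms <- function(first_alarm, switch_at = 100) {
  label <- ifelse(is.na(first_alarm), "no alarm",
                  ifelse(first_alarm <= switch_at, "false alarm", "true alarm"))
  factor(label, levels = c("false alarm", "true alarm", "no alarm"))
}

# Made-up first-alarm indices, for illustration only.
example_alarms <- c(12, 57, 100, 101, 130, 188, NA)
alarm_counts <- table(classify_alarms(example_alarms))
print(alarm_counts)
```

Applied to the simulated `first_alarm` vector, the same table shows how the false-alarm count moves as you change Δ or the CI level.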
Two real datasets. Draw one adult from each. Look at where the bars overlap.
Gray: 7,414 NHANES adults. Blue: 4,768 NBA careers (males). One random individual from each, on every draw.
nh <- read.csv("data/clean/nhanes_adults.csv")
nba <- read.csv("data/clean/nba_players.csv")
nba_h_cm <- nba$height_in * 2.54

mean(nh$Height); sd(nh$Height)
mean(nba_h_cm); sd(nba_h_cm)
mean(nh$Height > median(nba_h_cm))  # share of NHANES adults taller than the median NBA player

set.seed(42)
hist(nh$Height, breaks = 40, freq = FALSE, col = rgb(.5, .5, .5, .4), border = NA,
     xlim = c(140, 220), xlab = "height (cm)", main = "")
hist(nba_h_cm, breaks = 40, freq = FALSE, col = rgb(.18, .42, .56, .5), border = NA,
     add = TRUE)
NBA heights come in feet-inches, NHANES heights in cm. Convert the NBA roster yourself (1 inch = 2.54 cm) and reproduce the overlap fraction on real data. Then ask: are the two σ's the same? If not, which population is wider, and why might that be?
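One possible starting point for the conversion, assuming the raw roster stores heights as strings like "6-7" (feet, a hyphen, then inches). That format is an assumption, so check your file before relying on it. `ft_in_to_cm` is a hypothetical helper name.

```r
# Hypothetical helper: convert "feet-inches" strings such as "6-7" to centimetres.
# Assumes a single hyphen separates feet from inches.
ft_in_to_cm <- function(x) {
  parts <- strsplit(as.character(x), "-", fixed = TRUE)
  feet <- as.numeric(sapply(parts, `[`, 1))
  inches <- as.numeric(sapply(parts, `[`, 2))
  (feet * 12 + inches) * 2.54
}

heights_cm <- ft_in_to_cm(c("6-7", "5-11", "7-0"))
print(heights_cm)
# 200.66 180.34 213.36
```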
Same machine you just ran on NHANES vs. NBA. Different islands.
A few million years ago, a small number of tortoises washed up on Galápagos and founded every population on every island. One source pool, one species.
Today their shells look different on each island. Tortoises on dry, sparse islands like Pinta carry saddleback shells with a tall front opening — necks have to reach up to graze cactus pads. Tortoises on lush volcanic islands like Isabela carry domed shells with a low front opening — they graze grasses near the ground. x here is the height of the front shell opening in cm.
The question is the same one you asked of NHANES vs. NBA: could these two samples have come from a single underlying population?
Both islands trace back to one ancestor population. What kept the shapes from re-merging? Vertical transmission within each island — Pinta parents had Pinta-shape babies on Pinta, generation after generation. And only weak transmission between islands — tortoises don't swim across kilometres of ocean. So the two channels of inheritance ran in parallel for long enough to drift apart and stay apart.
The bootstrap distribution below is the diagnostic for that breakdown: if the 95% CI of the difference does not include zero, the two samples are no longer consistent with a single common pool.
By the end of the course you will ask this same question — did transmission between two things stop? — about cells in a body, workers in a colony, lineages in a clade. Same engine. Different scale. (Unit 5 names the algebra: it is the Price equation.)
t <- read.csv("data/clean/galapagos_tortoises.csv")
p <- subset(t, island == "pinta")$front_opening_cm
i <- subset(t, island == "isabela")$front_opening_cm
delta_obs <- mean(p) - mean(i)

# bootstrap: resample each island with replacement, recompute the difference
deltas <- replicate(2000, {
  pb <- sample(p, length(p), replace = TRUE)
  ib <- sample(i, length(i), replace = TRUE)
  mean(pb) - mean(ib)
})
quantile(deltas, c(0.025, 0.975))  # 95% CI on the difference
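The decision rule (does the 95% CI of the difference include zero?) can be wrapped into a small function. This is a minimal sketch on synthetic numbers, not the tortoise data. `boot_diff_ci` and the simulated island vectors are hypothetical stand-ins.

```r
# Hypothetical helper: percentile-bootstrap CI for mean(a) - mean(b),
# plus a flag for whether that interval excludes zero.
boot_diff_ci <- function(a, b, B = 2000, level = 0.95) {
  alpha <- 1 - level
  deltas <- replicate(B, {
    mean(sample(a, length(a), replace = TRUE)) -
      mean(sample(b, length(b), replace = TRUE))
  })
  ci <- unname(quantile(deltas, c(alpha / 2, 1 - alpha / 2)))
  list(observed = mean(a) - mean(b),
       ci = ci,
       excludes_zero = ci[1] > 0 || ci[2] < 0)
}

# Synthetic stand-ins (made-up means and SDs, NOT measured shell openings).
set.seed(42)
island_a <- rnorm(30, mean = 30, sd = 3)
island_b <- rnorm(30, mean = 22, sd = 3)

res_apart <- boot_diff_ci(island_a, island_b)
print(res_apart$ci)
print(res_apart$excludes_zero)
```

Calling `boot_diff_ci(p, i)` on the two island vectors loaded above applies the same rule to the real samples.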