BIO 202 — Lesson 2: Resampling to ask if new data still belongs

What you'll do

Stage A replays the end of Lesson 1 as a movie. Stages B, C, and D break it. Predict before you play each stage.

Why "does this still belong"? Albino alligators hatch at the same rate in a Florida swamp and in a captive enclosure, but the wild ones are eaten within days, because a bright-white reptile in a green swamp is a beacon. Sample the "newly hatched" population: identical distributions in both places. Sample the "still alive a week later" population: the swamp sample has shifted, the enclosure one hasn't. Today's question is the one a biologist faces every time data arrives in pieces: did the population that produced the new measurements change, or did I just get unlucky?

A — Running mean of a stable population

Draws stream in one at a time. Watch the running mean. Watch the interval around it.

Scenario

Adults walk out one at a time. You measure each one's height. Running mean ȳ (blue) and a 95% bootstrap interval (ribbon) update after every draw. The dashed red line is the true μ.

Click "Step the stream" and watch.

Draws + running mean with bootstrap interval

draws: 0 | running ȳ: — | true μ: 168.0 | CI width: —

Prediction (required before the stream starts)

Q1. You watch 10 draws come in, then watch another 90. Which statement is more accurate?
The running mean jitters by the same amount every time a new draw arrives.
The running mean moves less and less per new draw: the average gets harder to shift.
The running mean stays jittery forever: averages do not stabilize.
Q2. The bootstrap CI on ȳ is your sense of "how wrong could ȳ still be?" After 200 draws it reaches roughly 1.4 cm on either side of ȳ. After 800 draws (4× more data) that half-width should be roughly:
0.7 cm: error shrinks like 1/√n, so 4× the data halves it
0.35 cm: error shrinks like 1/n, so 4× the data quarters it
1.4 cm: interval width is a property of the population, not the sample size

Stream at least 80 draws to unlock Stage B.

Controls

seed: 42

draws / click: 5

R code — running mean + bootstrap CI on a stationary stream

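# Stage A: draws stream from N(168, 10). Running mean + bootstrap CI.
set.seed(42)
mu_pop <- 168
sigma  <- 10
y <- rnorm(400, mu_pop, sigma)
running <- cumsum(y) / seq_along(y)

# bootstrap CI on the running mean at each step
# (resample by index: sample(y[1:k], ...) breaks at k = 1, where R treats
#  a single number x as the range 1:x)
ci <- t(sapply(seq_along(y), function(k) {
  draws <- replicate(200, mean(y[sample.int(k, k, replace = TRUE)]))
  quantile(draws, c(0.025, 0.975))
}))

plot(running, type = "l", col = "#2f6b8f", lwd = 2, ylim = range(ci),
     xlab = "draw index", ylab = "running mean")
lines(ci[, 1], col = "#2f6b8f", lty = 3)   # lower CI bound
lines(ci[, 2], col = "#2f6b8f", lty = 3)   # upper CI bound
abline(h = mu_pop, col = "#b23a48", lwd = 2, lty = 2)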
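
Both Stage A predictions can be checked numerically. The sketch below is a minimal illustration, not the page's own simulation: `boot_ci_width` is a hypothetical helper name, and the stream length (800) and number of resamples (2000) are arbitrary choices. It compares how far one new draw moves the running mean early versus late in the stream, and the bootstrap CI width at n = 800 against n = 200.

```r
set.seed(42)
y <- rnorm(800, mean = 168, sd = 10)        # same population as Stage A
running <- cumsum(y) / seq_along(y)

# A new draw y_n moves the running mean by (y_n - previous mean) / n,
# so the typical move shrinks as n grows.
step <- abs(diff(running))
early_step <- mean(step[1:10])              # moves caused by draws 2..11
late_step  <- mean(step[700:799])           # moves caused by draws 701..800

# Hypothetical helper: width of a percentile bootstrap CI for the mean.
boot_ci_width <- function(x, level = 0.95, B = 2000) {
  n <- length(x)
  means <- replicate(B, mean(x[sample.int(n, n, replace = TRUE)]))
  alpha <- 1 - level
  unname(diff(quantile(means, c(alpha / 2, 1 - alpha / 2))))
}

w200 <- boot_ci_width(y[1:200])
w800 <- boot_ci_width(y)
width_ratio <- w800 / w200                  # compare with 1/2 (1/sqrt(n)) and 1/4 (1/n)

cat("early step:", round(early_step, 3), " late step:", round(late_step, 3), "\n")
cat("width ratio (n = 800 vs n = 200):", round(width_ratio, 2), "\n")
```

The ratio will not be exact, because both widths are estimated from a finite number of resamples and from the sample's own spread.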

B — The population switches under you, without warning

Same kind of stream. Same running mean. Somewhere in the middle, the building changes.

Scenario

The stream starts the same way Stage A did. At some draw index you set, the door switches. The next adult comes from a different population (NBA players, μ ≈ 199 cm).

Your job: notice when, before clicking Reveal switch.

Running mean + CI, with a hidden switch

draws: 0 | running ȳ: — | μ_old: 168.0 | μ_new: 199.0

switch at draw: — (hidden) | CI covers μ_old? —

Prediction (required before the stream starts)

Q1. After the population switches, the running mean ȳ:
jumps instantly to the new μ on the next draw
climbs gradually, weighted between the two populations as more new draws come in
ignores the switch: ȳ already converged so it cannot move
Q2. The bootstrap CI on ȳ stops covering the old μ_old = 168 cm. What you can honestly say to a colleague:
"The population has definitely changed." (You proved it.)
"If I keep predicting 'a 168-cm population is producing these draws,' that prediction is now incompatible with the data. Whatever's generating the draws is no longer that population."
"Sampling noise. CIs always slip off the truth eventually if you wait long enough."

Stream the data and click "Reveal switch" once you spot it (or once 250 draws have passed).

Controls

switch at draw: 80

seed: 42

R code — stream with a hidden population switch

# Stage B: stream with a hidden switch from N(168, 10) to N(199, 9).
set.seed(42)
switch_at <- 80   # the analyst is not told this
N <- 300
y <- c(rnorm(switch_at, 168, 10),
       rnorm(N - switch_at, 199, 9))
running <- cumsum(y) / seq_along(y)

# at each step, ask: does my bootstrap CI still cover mu_old = 168?
# Start at draw 10, as Stage C does: with only a few draws the bootstrap CI
# is close to a single point, and sample(y[1:k], ...) misreads y[1] as 1:y[1]
# when k = 1. Resampling by index avoids that trap.
first_k <- 10
covers <- sapply(first_k:length(y), function(k) {
  bs <- replicate(200, mean(y[sample.int(k, k, replace = TRUE)]))
  q  <- quantile(bs, c(0.025, 0.975))
  168 >= q[1] && 168 <= q[2]
})
which(!covers)[1] + first_k - 1   # first draw at which CI excludes mu_old
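
What the running mean should do after the switch follows from simple arithmetic: at draw k it averages `switch_at` old draws and `k - switch_at` new ones. The sketch below computes that weighted mix next to the simulated running mean. It reuses Stage B's parameters; the variable names `n_new` and `expected` are introduced here for illustration only.

```r
set.seed(42)
switch_at <- 80
N <- 300
mu_old <- 168
mu_new <- 199
y <- c(rnorm(switch_at, mu_old, 10), rnorm(N - switch_at, mu_new, 9))
running <- cumsum(y) / seq_along(y)

# Expected running mean at draw k: a weighted mix of the two population means.
k <- seq_len(N)
n_new <- pmax(k - switch_at, 0)                    # draws taken after the switch
expected <- ((k - n_new) * mu_old + n_new * mu_new) / k

# Compare expectation and simulation at a few draw indices.
data.frame(draw = c(80, 81, 160, 300),
           expected = expected[c(80, 81, 160, 300)],
           simulated = running[c(80, 81, 160, 300)])
```

At draw 160, when half the stream is old and half is new, the expected running mean is 183.5 cm, exactly midway between the two population means. It never reaches 199 cm, because the 80 old draws stay in the average.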

C — How fast does the alarm fire?

Same setup as Stage B, run 100 times. Sliders for shift size and CI level. Predict before you run.

Scenario

100 replicates. Each runs 300 draws: first 100 from N(168, 10), then a switch to N(168 + Δ, 10). Δ and the CI level are sliders. We record the first draw at which the CI excludes μ_old.

Gray bars: true alarms (after the switch). Orange bars: false alarms (CI broke before the switch).

Histogram of "first alarm draw index" across 100 replicates

median alarm: — | false-alarm rate: — | missed (no alarm by draw 300): —

Prediction (required before sliders unlock)

Q1. You raise the shift magnitude Δ from 5 cm to 15 cm. The median first-alarm draw will:
get earlier (smaller index): a bigger shift is easier to spot
get later: a bigger shift takes more data to notice
stay the same: shift size does not affect detection speed
Q2. You widen the CI from 95% to 99%. The false-alarm rate (fires before the real switch) will:
rise: a wider CI is more sensitive
fall: a wider CI is harder to break
stay the same: CI width does not affect false alarms

Run 100-replicate batches across at least 3 (Δ, CI) combinations to unlock Stage D.

Controls

shift Δ (cm): 10.0

CI level (%): 95

seed: 42

R code — replicate the detection experiment

set.seed(42)
delta <- 10.0
ci_level <- 95 / 100
alpha <- 1 - ci_level
first_alarm <- replicate(100, {
  y <- c(rnorm(100, 168, 10), rnorm(200, 168 + delta, 10))
  covers <- sapply(10:length(y), function(k) {
    bs <- replicate(100, mean(sample(y[1:k], k, replace = TRUE)))
    q <- quantile(bs, c(alpha/2, 1 - alpha/2))
    168 >= q[1] & 168 <= q[2]
  })
  alarm_idx <- which(!covers)[1] + 9
  if (is.na(alarm_idx)) NA else alarm_idx
})
hist(first_alarm, breaks = 30, col = "gray70")
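
The three numbers in the readout (median alarm, false-alarm rate, missed) can be derived from a vector like `first_alarm`. The sketch below shows one way to do it. `summarise_alarms` is a hypothetical helper, and it assumes that an alarm at or before draw 100 counts as false and that the median is taken over true alarms only; the page may define these differently. The input is a made-up toy vector, not simulation output.

```r
# Hypothetical helper: summarise a vector of first-alarm draw indices.
# NA means the CI never excluded mu_old within the run.
summarise_alarms <- function(first_alarm, switch_at = 100) {
  fired       <- !is.na(first_alarm)
  false_alarm <- fired & first_alarm <= switch_at   # fired on old-population data only
  true_alarm  <- fired & first_alarm >  switch_at   # fired after the switch
  list(
    median_true_alarm = if (any(true_alarm)) median(first_alarm[true_alarm]) else NA_real_,
    false_alarm_rate  = mean(false_alarm),
    missed_rate       = mean(!fired)
  )
}

# Toy input for illustration only (NOT simulation output).
toy <- c(35, 118, 121, 125, 130, NA, 122, 90, 140, 119)
s <- summarise_alarms(toy)
str(s)
```

Passing the real `first_alarm` vector in place of `toy` gives the same three summaries for a full batch of replicates.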

D — A real shifted population: NHANES vs NBA

Two real datasets. Draw one adult from each. Look at where the bars overlap.

Scenario

Gray: 7,414 NHANES adults. Blue: 4,768 NBA careers (males). One random individual from each, on every draw.

NHANES (gray) and NBA (blue) — heights, with random draws on top

NHANES μ: — | NBA μ: — | Δ μ: —

% NHANES taller than the NBA median: — | draws so far: 0

Prediction (required before the draw button unlocks)

Q1. The mean NBA height is about 30 cm taller than the NHANES mean. What fraction of NHANES adults do you expect to exceed the median NBA player?
essentially 0%: the means are so far apart that almost no overlap exists
a small but nonzero fraction (under 2%)
a substantial fraction (10% or more)
Q2. You draw 10 random NBA players and 10 random NHANES adults. Among these 20 individuals, the tallest one is most likely:
always from the NBA group
usually from the NBA group, but occasionally the tallest NHANES adult beats the tallest NBA player in the sample
equally likely from either: populations overlap so much that draws ignore the group

Make at least 20 paired draws to wrap up.

Controls

seed: 42

R code — two real populations side by side

nh  <- read.csv("data/clean/nhanes_adults.csv")
nba <- read.csv("data/clean/nba_players.csv")
nba_h_cm <- nba$height_in * 2.54
mean(nh$Height); sd(nh$Height)
mean(nba_h_cm); sd(nba_h_cm)
mean(nh$Height > median(nba_h_cm))
set.seed(42)
hist(nh$Height, breaks = 40, freq = FALSE,
     col = rgb(.5, .5, .5, .4), border = NA,
     xlim = c(140, 220), xlab = "height (cm)", main = "")
hist(nba_h_cm, breaks = 40, freq = FALSE,
     col = rgb(.18, .42, .56, .5), border = NA, add = TRUE)
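
If the CSV files are not at hand, the two Stage D questions can still be explored on stand-in data. The sketch below is only a rough illustration: it assumes normal populations with Stage B's parameters, N(168, 10) and N(199, 9), which are not fitted to the real datasets, and the names `nh_sim` and `nba_sim` are invented here. Real height data has tails that a normal model may not capture, so treat the numbers as a guide to the order of magnitude.

```r
set.seed(42)
# Stand-in populations (assumption): Stage B's parameters, not the real data.
nh_sim  <- rnorm(7414, mean = 168, sd = 10)
nba_sim <- rnorm(4768, mean = 199, sd = 9)

# Q1: share of the stand-in NHANES sample above the stand-in NBA median,
# plus the same share computed from the normal model directly.
frac_above <- mean(nh_sim > median(nba_sim))
frac_model <- pnorm(199, mean = 168, sd = 10, lower.tail = FALSE)

# Q2: draw 10 from each group many times and record how often
# the tallest of the 20 individuals comes from the NBA stand-in.
nba_tallest <- replicate(5000, {
  max(sample(nba_sim, 10)) > max(sample(nh_sim, 10))
})
p_nba_tallest <- mean(nba_tallest)

cat("share above NBA median (sample):", frac_above, "\n")
cat("share above NBA median (model): ", signif(frac_model, 2), "\n")
cat("P(tallest of 20 is NBA):        ", p_nba_tallest, "\n")
```

Under this model the share is about 0.1%, which is small but not zero.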

Stretch challenge (optional)

NBA heights come in feet-inches, NHANES heights in cm. Convert the NBA roster yourself (1 inch = 2.54 cm) and reproduce the overlap fraction on real data. Then ask: are the two σ's the same? If not, which population is wider, and why might that be?
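
For the unit conversion step, a small helper may be useful. This sketch assumes heights are stored as strings such as "6-7" (feet, hyphen, inches); that format and the name `ft_in_to_cm` are assumptions, so check how your roster file actually encodes height before relying on it.

```r
# Hypothetical helper: convert "feet-inches" strings (e.g. "6-7") to centimetres.
# The "6-7" format is an assumption about how the raw roster is stored.
ft_in_to_cm <- function(x) {
  parts  <- strsplit(as.character(x), "-", fixed = TRUE)
  feet   <- as.numeric(vapply(parts, `[`, character(1), 1))
  inches <- as.numeric(vapply(parts, `[`, character(1), 2))
  (feet * 12 + inches) * 2.54               # 1 inch = 2.54 cm
}

heights_cm <- ft_in_to_cm(c("6-7", "5-3", "7-0"))
heights_cm
```

The overlap fraction and the comparison of the two σ's are left as the actual challenge.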

Showcase — when two populations stop being one

Same machine you just ran on NHANES vs. NBA. Different islands.

What you're looking at

A few million years ago, a small number of tortoises washed up on Galápagos and founded every population on every island. One source pool, one species.

Today their shells look different on each island. Tortoises on dry, sparse islands like Pinta carry saddleback shells with a tall front opening — necks have to reach up to graze cactus pads. Tortoises on lush volcanic islands like Isabela carry domed shells with a low front opening — they graze grasses near the ground. x here is the height of the front shell opening in cm.

The question is the same one you asked of NHANES vs. NBA: could these two samples have come from a single underlying population?

Top: each island's distribution of front-opening heights (n=20 each). Bottom: bootstrap of the difference of means.

Pinta mean: — cm | Isabela mean: — cm | observed Δ: — cm

bootstrap 95% CI on Δ: — | CI includes 0? —

What the bootstrap is doing

Both islands trace back to one ancestor population. What kept the shapes from re-merging? Vertical transmission within each island — Pinta parents had Pinta-shape babies on Pinta, generation after generation. And only weak transmission between islands — tortoises don't swim across kilometres of ocean. So the two channels of inheritance ran in parallel for long enough to drift apart and stay apart.

The bootstrap distribution below is the diagnostic for that breakdown: if the 95% CI of the difference does not include zero, the two samples are no longer consistent with a single common pool.

By the end of the course you will ask this same question — did transmission between two things stop? — about cells in a body, workers in a colony, lineages in a clade. Same engine. Different scale. (Unit 5 names the algebra: it is the Price equation.)

R code — bootstrap the difference of means

set.seed(42)   # reproducible resampling
# named `tortoises` rather than `t`, which would mask base R's transpose function
tortoises <- read.csv("data/clean/galapagos_tortoises.csv")
p <- subset(tortoises, island == "pinta")$front_opening_cm
i <- subset(tortoises, island == "isabela")$front_opening_cm
delta_obs <- mean(p) - mean(i)

# bootstrap: resample each island with replacement, recompute the difference
deltas <- replicate(2000, {
  pb <- sample(p, length(p), replace = TRUE)
  ib <- sample(i, length(i), replace = TRUE)
  mean(pb) - mean(ib)
})
quantile(deltas, c(0.025, 0.975))   # 95% CI on the difference
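
To see how the "CI includes 0?" readout behaves, it helps to run the same bootstrap on samples whose origin is known. The sketch below wraps the procedure in a hypothetical helper, `boot_diff_ci`, and applies it to synthetic numbers: two samples from one pool and one sample from a clearly shifted pool. The values are made up for illustration and are not tortoise measurements.

```r
# Hypothetical helper: percentile bootstrap CI for a difference of means.
boot_diff_ci <- function(a, b, level = 0.95, B = 2000) {
  deltas <- replicate(B, {
    mean(a[sample.int(length(a), replace = TRUE)]) -
      mean(b[sample.int(length(b), replace = TRUE)])
  })
  alpha <- 1 - level
  ci <- unname(quantile(deltas, c(alpha / 2, 1 - alpha / 2)))
  list(observed = mean(a) - mean(b),
       ci = ci,
       includes_zero = ci[1] <= 0 && ci[2] >= 0)
}

set.seed(42)
# Synthetic stand-ins (made-up numbers, NOT tortoise data), n = 20 each.
pool_a <- rnorm(20, mean = 30, sd = 3)
pool_b <- rnorm(20, mean = 30, sd = 3)   # drawn from the same pool as pool_a
pool_c <- rnorm(20, mean = 40, sd = 3)   # drawn from a clearly shifted pool

same    <- boot_diff_ci(pool_a, pool_b)
shifted <- boot_diff_ci(pool_c, pool_a)
str(same)
str(shifted)
```

For two samples from one pool, the interval will usually include zero, though about 1 run in 20 will miss it at the 95% level by chance. For the shifted pair it should sit well away from zero.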