BIO 202, Spring 2026, draft v1. In Lesson 1 you knew nothing about each individual. Here you know one extra thing per person.
Predict adult weight four times. Each round you get a different piece of information to work with. Don't read ahead.
Same setup as Lesson 1 Stage B, with weight instead of height. Move the slider, watch the errors.
NHANES adult weights. The slider sets your constant guess. Gray: the true weights. Red: your signed errors (guess minus true weight). Move the slider and watch the two distributions move relative to each other.
nh <- read.csv("data/clean/nhanes_adults.csv")
y <- nh$Weight
mean(y); median(y); sd(y)
guess <- 81.0
err <- guess - y
mean(err^2)      # MSE -- minimized by guess = mean
mean(abs(err))   # MAE -- minimized by guess = median
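You can check the two comments in that code without the NHANES file. Sweep a grid of constant guesses over a small vector and see where each loss bottoms out. The five weights below are invented for illustration and are not NHANES values.

```r
# Toy weights (kg), invented for illustration; not NHANES data.
y <- c(60, 70, 75, 90, 120)

# Candidate constant guesses, in 0.5 kg steps.
guesses <- seq(50, 130, by = 0.5)

# Loss for each constant guess g, using signed errors g - y as in Stage A.
mse <- sapply(guesses, function(g) mean((g - y)^2))
mae <- sapply(guesses, function(g) mean(abs(g - y)))

best_mse <- guesses[which.min(mse)]
best_mae <- guesses[which.min(mae)]

best_mse  # equals mean(y), 83
best_mae  # equals median(y), 75
```

With an even number of values, the MAE curve is flat between the two middle values, so every guess in that interval ties for the minimum.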
Same dataset. One new piece of information per person.
Each NHANES adult has a height and a weight. Blue line: the fitted line, predicted weight = α + β·height. Dashed gray: the Stage A constant predictor.
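$y_i \sim \mathrm{Normal}(\alpha + \beta x_i, \sigma)$, where $x_i$ is person $i$'s height, $y_i$ is their weight, and $\sigma$ is the standard deviation of the weights around the line.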
Toggle "show residuals" to see the vertical errors. Watch σ as you do.
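With a single predictor, the least-squares line has a closed form: β̂ = cov(x, y) / var(x) and α̂ = ȳ − β̂·x̄. The sketch below checks this against lm() on simulated heights and weights. The simulation settings (mean height 170 cm, slope 0.9 kg/cm, noise SD 15 kg) are arbitrary assumptions, not NHANES estimates.

```r
set.seed(1)

# Simulated data; parameter values are arbitrary, not NHANES estimates.
n <- 400
height <- rnorm(n, mean = 170, sd = 10)            # cm
weight <- -70 + 0.9 * height + rnorm(n, sd = 15)   # kg

# Closed-form least-squares estimates for one predictor.
beta_hat  <- cov(height, weight) / var(height)
alpha_hat <- mean(weight) - beta_hat * mean(height)

# lm() minimizes the same sum of squared residuals, so the two should agree.
fit <- lm(weight ~ height)
c(alpha_hat, beta_hat)
coef(fit)
```

The n − 1 denominators in cov() and var() cancel in the ratio, which is why the formula needs no correction.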
In R, using the NHANES column names, the model is lm(Weight ~ Height):
nh <- read.csv("data/clean/nhanes_adults.csv")
set.seed(42)
idx <- sample(nrow(nh), 400)    # random subsample of 400 adults
d <- nh[idx, ]
fit <- lm(Weight ~ Height, data = d)
summary(fit)
sigma(fit)                      # estimate of σ (residual standard error, divides by n - 2)
sd(residuals(fit))              # residual SD -- smaller than sd(d$Weight)
plot(d$Height, d$Weight, pch = 16, col = "#444444",
     xlab = "height (cm)", ylab = "weight (kg)")
abline(fit, col = "#2f6b8f", lwd = 2)                 # fitted line (blue)
abline(h = mean(d$Weight), col = "gray60", lty = 2)   # Stage A constant predictor
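The Stage B model is also a recipe for generating data. Here is a sketch that picks values for α, β and σ, simulates weights from heights, and checks that lm() recovers them. The "true" values below are arbitrary assumptions, not NHANES estimates. The sketch also shows the small difference between sd(residuals(fit)), which divides by n − 1, and summary(fit)$sigma, which divides by n − 2.

```r
set.seed(2026)

# Assumed "true" parameters for this simulation; arbitrary, not NHANES estimates.
alpha_true <- -70    # kg
beta_true  <- 0.9    # kg per cm
sigma_true <- 15     # kg

n <- 5000
x <- rnorm(n, mean = 170, sd = 10)                                 # heights (cm)
y <- rnorm(n, mean = alpha_true + beta_true * x, sd = sigma_true)  # weights (kg)

fit <- lm(y ~ x)
r <- residuals(fit)

coef(fit)            # close to alpha_true and beta_true
summary(fit)$sigma   # estimate of sigma: sqrt(sum(r^2) / (n - 2))
sd(r)                # sqrt(sum(r^2) / (n - 1)), since the residuals average to 0
sd(y)                # larger: also contains the spread explained by height
```

With n in the thousands, the two residual SDs are practically identical. The gap only matters for small samples.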
Five scatters. Each has a straight-line fit. Some are wrong on purpose. Pick the residual pattern. Move on.
Each round: a scatter with a fitted line (top) and a residuals plot (bottom). Pick the pattern. Five rounds, real data each time.
# Round 1: mammal mass vs gestation, log-log (clean)
m <- read.csv("data/clean/pantheria_mammals.csv")
fit <- lm(log(AdultBodyMass_g) ~ log(GestationLen_d), data = m)
plot(fitted(fit), residuals(fit), pch = 16)
abline(h = 0, col = "#b23a48")
# Same plot, linear-linear: curvature appears
fit_lin <- lm(AdultBodyMass_g ~ GestationLen_d, data = m)
plot(fitted(fit_lin), residuals(fit_lin), pch = 16)
abline(h = 0, col = "#b23a48")
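Why should the linear-linear fit show curvature? Suppose the truth is a power law, y = a·x^b, with multiplicative noise. Then log y is linear in log x, but y itself is not linear in x. The simulated sketch below fits both versions and asks how much of the residual variation a quadratic in the fitted values can explain. For the right model the answer is near zero; for the wrong one it is large. The settings a = 2, b = 1.5, the x range and the noise level are arbitrary assumptions, not PanTHERIA values.

```r
set.seed(7)

# Simulated power law y = a * x^b with multiplicative noise.
# a = 2, b = 1.5, the x range, and the noise SD are arbitrary assumptions.
n <- 1000
x <- exp(runif(n, log(10), log(1000)))
y <- 2 * x^1.5 * exp(rnorm(n, sd = 0.05))

fit_log <- lm(log(y) ~ log(x))   # matches how the data were generated
fit_lin <- lm(y ~ x)             # straight line on the raw scale: misspecified

# Share of residual variation explained by a quadratic in the fitted values.
# OLS residuals are uncorrelated with the fitted values, so any R^2 here
# comes from the squared term, i.e. from curvature.
pattern_r2 <- function(fit) {
  summary(lm(residuals(fit) ~ poly(fitted(fit), 2)))$r.squared
}

coef(fit_log)          # slope close to b = 1.5
pattern_r2(fit_log)    # near 0: no leftover shape
pattern_r2(fit_lin)    # large: the bend you would see in the residual plot
```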
Real growth data for both kids. One measurement is hidden per round. Click where you think it goes; the reveal tells you what was actually there.
One kid's mass-by-age trajectory with a single point hidden. WHO median in gray. Click where you think the missing point belongs.
The reveal color tells you something about that point. Six rounds.
k <- read.csv("data/clean/kids_growth.csv")
b <- subset(k, kid == "beren" & measure == "mass")
# smooth mass = f(age) fit: quadratic in age
fit <- lm(value ~ poly(age_years, 2), data = b)
b$residual <- residuals(fit)
# compare residuals by sick_proxy flag
aggregate(residual ~ sick_proxy, data = b, mean)
boxplot(residual ~ sick_proxy, data = b, col = c("#6f8a4a", "#a86a1a"))
The downloadable .R script for Stage D does this: it pulls the residuals from the smooth mass = f(age) fit on Beren, then splits them by sick_proxy. Run it. Are the sick-day residuals systematically more negative? Then refit with sick_proxy as an additional predictor and report the new residual SD.
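To see what a "yes" would look like before you touch Beren's data, here is a sketch on a simulated series. It builds a smooth growth curve with a built-in sick-day dip, runs the same residual split, then refits with the flag. The sketch makes several assumptions: it does not read kids_growth.csv, it stores sick_proxy as a logical, and the dip size (0.4 kg), curve shape and noise level are all invented.

```r
set.seed(11)

# Simulated stand-in for one kid's mass-by-age series; not the real kids_growth.csv.
# Assumptions: sick_proxy is logical, and sick days lower mass by 0.4 kg on average.
n <- 120
age_years  <- sort(runif(n, 0, 3))
sick_proxy <- runif(n) < 0.15
value <- 3.5 + 4.5 * age_years - 0.5 * age_years^2 -
  0.4 * sick_proxy + rnorm(n, sd = 0.25)
b <- data.frame(age_years, value, sick_proxy)

# Smooth mass = f(age) fit, as in the Stage D script, then split the residuals.
fit0 <- lm(value ~ poly(age_years, 2), data = b)
b$residual <- residuals(fit0)
aggregate(residual ~ sick_proxy, data = b, mean)   # sick-day mean comes out negative

# Refit with sick_proxy as an extra predictor and compare residual SDs.
fit1 <- lm(value ~ poly(age_years, 2) + sick_proxy, data = b)
c(before = sd(residuals(fit0)), after = sd(residuals(fit1)))
coef(fit1)[["sick_proxyTRUE"]]   # estimated sick-day shift, close to -0.4
```

Adding a predictor can never raise the residual sum of squares, so "after" is always at most "before". The useful question is whether the drop is large relative to the residual SD, and whether the sick-day coefficient is clearly away from zero.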