BIO 202, Spring 2026, draft v1. In Lesson 3 you fit a line. Here you ask what counts as "fit" in the first place.
Move a line by hand. Save 20 lines that score the same R² (to 2 decimal places) as the OLS optimum. Then ask whether a bootstrap reaches the same cloud.
Synthetic scatter with a true generative line. Move the sliders. R² to 2 decimal places sits in the corner.
50 points drawn from y = 1.0·x + 0.5 + Normal(0, 1), with x uniform on [0, 5]. You place the line (it does not auto-fit). The big number is R² rounded to 2 decimal places.
Aim for the highest R² you can. Then see how far you can wiggle α (the intercept) and β (the slope) before the rounded number changes.
set.seed(42)
n <- 50
x <- runif(n, 0, 5)
y <- 0.5 + 1.0*x + rnorm(n, 0, 1)
# your hand-placed line
my_a <- 0.5
my_b <- 1.0
yhat <- my_a + my_b * x
r2_my <- 1 - sum((y - yhat)^2) / sum((y - mean(y))^2)
# OLS R^2 next to yours, both rounded to 2 decimal places
fit <- lm(y ~ x)
round(summary(fit)$r.squared, 2)
round(r2_my, 2)
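The "how far can you wiggle" question can also be asked in code. The following is a minimal sketch, not the lesson's own widget logic: it rebuilds the Stage A data, holds α at its OLS value, and slides β in small steps to find the range of slopes that leave the rounded R² unchanged. The helper `r2_line`, the step size of 0.001, and the ±0.5 search window are illustrative choices that do not appear in the lesson.

```r
# Rebuild the Stage A data so this sketch runs on its own.
set.seed(42)
n <- 50
x <- runif(n, 0, 5)
y <- 0.5 + 1.0 * x + rnorm(n, 0, 1)

# R^2 of a hand-placed line y = a + b * x (hypothetical helper).
r2_line <- function(a, b) {
  1 - sum((y - (a + b * x))^2) / sum((y - mean(y))^2)
}

ols <- unname(coef(lm(y ~ x)))          # ols[1] = intercept, ols[2] = slope
target_r2 <- round(r2_line(ols[1], ols[2]), 2)

# Hold the intercept at its OLS value and slide the slope.
# The step (0.001) and the window (+/- 0.5) are arbitrary choices.
b_try <- ols[2] + seq(-0.5, 0.5, by = 0.001)
r2_try <- sapply(b_try, function(b) r2_line(ols[1], b))
same <- round(r2_try, 2) == target_r2

# Range of slopes that leave the rounded R^2 unchanged.
b_range <- range(b_try[same])
b_range
```

Repeating the scan with β held fixed and α sliding gives the matching range for the intercept. Moving both at once is what Stage B explores.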
Save every line you find whose R² matches the OLS R² to 2 decimal places. Target: 20 saves.
Same scatter. When your R² matches the OLS R² after both are rounded to 2 decimal places, click Save this line. Saved lines stack as faint blue. Target: 20 saves.
Watch where the saved lines fall on the α–β plane (Stage C will plot it).
# Same data as Stage A.
target_r2 <- round(summary(lm(y ~ x))$r.squared, 2)
# grid-search (a, b) pairs whose rounded R^2 matches
grid <- expand.grid(a = seq(-1, 2, by = 0.05), b = seq(0.6, 1.4, by = 0.05))
grid$r2 <- apply(grid, 1, function(p) {
  1 - sum((y - (p[1] + p[2]*x))^2) / sum((y - mean(y))^2)
})
keep <- subset(grid, round(r2, 2) == target_r2)
nrow(keep)  # how many pairs share the rounded R^2?
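One way to see why the matching (α, β) pairs form a tilted cloud rather than a box: for any line, the R² lost relative to OLS equals Σ((a − α̂) + (b − β̂)·xᵢ)² divided by the total sum of squares. This follows from the OLS residuals being orthogonal to both the constant and x. The sketch below checks that identity numerically on the Stage A data; the helper names `r2_line` and `r2_pred` and the four trial lines are made up for illustration.

```r
# Rebuild the Stage A data so this sketch runs on its own.
set.seed(42)
n <- 50
x <- runif(n, 0, 5)
y <- 0.5 + 1.0 * x + rnorm(n, 0, 1)

fit <- lm(y ~ x)
ols <- unname(coef(fit))                # ols[1] = intercept, ols[2] = slope
sst <- sum((y - mean(y))^2)
r2_ols <- summary(fit)$r.squared

# R^2 computed directly from the residuals of the line y = a + b * x.
r2_line <- function(a, b) 1 - sum((y - (a + b * x))^2) / sst

# R^2 predicted by the identity: OLS R^2 minus a quadratic penalty.
r2_pred <- function(a, b) {
  da <- a - ols[1]
  db <- b - ols[2]
  r2_ols - sum((da + db * x)^2) / sst
}

# Four arbitrary trial lines (illustrative values only).
trial <- data.frame(a = c(0.5, 0.0, 1.0, 0.8), b = c(1.0, 1.2, 0.8, 0.9))
direct <- mapply(r2_line, trial$a, trial$b)
pred   <- mapply(r2_pred, trial$a, trial$b)
all.equal(direct, pred)
```

The penalty is smallest when a change in slope is offset by a change in intercept that keeps the line passing through (mean(x), mean(y)), which is why raising β while lowering α costs so little R².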
200 bootstrap resamples. Refit OLS on each. Overlay the 200 fitted lines, then look at the α–β plane next to your Stage B cloud.
A bootstrap resample = 50 points sampled with replacement from the original 50. Some points appear twice, some not at all. Refit OLS on each resample; collect (α̂, β̂).
200 of them, plotted as faint red lines on the scatter. Your Stage B saved cloud appears green in the α–β plot. Compare.
# Same x, y as Stage A.
set.seed(42)
reps <- 200
boot <- replicate(reps, {
  i <- sample(length(x), length(x), replace = TRUE)
  coef(lm(y[i] ~ x[i]))
})
apply(boot, 1, quantile, c(.025, .975))
cor(boot[1,], boot[2,])  # negative -- alpha and beta trade off
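To compare the bootstrap cloud with the Stage B cloud numerically, one option is to score every bootstrap line on the original 50 points and count how many keep the same rounded R² as OLS. This is a sketch under assumptions: the lesson's panel may draw the comparison differently, the seed is set once here so the resamples differ from the block above, and `r2_line` and `frac_inside` are hypothetical names.

```r
# Rebuild the Stage A data so this sketch runs on its own.
set.seed(42)
n <- 50
x <- runif(n, 0, 5)
y <- 0.5 + 1.0 * x + rnorm(n, 0, 1)

sst <- sum((y - mean(y))^2)
r2_line <- function(a, b) 1 - sum((y - (a + b * x))^2) / sst
r2_ols <- summary(lm(y ~ x))$r.squared
target_r2 <- round(r2_ols, 2)

# 200 bootstrap refits; row 1 holds intercepts, row 2 holds slopes.
boot <- replicate(200, {
  i <- sample(n, n, replace = TRUE)
  xi <- x[i]
  yi <- y[i]
  coef(lm(yi ~ xi))
})

# Score each bootstrap line on the ORIGINAL data, not on its own resample.
r2_boot <- mapply(r2_line, boot[1, ], boot[2, ])

# Share of bootstrap lines that would have qualified as a Stage B save.
frac_inside <- mean(round(r2_boot, 2) == target_r2)
frac_inside
```

A share well below 1 would mean the bootstrap cloud spills outside the region you could reach by hand; a share near 1 would mean the two clouds largely coincide.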
Same two moves as Stages B and C, on a 400-person NHANES subsample. Compare R² at β̂, at β̂ + 1 SE, and at β̂ − 1 SE.
A 400-person random subsample from NHANES. Fit weight ~ height, bootstrap it 200 times. R² readouts at β̂, β̂ + 1 SE, and β̂ − 1 SE all appear in the panel. Compare them.
nh <- read.csv("data/clean/nhanes_adults.csv")
set.seed(42)
idx <- sample(nrow(nh), 400)
d <- nh[idx, ]
fit <- lm(Weight ~ Height, data = d)
B <- 200
boot <- replicate(B, {
  k <- sample(nrow(d), nrow(d), replace = TRUE)
  coef(lm(Weight ~ Height, data = d[k, ]))
})
apply(boot, 1, quantile, c(.025, .975))
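The three R² readouts can be reproduced along the following lines. This is a sketch with two loud assumptions. First, the data frame is simulated stand-in data with made-up numbers, not NHANES; only the column names `Height` and `Weight` follow the lesson's code. Second, the lesson does not say what happens to the intercept when β̂ is shifted, so the sketch lets the line pivot through the point of means; holding the intercept fixed instead would give a much larger drop.

```r
set.seed(42)

# Stand-in data: simulated values chosen only so the code runs. NOT NHANES.
d <- data.frame(Height = rnorm(400, 168, 10))
d$Weight <- -80 + 0.95 * d$Height + rnorm(400, 0, 15)

fit <- lm(Weight ~ Height, data = d)
b_hat <- unname(coef(fit)[2])

# Bootstrap SE of the slope from 200 resamples.
boot_b <- replicate(200, {
  k <- sample(nrow(d), nrow(d), replace = TRUE)
  coef(lm(Weight ~ Height, data = d[k, ]))[2]
})
se_b <- sd(boot_b)

# R^2 of a line with slope b that pivots through (mean Height, mean Weight).
# The pivot rule is an assumption; see the note above.
r2_at_slope <- function(b) {
  a <- mean(d$Weight) - b * mean(d$Height)
  resid <- d$Weight - (a + b * d$Height)
  1 - sum(resid^2) / sum((d$Weight - mean(d$Weight))^2)
}

r2_three <- c(minus_1se = r2_at_slope(b_hat - se_b),
              at_hat    = r2_at_slope(b_hat),
              plus_1se  = r2_at_slope(b_hat + se_b))
round(r2_three, 3)
```

Under the pivot rule the two shifted readouts are equal to each other, because the R² penalty depends only on the squared change in slope.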
Run the same bootstrap at n = 50, n = 200, and n = 1500. Report the bootstrap SE on β̂ for each. Then find a function of n that the three SEs fit, and say which way the cloud widens. Hit "I tried it" once you have a function.
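A starting point for this exercise, sketched on synthetic data from the Stage A model rather than on NHANES: it reports the bootstrap SE of the slope at the three sample sizes and leaves the choice of function to you. The helper `boot_se_slope` is hypothetical, and the slope is computed as cov(x, y) / var(x), which equals the OLS slope in simple regression and is swapped in for `lm()` only to keep the n = 1500 run quick.

```r
set.seed(42)

# Bootstrap SE of the OLS slope for one sample of size n.
# Data come from the Stage A model (an assumption made for this sketch);
# with real data, subsample the data frame to n rows instead.
boot_se_slope <- function(n, reps = 200) {
  x <- runif(n, 0, 5)
  y <- 0.5 + 1.0 * x + rnorm(n, 0, 1)
  slopes <- replicate(reps, {
    i <- sample(n, n, replace = TRUE)
    cov(x[i], y[i]) / var(x[i])   # OLS slope for y ~ x
  })
  sd(slopes)
}

sizes <- c(50, 200, 1500)
se_by_n <- sapply(sizes, boot_se_slope)
data.frame(n = sizes, boot_se = se_by_n)
```

Plotting `log(se_by_n)` against `log(sizes)` is one way to read off a candidate function from the three values.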
Same bootstrap move you just made, on new data. Different question: how old does the bottom of the cliff have to be?
The White Cliffs of Dover are ~100 m of chalk — coccolithophore shells stacked ~250 per millimeter, deposited in order with nothing strong enough to shuffle them since. Above: modern marine sites where the same carbonate sediment is forming today, with the rate measured at each.
The bootstrap below resamples those rates and asks: at this rate, how long would it take to deposit 100 m of chalk?
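Every resample is a possible "true" mean deposition rate for the Cretaceous Dover sea, assuming the modern sites are a fair stand-in for it, and every one implies an age in the tens of millions of years. The pre-Darwin chronology of ~6,000 years appears nowhere in the cloud. Given that assumption, the measured rates force the bottom of the cliff to be old, and the bootstrap shows that sampling noise in those rates cannot rescue a young age.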
d <- read.csv("data/clean/coccolith_deposition.csv")
rates <- d$rate_cm_per_kyr
cliff_cm <- 10000  # 100 m of chalk
age_yr <- replicate(5000, {
  r <- mean(sample(rates, length(rates), replace = TRUE))
  cliff_cm / r * 1000  # cm / (cm/kyr) * (yr/kyr) = yr
})
quantile(age_yr, c(0.025, 0.975))  # 95% CI on age
mean(age_yr < 6000)  # bootstrap mass under creationist chronology: 0
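The unit conversion in the bootstrap can be turned around to ask what mean rate a 6,000-year cliff would require. The sketch below uses only the figures stated in the lesson (100 m of chalk, roughly 250 shells per millimeter); the function names `age_from_rate` and `rate_needed` are hypothetical, and no deposition data are assumed.

```r
cliff_cm <- 10000  # 100 m of chalk, as in the lesson

# Age in years implied by a mean deposition rate given in cm per kyr.
age_from_rate <- function(rate_cm_per_kyr) cliff_cm / rate_cm_per_kyr * 1000

# Mean rate (cm per kyr) needed to deposit the whole cliff in age_yr years.
rate_needed <- function(age_yr) cliff_cm / (age_yr / 1000)

rate_6000 <- rate_needed(6000)
rate_6000                  # about 1666.7 cm per kyr, i.e. about 1.67 cm per year
age_from_rate(rate_6000)   # round trip back to 6000 years

# Stacked shells in the cliff at ~250 per millimeter.
shells_stacked <- 250 * (cliff_cm * 10)   # cm -> mm, then shells per mm
shells_stacked             # 25 million
```

For the bootstrap cloud to contain 6,000 years, some resampled mean rate would have to reach roughly 1,667 cm per kyr, so comparing that figure with `max(rates)` in the real data shows directly how far outside the measurements it sits.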