BIO 202, Spring 2026 — draft v3. Each stage hands you one more piece of information and asks the same question again; a stage unlocks after you predict and explore the one before it.
One question, five times: what is your best guess for y, and how wrong will you typically be? Each stage gives you one more piece of information about the frog and asks whether the guess gets any better.
The formulas come along for the ride. The skill is translating a formula back into the prediction question out loud.
Someone hands you a new frog. You have no other information. The best single-number guess is the population mean μ, and σ is how wrong that guess will typically be.
For this stage: say why μ is the best guess when you have no other information, why σ is the typical size of the error, and why the sample mean ȳ wiggles around the true μ from run to run.
Gray tree frogs, one pond, no other information. Someone hands you a frog and asks for its body mass. The best number to say is the population mean μ. It minimizes your average squared error against any other value you could pick.
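You can check the "minimizes average squared error" claim directly. A quick sketch, using simulated masses rather than real frog data: score every constant guess on a grid and see which one wins.

```r
# Sketch: the sample mean minimizes mean squared error among constant guesses.
# Numbers here are simulated, not course data.
set.seed(1)
y <- rnorm(200, mean = 8, sd = 1.2)           # simulated body masses (g)
mse <- function(guess) mean((y - guess)^2)    # average squared error of one constant guess
guesses <- seq(6, 10, by = 0.01)
best <- guesses[which.min(sapply(guesses, mse))]
c(best = best, ybar = mean(y))                # the winning guess sits at y-bar
```

Any other constant you try scores worse; the grid's winner lands on ȳ to within the grid spacing.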
yi ~ Normal(μ, σ)
You do not know μ. You have a sample of n frogs, and from them you compute ȳ. ȳ is your estimate of μ. A different n frogs would give a slightly different ȳ. The true μ is a property of the world; the estimate wiggles run to run.
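The wiggle has a predictable size. A sketch with made-up values of μ, σ, and n (not the stage's data): re-run the study many times in simulation and watch how much ȳ moves.

```r
# Sketch: run-to-run wiggle of y-bar, with assumed mu, sigma, n.
set.seed(7)
mu <- 8.0; sigma <- 1.2; n <- 40
ybars <- replicate(2000, mean(rnorm(n, mu, sigma)))  # 2000 imaginary re-runs
sd(ybars)         # observed run-to-run wiggle of the estimate
sigma / sqrt(n)   # theory: the wiggle shrinks like 1/sqrt(n)
```

The true μ never moves; only the estimate does, and by about σ/√n per run.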
σ is the other half of the story: how wrong this guess will typically be. A typical frog sits about σ away from μ. Small σ means the mean is a sharp prediction. Large σ means the mean is the best you can do, and the best is not very good.
Two parameters, one for the guess and one for the size of the error. Stage B adds one new piece of information about the frog and asks whether the guess gets any better.
Note that Stage D writes y ~ Normal(β·x + α, σ). Set β = 0 and you are back here. Regression does not replace the mean; it extends it.
A colleague samples 15 mice from a single population. She reports a sample mean body mass of 24.3 g with SD 1.8 g. If she re-samples 15 different mice from the same population, the new sample mean will most likely be…
# Stage A: one variable, no predictor.
set.seed(42)
n <- 40
mu <- 8.0     # true population mean
sigma <- 1.2  # true population SD
# simulate the sample -- every frog is a draw from the SAME distribution
y <- rnorm(n, mean = mu, sd = sigma)
# estimates from the sample (these wiggle run-to-run)
mean(y); sd(y)
hist(y, breaks = 20, col = "gray80", border = "white", xlim = c(0, 20),
     xlab = "body mass (g)", main = "")
abline(v = mu, col = "#b23a48", lwd = 2, lty = 2)
abline(v = mean(y), col = "#2f6b8f", lwd = 2)
Now you know which pond the frog came from. That is new information. The best guess is no longer a single number; it is the group mean. The "slope" on a 0/1 predictor is the difference between the two group means.
For this stage: recognize y = mx + b from algebra inside the regression equation. Read δ as the difference between two group means, not as a separate mysterious object.
Now the frog comes with a tag: lowland (g = 0) or upland (g = 1). One new piece of information. The best guess is no longer μ for everyone. It is α for lowland frogs, and α + δ for upland frogs.
The object is already familiar from algebra. y = mx + b is the same line, renamed and split into two pieces so each piece can do its own job:
μi = α + δ · gi
yi ~ Normal(μi, σ)
α is the intercept — the guess when g = 0. δ is the slope — how much the guess changes when g goes up by one. Since g takes only two values, the "line" has only two points on it: α, and α + δ. δ is not just "the slope." It is the difference between the two group means.
The second line is Stage A layered on: each yi is a noisy draw around its group mean, with spread σ. Same σ for both groups.
A t-test is this regression. lm(y ~ g) returns δ as the slope coefficient, and the standard error on that coefficient is the standard error of the difference between two sample means. Two names, one calculation.
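You can confirm "two names, one calculation" in a couple of lines. A sketch with simulated data: the t statistic that lm() prints for the slope matches the pooled-variance t.test statistic up to sign.

```r
# Sketch: the t statistic from lm(y ~ g) equals the pooled t-test statistic.
# Data simulated for illustration.
set.seed(3)
g <- rep(c(0, 1), each = 30)
y <- rnorm(60, mean = 5 + 0.8 * g, sd = 1)
t_lm <- summary(lm(y ~ g))$coefficients["g", "t value"]
t_tt <- unname(t.test(y ~ g, var.equal = TRUE)$statistic)
abs(t_lm) - abs(t_tt)   # zero (up to floating point): one calculation, two names
```

The sign differs only because t.test subtracts the group means in the opposite order from the regression slope.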
A friend fits lm(blood_pressure ~ treatment) with treatment coded 0 (placebo) and 1 (new drug). R prints the slope on treatment as −4.2 mmHg. The best reading is…
# Stage B: binary predictor. y ~ N(alpha + delta*g, sigma), g in {0, 1}.
set.seed(42)
n <- 40        # per group
alpha <- 10.0  # mean of group 0 (lowland)
delta <- 1.00  # mean of group 1 minus mean of group 0
sigma <- 1.0
g <- rep(c(0, 1), each = n)  # 0/1 predictor
y <- rnorm(2*n, mean = alpha + delta*g, sd = sigma)
# REGRESSION form. Coefficient on g is delta_hat.
fit <- lm(y ~ g)
coef(fit)
summary(fit)$coefficients[2, "Std. Error"]
# T-TEST form. Same number, different packaging.
tt <- t.test(y ~ g, var.equal = TRUE)
tt$estimate[2] - tt$estimate[1]  # = coef(fit)["g"]
plot(jitter(g, amount = 0.08), y, pch = 16, col = "gray40", xaxt = "n",
     xlab = "group", ylab = "body mass (g)")
axis(1, at = c(0, 1), labels = c("lowland", "upland"))
abline(h = mean(y[g==0]), col = "#b23a48", lwd = 2)
abline(h = mean(y[g==1]), col = "#2f6b8f", lwd = 2)
A second number on each frog. Do the two travel together? The correlation coefficient r is one number that says how tightly.
For this stage: read r at a glance. Sign is the direction of the tilt; magnitude is how tightly the two numbers line up. Note that r̂ wiggles run to run even when the true r is zero.
Same frogs. Now you measure a second number on each: snout–vent length. You want to know whether two measurements on the same frog travel together, whether bigger frogs tend to be both longer and heavier.
The correlation coefficient r is one number that answers the question. It lives between −1 and +1:
Sign is the direction of the tilt. Magnitude is how tightly the two numbers line up. That is all r is.
Note that r is symmetric: swap x and y and r does not change. And r is a description, not a cause. A nonzero r says two measurements move together in this sample. It does not say one caused the other.
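Both claims are easy to verify in code. A sketch on simulated numbers: r is just the covariance rescaled by the two SDs, and swapping the arguments changes nothing.

```r
# Sketch: r is standardized covariance, and symmetric in x and y.
# Simulated numbers, not frog measurements.
set.seed(5)
x <- rnorm(50)
y <- 0.6 * x + rnorm(50)
r_hand <- cov(x, y) / (sd(x) * sd(y))  # covariance, rescaled into [-1, 1]
c(r_hand, cor(x, y), cor(y, x))        # all three agree
```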
Watch for: even when the real r is zero, the sample r̂ is almost never exactly zero. Press "new seed" a few times. The wiggle you see is the same thing ȳ was doing in Stage A — an estimate wobbling around a true value that itself does not move.
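The same wiggle can be simulated wholesale instead of one seed at a time. A sketch: draw two completely unrelated variables, over and over, and look at the spread of r̂.

```r
# Sketch: the null wiggle of r-hat when the true correlation is exactly 0.
set.seed(11)
n <- 30
r_null <- replicate(1000, cor(rnorm(n), rnorm(n)))  # independent draws each run
summary(r_null)            # estimates scatter around 0, rarely landing on it
mean(abs(r_null) > 0.3)    # some runs look "correlated" purely by luck
```

With n = 30 a noticeable fraction of null runs produce |r̂| above 0.3; that is the baseline you are comparing any real r̂ against.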
You plot tree height against trunk diameter for 30 trees and compute r̂ = 0.06. The best reading is…
# Stage C: correlation -- two variables that co-vary.
set.seed(42)
n <- 60
r <- 0.60
sx <- 1.5
sy <- 1.0
# build a bivariate normal sample from two independent draws
z1 <- rnorm(n); z2 <- rnorm(n)
x <- sx * z1
y <- sy * (r * z1 + sqrt(1 - r^2) * z2)
# sample correlation -- this wiggles sample to sample
cor(x, y)
plot(x, y, pch = 16, col = "gray40", xlim = c(-8, 8), ylim = c(-8, 8),
     xlab = "snout-vent length", ylab = "body mass")
Use x to predict y. The rule is a line. R² answers how much that line improved your guess over the Stage A mean.
For this stage: name each piece of the regression in the prediction frame. β is how the guess changes per unit of x; α is the guess at x = 0; σ is the typical error around the line; R² is how much knowing x improved the guess over the Stage A mean. Watch R² and σ move together.
Same frogs. Use snout–vent length x as the predictor for body mass y. The prediction rule is a line:
yi ~ Normal(β·xi + α, σ)
For any frog with length x, the best guess is β · x + α. α is the intercept (the guess at x = 0). β is the slope (how much the guess changes when x goes up by one unit). σ is the spread around the line, the typical size of the error — same role as in Stage A.
The trick is to see this as Stage A with one new piece of information. Set β = 0 and the rule collapses to a single number (α = ȳ): you are predicting the same y for every frog, no matter its length. Set β ≠ 0 and the guess slides along a line as x changes. The information in x has entered the guess.
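The collapse is literal: an intercept-only regression is Stage A. A sketch with simulated data, fitting lm(y ~ 1) to force β = 0:

```r
# Sketch: force beta = 0 and the regression collapses to the Stage A mean.
# Simulated data for illustration.
set.seed(9)
x <- runif(40, 0, 10)
y <- rnorm(40, mean = 4 + 0.8 * x, sd = 1.2)
fit0 <- lm(y ~ 1)              # intercept-only model: no information from x
c(coef(fit0), ybar = mean(y))  # the fitted "line" is just y-bar
```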
We fit (α̂, β̂) by ordinary least squares ("draw the one line that sits closest to all the points"). That gives one line. Turn on inferred lines and you see many others that the data find nearly as plausible — a cloud of rules, not a single answer. The width of the cloud is the uncertainty in β̂.
R² and σ are two ways of saying the same thing. R² is the fraction of y's variation that the line accounts for. Drag σ up and R² falls; drag σ down and R² climbs toward 1. Small σ means the points hug the line and knowing x helped a lot. Large σ means the points scatter away from the line and knowing x barely helped. One story, two axes.
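The "drag σ" experiment can be run in code rather than with a slider. A sketch: generate the same x twice, once with a tight σ and once with a loose one, and compare the R² that lm() reports.

```r
# Sketch: R^2 falls as sigma grows, holding the true line fixed.
# Values of sigma chosen for illustration.
set.seed(13)
x <- runif(200, 0, 10)
r2_at <- function(s) summary(lm(rnorm(200, 2 + 0.8 * x, s) ~ x))$r.squared
r2_at(0.5)  # small sigma: points hug the line, R^2 near 1
r2_at(5.0)  # large sigma: points scatter, R^2 collapses
```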
What SE(β̂) is. It is the standard deviation of the cloud of inferred slopes. As a general rule, if β̂ is more than about 2 SE away from zero, zero sits outside the plausible cloud, and a p-value would call the slope "significant." The cloud itself is the real object. A p-value just converts "how far is zero from the cloud?" into a single number.
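"Standard deviation of the cloud" is checkable: the textbook SE(β̂) from summary() and the SD of bootstrap-resampled slopes land on roughly the same number. A sketch on simulated data:

```r
# Sketch: SE(beta-hat) two ways -- formula vs. SD of the bootstrap slope cloud.
# Simulated data; agreement is approximate, not exact.
set.seed(17)
n <- 40
x <- runif(n, 0, 10)
y <- rnorm(n, 4 + 0.8 * x, 1.2)
se_formula <- summary(lm(y ~ x))$coefficients["x", "Std. Error"]
boot_slopes <- replicate(2000, {
  i <- sample(n, n, replace = TRUE)   # resample rows with replacement
  coef(lm(y[i] ~ x[i]))[2]
})
c(formula = se_formula, cloud = sd(boot_slopes))  # two routes, roughly one number
```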
A regression returns ŷ = 1.2 + 0.85·x with R² = 0.07 and SE(β̂) ≈ 0.41. Which summary is best?
# Stage D: regression with many lines nearly as good as the best one.
set.seed(42)
n <- 40
alpha <- 4.0  # intercept
beta <- 0.80  # slope
sigma <- 1.2  # residual SD
x <- runif(n, min = 0, max = 10)
y <- rnorm(n, mean = alpha + beta * x, sd = sigma)
# OLS fit -- one point estimate
fit <- lm(y ~ x)
coef(fit)
# inferred lines -- resample the data with replacement and refit many times.
# Each resample gives one line that is compatible with the data.
n_lines <- 60
inferred <- replicate(n_lines, {
  i <- sample(n, n, replace = TRUE)
  coef(lm(y[i] ~ x[i]))
})
plot(x, y, pch = 16, col = "gray40", xlim = c(0, 10), ylim = c(-5, 25),
     xlab = "x", ylab = "y")
apply(inferred, 2, function(ab) abline(ab[1], ab[2], col = rgb(0.7, 0.23, 0.28, 0.05)))
abline(fit, col = "#b23a48", lwd = 2)
Each pond has its own line. "Are the slopes clearly different?" becomes a visual question: do the two clouds of plausible slopes overlap?
For this stage: translate "is the difference statistically significant?" into "do the two clouds of plausible slopes overlap?" Same question, no jargon. Say what a permutation p-value actually answers.
Two populations of frogs now: a lowland pond (A) and an upland pond (B). Each has its own prediction rule, its own line:
yi,g ~ Normal(βg·xi,g + αg, σ)
The question is whether these are the same rule or two different rules. Concretely: is βA different from βB?
Draw the cloud of plausible slopes for each pond, separately. If the clouds do not overlap, the data rule out any shared slope — the rules are clearly different. If the clouds do overlap, there is a range of slopes compatible with both ponds, including the possibility that they share one. That is the question a significance test answers, with the jargon stripped out.
A second view, more formal. Shuffle the pond labels many times and each time recompute the difference in fitted slopes. The shuffled distribution answers: if the two ponds truly shared one slope, how big a split would we see just from how frogs got assigned? If the observed split sits out in the tail of that distribution, the two slopes are clearly different.
What this p-value says. It is the fraction of shuffled worlds in which the split was at least as extreme as ours. Note that it does not say "there is a 5% chance the ponds share a slope." It says: if they did share one, only 5% of random shuffles would produce a split at least this large.
A study reports 95% slope intervals for two groups: group 1: [0.45, 0.92] and group 2: [0.81, 1.34]. The best reading is…
# Stage E: two slopes. Are they clearly different?
set.seed(42)
n <- 50  # per group
bA <- 1.00
bB <- 0.70
s <- 1.2
xA <- runif(n, 0, 10); yA <- rnorm(n, bA*xA + 4, s)
xB <- runif(n, 0, 10); yB <- rnorm(n, bB*xB + 4, s)
# Inferred lines per group (bootstrap resampling within each group).
K <- 200
inf_A <- replicate(K, {
  i <- sample(n, n, replace = TRUE)
  coef(lm(yA[i] ~ xA[i]))
})
inf_B <- replicate(K, {
  i <- sample(n, n, replace = TRUE)
  coef(lm(yB[i] ~ xB[i]))
})
# Overlap between the two slope clouds -- the "clearly different" check.
qA <- quantile(inf_A[2, ], c(0.025, 0.975))
qB <- quantile(inf_B[2, ], c(0.025, 0.975))
overlap <- max(0, min(qA[2], qB[2]) - max(qA[1], qB[1]))
# Pooled null: shuffle group labels and recompute the split.
P <- 500
x <- c(xA, xB); y <- c(yA, yB)
null_d <- replicate(P, {
  g <- sample(c(rep(TRUE, n), rep(FALSE, n)))
  coef(lm(y[g] ~ x[g]))[2] - coef(lm(y[!g] ~ x[!g]))[2]
})
obs <- coef(lm(yA ~ xA))[2] - coef(lm(yB ~ xB))[2]
mean(abs(null_d) >= abs(obs))  # fraction at least as extreme