Lesson 0 — Prediction, and what a regression is actually doing

BIO 202, Spring 2026 — draft v3. Each stage hands you one more piece of information and asks the same question again. Each stage unlocks after you predict and explore the previous one.

What Lesson 0 is actually asking

One question, five times: what is your best guess for y, and how wrong will you typically be? Each stage gives you one more piece of information about the frog and asks whether the guess gets any better.

The formulas come along for the ride. The skill is translating a formula back into the prediction question out loud.

A — With no other information, the best guess is the mean

Someone hands you a new frog. You have no other information. The best single-number guess is the population mean μ, and σ is how wrong that guess will typically be.

For this stage: say why μ is the best guess under no-information, why σ is the typical size of the error, and why the sample ȳ wiggles around the true μ run to run.


Scenario

Gray tree frogs, one pond, no other information. Someone hands you a frog and asks for its body mass. The best number to say is the population mean μ: among all single numbers you could pick, it minimizes your average squared error.

yi ~ Normal(μ, σ)

You do not know μ. You have a sample of n frogs, and from them you compute ȳ. ȳ is your estimate of μ. A different n frogs would give a slightly different ȳ. The true μ is a property of the world; the estimate wiggles run to run.

σ is the other half of the story: how wrong this guess will typically be. A typical frog sits about σ away from μ. Small σ means the mean is a sharp prediction. Large σ means the mean is the best you can do, and the best is not very good.

Two parameters, one for the guess and one for the size of the error. Stage B adds one new piece of information about the frog and asks whether the guess gets any better.

Note that Stage D writes y ~ Normal(β·x + α, σ). Set β = 0 and you are back here. Regression does not replace the mean; it extends it.
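A quick way to feel why the mean is the best single guess: score every constant guess by its average squared error and see where the minimum lands. A minimal base-R sketch; the sample values here are illustrative, not the lesson's own simulation.

```r
# Quick check: among all constant guesses, the sample mean minimizes the
# average squared error.
set.seed(1)
y <- rnorm(40, mean = 8.0, sd = 1.2)       # a Stage-A-style sample
mse <- function(guess) mean((y - guess)^2) # score one constant guess
guesses <- seq(6, 10, by = 0.01)           # a grid of candidate guesses
errors  <- sapply(guesses, mse)
guesses[which.min(errors)]                 # lands on top of mean(y)
mean(y)
```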

Simulated sample (histogram with fixed axes)

[live readout: true μ | sample ȳ | sample sd]

Prediction (required before sliders unlock)

  Q1. You raise σ from 0.5 to 2.5, keeping μ and n fixed. The histogram…
  Q2. You raise n from 20 to 500, keeping μ and σ fixed. The sample mean ȳ…

Explore the sliders to unlock Stage B (5 moves required).

Transfer question — new scenario

A colleague samples 15 mice from a single population. She reports a sample mean body mass of 24.3 g with SD 1.8 g. If she re-samples 15 different mice from the same population, the new sample mean will most likely be…

Controls

n = 40 | μ = 8.0 | σ = 1.2 | seed = 42

R code (base R — mirrors this simulation)

# Stage A: one variable, no predictor.
set.seed(42)
n     <- 40
mu    <- 8.0   # true population mean
sigma <- 1.2   # true population SD

# simulate the sample -- every frog is a draw from the SAME distribution
y <- rnorm(n, mean = mu, sd = sigma)

# estimates from the sample (these wiggle run-to-run)
mean(y); sd(y)

hist(y, breaks = 20, col = "gray80", border = "white",
     xlim = c(0, 20), xlab = "body mass (g)", main = "")
abline(v = mu,      col = "#b23a48", lwd = 2, lty = 2)
abline(v = mean(y), col = "#2f6b8f", lwd = 2)

B — Add a group label, and the guess splits in two

Now you know which pond the frog came from. That is new information. The best guess is no longer a single number; it is the group mean. The "slope" on a 0/1 predictor is the difference between the two group means.

For this stage: recognize y = mx + b from algebra inside the regression equation. Read δ as the difference between two group means, not as a separate mysterious object.

Complete Stage A (submit prediction and move the sliders a few times) to unlock this section.

Scenario

Now the frog comes with a tag: lowland (g = 0) or upland (g = 1). One new piece of information. The best guess is no longer μ for everyone. It is α for lowland frogs, and α + δ for upland frogs.

The object is already familiar from algebra. y = mx + b is the same line, renamed and split into two pieces so each piece can do its own job:

μi = α + δ · gi

yi ~ Normal(μi, σ)

α is the intercept — the guess when g = 0. δ is the slope — how much the guess changes when g goes up by one. Since g takes only two values, the "line" has only two points on it: α, and α + δ. δ is not just "the slope." It is the difference between the two group means.

The second line is Stage A layered on: each yi is a noisy draw around its group mean, with spread σ. Same σ for both groups.

A t-test is this regression. lm(y ~ g) returns δ as the slope coefficient, and the standard error on that coefficient is the standard error of the difference between two sample means. Two names, one calculation.

Two groups, side by side (fixed axes)

[live readout: ȳ0 | ȳ1 | δ̂ = ȳ1 − ȳ0 | SE(δ̂)]
[generative truth: α, α + δ, σ]

Prediction (required before sliders unlock)

  Q1. You set δ = 0 (the two ponds actually share one mean). The estimated δ̂ from your sample will be…
  Q2. You raise n (per group) from 10 to 200, keeping α, δ, σ fixed. SE(δ̂) will…

Explore the sliders to unlock Stage C (5 moves required).

Transfer question — new scenario

A friend fits lm(blood_pressure ~ treatment) with treatment coded 0 (placebo) and 1 (new drug). R prints the slope on treatment as −4.2 mmHg. The best reading is…

Controls

n (per group) = 40 | α = 10.0 | δ = 1.00 | σ = 1.0 | seed = 42

R code (base R) — regression and t-test are the same calculation

# Stage B: binary predictor. y ~ N(alpha + delta*g, sigma), g in {0, 1}.
set.seed(42)
n     <- 40    # per group
alpha <- 10.0  # mean of group 0 (lowland)
delta <- 1.00  # mean of group 1 minus mean of group 0
sigma <- 1.0

g <- rep(c(0, 1), each = n)                  # 0/1 predictor
y <- rnorm(2*n, mean = alpha + delta*g, sd = sigma)

# REGRESSION form. Coefficient on g is delta_hat.
fit <- lm(y ~ g)
coef(fit)
summary(fit)$coefficients[2, "Std. Error"]

# T-TEST form. Same number, different packaging.
tt <- t.test(y ~ g, var.equal = TRUE)
tt$estimate[2] - tt$estimate[1]   # = coef(fit)["g"]

plot(jitter(g, amount = 0.08), y, pch = 16, col = "gray40",
     xaxt = "n", xlab = "group", ylab = "body mass (g)")
axis(1, at = c(0, 1), labels = c("lowland", "upland"))
abline(h = mean(y[g==0]), col = "#b23a48", lwd = 2)
abline(h = mean(y[g==1]), col = "#2f6b8f", lwd = 2)

C — Two measurements that travel together

A second number on each frog. Do the two travel together? The correlation coefficient r is one number that says how tightly.

For this stage: read r at a glance. Sign is the direction of the tilt; magnitude is how tightly the two numbers line up. Note that r̂ wiggles run to run even when the true r is zero.

Complete Stage B (submit prediction and move the sliders a few times) to unlock this section.

Scenario

Same frogs. Now you measure a second number on each: snout–vent length. You want to know whether two measurements on the same frog travel together, whether bigger frogs tend to be both longer and heavier.

The correlation coefficient r is one number that answers the question. It lives between −1 and +1:

  • r ≈ +1 — the two numbers track tightly, moving in the same direction.
  • r ≈ 0 — no tilt. Knowing one tells you essentially nothing about the other.
  • r ≈ −1 — the two numbers track tightly, moving in opposite directions.

Sign is the direction of the tilt. Magnitude is how tightly the two numbers line up. That is all r is.

Note that r is symmetric: swap x and y and r does not change. And r is a description, not a cause. A nonzero r says two measurements move together in this sample. It does not say one caused the other.
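Both properties are one-liners to check in base R. A quick sketch with made-up data: swap the arguments, or rescale one variable, and r does not move.

```r
# Quick check: r is symmetric in its arguments and unitless --
# positively rescaling or shifting a variable leaves it unchanged.
set.seed(1)
x <- rnorm(30)
y <- 0.6 * x + rnorm(30)
cor(x, y)             # some value between -1 and +1
cor(y, x)             # identical: r is symmetric
cor(10 * x + 3, y)    # identical: r ignores units and shifts
```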

Watch for: even when the real r is zero, the sample r̂ is almost never exactly zero. Press "new seed" a few times. The wiggle you see is the same thing ȳ was doing in Stage A — an estimate wobbling around a true value that itself does not move.

Bivariate scatter (fixed axes)

[live readout: true r | sample r̂ | |r̂ − r|]

Prediction (required before sliders unlock)

  Q1. You set the true r to 0 and take a sample of n = 30. The sample r̂ will be…
  Q2. r = +0.9 and r = −0.9 — how do the two scatters compare?

Explore the sliders to unlock Stage D (5 moves required).

Transfer question — new scenario

You plot tree height against trunk diameter for 30 trees and compute r̂ = 0.06. The best reading is…

Controls

n = 60 | r = 0.60 | sx = 1.5 | sy = 1.0 | seed = 42

R code (base R)

# Stage C: correlation -- two variables that co-vary.
set.seed(42)
n  <- 60
r  <- 0.60
sx <- 1.5
sy <- 1.0

# build a bivariate normal sample from two independent draws
z1 <- rnorm(n); z2 <- rnorm(n)
x  <- sx * z1
y  <- sy * (r * z1 + sqrt(1 - r^2) * z2)

# sample correlation -- this wiggles sample to sample
cor(x, y)

plot(x, y, pch = 16, col = "gray40",
     xlim = c(-8, 8), ylim = c(-8, 8),
     xlab = "snout-vent length", ylab = "body mass")

D — Turn the second measurement into a prediction rule

Use x to predict y. The rule is a line. R² answers how much that line improved your guess over the Stage A mean.

For this stage: name each piece of the regression in the prediction frame. β is how the guess changes per unit of x; α is the guess at x = 0; σ is the typical error around the line; R² is how much knowing x improved the guess over the Stage A mean. Watch R² and σ move together.

Complete Stage C (submit prediction and move the sliders a few times) to unlock this section.

Scenario

Same frogs. Use snout–vent length x as the predictor for body mass y. The prediction rule is a line:

yi ~ Normal(β·xi + α, σ)

For any frog with length x, the best guess is β · x + α. α is the intercept (the guess at x = 0). β is the slope (how much the guess changes when x goes up by one unit). σ is the spread around the line, the typical size of the error — same role as in Stage A.

The trick is to see this as Stage A with one new piece of information. Set β = 0 and the rule collapses to a single number (the fitted α̂ is just ȳ): you are predicting the same y for every frog, no matter its length. Set β ≠ 0 and the guess slides along a line as x changes. The information in x has entered the guess.

We fit (α̂, β̂) by ordinary least squares ("draw the one line that sits closest to all the points"). That gives one line. Turn on inferred lines and you see many others that the data find nearly as plausible — a cloud of rules, not a single answer. The width of the cloud is the uncertainty in β̂.

R² and σ are two ways of saying the same thing. R² is the fraction of y's variation that the line accounts for. Drag σ up and R² falls; drag σ down and R² climbs toward 1. Small σ means the points hug the line and knowing x helped a lot. Large σ means the points scatter away from the line and knowing x barely helped. One story, two axes.
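To watch that trade directly, hold the true line and the x values fixed and grow σ. A short sketch mirroring the Stage D setup; the σ grid is arbitrary.

```r
# Sketch: same line, same x values, three residual spreads.
# R^2 falls as sigma grows.
set.seed(42)
n <- 40
x <- runif(n, min = 0, max = 10)
r2 <- sapply(c(0.5, 1.2, 3.0), function(sigma) {
  y <- rnorm(n, mean = 4.0 + 0.80 * x, sd = sigma)
  summary(lm(y ~ x))$r.squared
})
round(r2, 2)   # decreasing: larger sigma, smaller R^2
```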

What SE(β̂) is. It is the standard deviation of the cloud of inferred slopes. As a general rule, if β̂ is more than about 2 SE away from zero, zero sits outside the plausible cloud, and a p-value would call the slope "significant." The cloud itself is the real object. A p-value just converts "how far is zero from the cloud?" into a single number.
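The "about 2 SE" rule can be read straight off a fitted model: the t value R prints in summary(lm(...)) is just β̂ divided by SE(β̂). A minimal sketch with simulated data; the parameter values are illustrative.

```r
# Sketch: the rough 2-SE check, read off the summary table.
set.seed(42)
x <- runif(50, 0, 10)
y <- rnorm(50, mean = 4 + 0.8 * x, sd = 1.2)
co <- summary(lm(y ~ x))$coefficients
beta_hat <- co["x", "Estimate"]
se       <- co["x", "Std. Error"]
beta_hat / se              # identical to co["x", "t value"]
abs(beta_hat) > 2 * se     # the rough "clearly nonzero" check
```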

Scatter + fit + inferred-lines cloud (fixed axes)

[live readout: fitted ŷ = α̂ + β̂·x | SE(β̂) | R²]
[generative truth: α, β, σ]

Prediction (required before sliders unlock)

  Q1. You halve n (e.g. 200 → 100), keeping β and σ fixed. The cloud of inferred lines…
  Q2. You set the true β = 0 and run one simulation with n = 50, σ = 1.5. The fitted slope β̂ will be…

Explore the sliders to unlock Stage E (5 moves required).

Transfer question — new scenario

A regression returns ŷ = 1.2 + 0.85·x with R² = 0.07 and SE(β̂) ≈ 0.41. Which summary is best?

Controls

n = 40 | α = 4.0 | β = 0.80 | σ = 1.2 | inferred lines = 60 | seed = 42

R code (base R)

# Stage D: regression with many lines nearly as good as the best one.
set.seed(42)
n     <- 40
alpha <- 4.0    # intercept
beta  <- 0.80   # slope
sigma <- 1.2    # residual SD

x <- runif(n, min = 0, max = 10)
y <- rnorm(n, mean = alpha + beta * x, sd = sigma)

# OLS fit -- one point estimate
fit <- lm(y ~ x)
coef(fit)

# inferred lines -- resample the data with replacement and refit many times.
# Each resample gives one line that is compatible with the data.
n_lines <- 60
inferred <- replicate(n_lines, {
  i <- sample(n, n, replace = TRUE)
  coef(lm(y[i] ~ x[i]))
})

plot(x, y, pch = 16, col = "gray40",
     xlim = c(0, 10), ylim = c(-5, 25),
     xlab = "x", ylab = "y")
apply(inferred, 2, function(ab)
  abline(ab[1], ab[2], col = rgb(0.7, 0.23, 0.28, 0.05)))
abline(fit, col = "#b23a48", lwd = 2)

E — Two populations, two prediction rules. Can we tell them apart?

Each pond has its own line. "Are the slopes clearly different?" becomes a visual question: do the two clouds of plausible slopes overlap?

For this stage: translate "is the difference statistically significant?" into "do the two clouds of plausible slopes overlap?" Same question, no jargon. Say what a permutation p-value actually answers.

Complete Stage D (submit prediction and move the sliders a few times) to unlock this section.

Scenario

Two populations of frogs now: a lowland pond (A) and an upland pond (B). Each has its own prediction rule, its own line:

yi,g ~ Normal(βg·xi,g + αg, σ)

The question is whether these are the same rule or two different rules. Concretely: is βA different from βB?

Draw the cloud of plausible slopes for each pond, separately. If the clouds do not overlap, the data rule out any shared slope — the rules are clearly different. If the clouds do overlap, there is a range of slopes compatible with both ponds, including "both the same." That is the question a significance test answers, with the jargon stripped out.

A second view, more formal. Shuffle the pond labels many times and each time recompute the difference in fitted slopes. The shuffled distribution answers: if the two ponds truly shared one slope, how big a split would we see just from how frogs got assigned? If the observed split sits out in the tail of that distribution, the two slopes are clearly different.

What this p-value says. It is the fraction of shuffled worlds in which the split was at least as extreme as ours. Note that it does not say "there is a 5% chance the ponds share a slope." It says: if they did share one, 5% of random shuffles would look at least this different from each other.

Two scatters + two clouds of inferred lines (fixed axes)

If the groups shared one slope: what splits would we see?

[live readout: β̂A | β̂B | observed Δβ̂ | fraction at least as extreme under the pooled null | clouds overlap?]

Prediction (required before sliders unlock)

  Q1. You set βA = βB (truly equal slopes). The two clouds of inferred lines should…
  Q2. With βA = 1, βB = 0.7, σ = 1.5, you double the n per group. The two clouds will…

Explore the sliders to finish Lesson 0 (5 moves required).

Transfer question — new scenario

A study reports 95% slope intervals for two groups: group 1: [0.45, 0.92] and group 2: [0.81, 1.34]. The best reading is…

Controls

n (per group) = 50 | βA = 1.00 | βB = 0.70 | σ = 1.2 | bootstrap lines K = 200 | permutations P = 500 | seed = 42

R code (base R)

# Stage E: two slopes. Are they clearly different?
set.seed(42)
n  <- 50     # per group
bA <- 1.00
bB <- 0.70
s  <- 1.2

xA <- runif(n, 0, 10); yA <- rnorm(n, bA*xA + 4, s)
xB <- runif(n, 0, 10); yB <- rnorm(n, bB*xB + 4, s)

# Inferred lines per group (bootstrap resampling within each group).
K <- 200
inf_A <- replicate(K, {
  i <- sample(n, n, replace = TRUE)
  coef(lm(yA[i] ~ xA[i]))
})
inf_B <- replicate(K, {
  i <- sample(n, n, replace = TRUE)
  coef(lm(yB[i] ~ xB[i]))
})

# Overlap between the two slope clouds -- the "clearly different" check.
qA <- quantile(inf_A[2, ], c(0.025, 0.975))
qB <- quantile(inf_B[2, ], c(0.025, 0.975))
overlap <- max(0, min(qA[2], qB[2]) - max(qA[1], qB[1]))

# Pooled null: shuffle group labels and recompute the split.
P <- 500
x <- c(xA, xB); y <- c(yA, yB)
null_d <- replicate(P, {
  g <- sample(c(rep(TRUE, n), rep(FALSE, n)))
  coef(lm(y[g]  ~ x[g] ))[2] -
    coef(lm(y[!g] ~ x[!g]))[2]
})
obs <- coef(lm(yA ~ xA))[2] - coef(lm(yB ~ xB))[2]
mean(abs(null_d) >= abs(obs))   # fraction at least as extreme