Lesson 5 — Shuffling the predictor to see what chance can do

BIO 202, Spring 2026, draft v2. One regression, one shuffle, one tail. Used four times.

What you'll do

Four scenarios. In each, fit the regression, shuffle the predictor, and see where the observed coefficient sits in the cloud of shuffled ones. You report the tail fraction: the share of shuffled coefficients at least as extreme as the observed one.

The move Darwin couldn't make. When Darwin saw that tall-shrub Galápagos islands had long-necked tortoises and low-grass islands had short-necked ones, he was doing a regression in his head: shell shape on island vegetation. He could see the association but had no way to ask, "Would I see a slope this steep just by reshuffling which tortoise lives on which island?" That is the move you build here. In Stage C round 2 you make it on his actual birds: the Grant lab's Geospiza fortis beak depths straddling the 1977 drought. The deeper point: shuffling breaks the link between predictor and response, which is the same as asking, "Could this association have arisen without any channel of inheritance connecting them?" You will ask this exact question at every later scale in the course.

A — Comparing two groups is a regression with a 0/1 predictor

Generate group A (g = 0) and group B (g = 1). Fit y ~ α + δ·g. Shuffle g, refit, build the null histogram of shuffled δ̂'s. Read the tail fraction.

Scenario

Two groups: A (g = 0, mean 0) and B (g = 1, mean δ). Both have within-group SD σ and n points each. Stack them and fit y ~ α + δ·g.

yᵢ ~ Normal(α + δ·gᵢ, σ)
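The stage title claims that comparing two groups is a regression with a 0/1 predictor. A minimal sketch of that claim, with illustrative values for n, σ and δ that are assumptions and not the widget's defaults: the fitted intercept should match group A's mean, and the fitted coefficient on g should match the difference in group means.

```r
set.seed(1)        # arbitrary seed, chosen only for this illustration
n     <- 50        # points per group (hypothetical value)
sigma <- 1
delta <- 0.5
y <- c(rnorm(n, 0, sigma), rnorm(n, delta, sigma))
g <- c(rep(0, n), rep(1, n))

fit       <- lm(y ~ g)
alpha_hat <- unname(coef(fit)[1])  # intercept: estimate for group A (g = 0)
delta_hat <- unname(coef(fit)[2])  # coefficient on g: shift from A to B

mean_a <- mean(y[g == 0])
mean_b <- mean(y[g == 1])

# Both comparisons should return TRUE (up to floating-point tolerance)
all.equal(alpha_hat, mean_a)
all.equal(delta_hat, mean_b - mean_a)
```

So whenever the lesson says "refit the regression on shuffled g", you can read it as "recompute the difference in means under shuffled group labels".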

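To ask "is δ̂ clearly nonzero?", shuffle g and refit, 1000 times over, and collect the shuffled δ̂'s in the bottom histogram. The empirical P is the two-sided tail: the fraction of shuffled δ̂'s at least as large in absolute value as the observed δ̂.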

Two groups (top) and the shuffled-label null distribution of δ̂ (bottom)

n / group: 1000  |  σ (pooled): 0.2  |  true δ: 0.01 (= 0.05 σ)
observed δ̂:  |  empirical P (1000 shuffles):  |  δ̂ / σ̂:

Prediction (required before sliders unlock)

  Q1. With true δ = 0.05·σ and n = 1000 per group, the histogram of shuffled δ̂'s will be:
  Q2. The observed δ̂ at this n will land:

Move sliders through 5 combinations to unlock Stage B.

Controls (defaults)

n / group: 1000  |  σ: 0.20  |  true δ: 0.010  |  seed: 42

R code — one regression and a shuffle loop

set.seed(42)
n <- 1000
sigma <- 0.2
delta <- 0.01
y <- c(rnorm(n, 0, sigma), rnorm(n, delta, sigma))
g <- c(rep(0, n), rep(1, n))
# the model: y ~ N(alpha + delta*g, sigma). delta_hat = group difference.
d_obs <- coef(lm(y ~ g))[2]
# shuffle g to break the relationship; refit; collect null delta_hat's
d_null <- replicate(1000, coef(lm(y ~ sample(g)))[2])
# empirical two-sided P -- fraction at least as extreme as observed
mean(abs(d_null) >= abs(d_obs))
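One detail worth seeing in code: with 1000 shuffles the tail fraction can only take values in steps of 0.001, and it can come out as exactly 0. The sketch below reruns the Stage A setup and prints the plain fraction next to a common "plus one" variant that counts the observed arrangement as one of the shuffles. That variant is a widely used convention and an addition here, not necessarily what this page's readout computes. The names n_shuffles, k, p_plain and p_plus1 are illustrative.

```r
set.seed(42)
n     <- 1000
sigma <- 0.2
delta <- 0.01
y <- c(rnorm(n, 0, sigma), rnorm(n, delta, sigma))
g <- c(rep(0, n), rep(1, n))

n_shuffles <- 1000
d_obs  <- unname(coef(lm(y ~ g))[2])
d_null <- replicate(n_shuffles, unname(coef(lm(y ~ sample(g)))[2]))

# Number of shuffled estimates at least as extreme as the observed one
k <- sum(abs(d_null) >= abs(d_obs))

p_plain <- k / n_shuffles              # the tail fraction used in this lesson
p_plus1 <- (k + 1) / (n_shuffles + 1)  # never exactly 0 (assumed convention)

c(plain = p_plain, plus_one = p_plus1)
```

The two numbers differ by at most about 0.001 at this number of shuffles, so the choice matters mainly when the plain fraction is 0.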

B — Same regression, flipped inputs

A 0.5σ effect, moderate by any measure. New defaults: n = 8 per group, σ large. Rerun the same shuffle. Decide for yourself where δ̂ sits relative to the null.

Scenario

Same regression. Same shuffle. New inputs: n = 8 per group, σ large, δ moderate (0.5·σ).

Watch the null histogram in the bottom panel and decide for yourself whether the observed δ̂ sits inside or outside it.

Two groups (top) and the shuffled-label null distribution of δ̂ (bottom)

n / group: 8  |  σ: 2.0  |  true δ: 1.0 (= 0.5 σ)
observed δ̂:  |  empirical P:  |  δ̂ / σ̂:

Prediction (required before sliders unlock)

  Q1. With δ ≈ 0.5σ and n = 8 per group, the null histogram of shuffled δ̂'s will:
  Q2. The most direct lever for shrinking the empirical P here is:

Move sliders through 5 combinations to unlock Stage C.

Controls (defaults)

n / group: 8  |  σ: 2.0  |  true δ: 1.0  |  seed: 42

R code — same model, much wider null

set.seed(42)
n <- 8
sigma <- 2.0
delta <- 1.0
y <- c(rnorm(n, 0, sigma), rnorm(n, delta, sigma))
g <- c(rep(0, n), rep(1, n))
d_obs <- coef(lm(y ~ g))[2]
d_null <- replicate(1000, coef(lm(y ~ sample(g)))[2])
mean(abs(d_null) >= abs(d_obs))
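To put a number on "much wider null", the following sketch builds the shuffled-label null for the Stage A and Stage B defaults and compares their standard deviations. The helper shuffle_null is a hypothetical name introduced here. The last line prints σ·sqrt(2/n), the textbook standard error of a difference between two group means, as a rough yardstick only: the shuffle null is built from the sample at hand, so it will not match exactly, especially at n = 8.

```r
# Build the shuffled-label null of delta_hat for one simulated data set.
# `shuffle_null` is an illustrative helper, not part of the lesson's code.
shuffle_null <- function(n, sigma, delta, n_shuffles = 1000) {
  y <- c(rnorm(n, 0, sigma), rnorm(n, delta, sigma))
  g <- c(rep(0, n), rep(1, n))
  replicate(n_shuffles, unname(coef(lm(y ~ sample(g)))[2]))
}

set.seed(42)
null_a <- shuffle_null(n = 1000, sigma = 0.2, delta = 0.01)  # Stage A defaults
null_b <- shuffle_null(n = 8,    sigma = 2.0, delta = 1.0)   # Stage B defaults

sd_a <- sd(null_a)
sd_b <- sd(null_b)

# Spread of each shuffle null
c(stage_a = sd_a, stage_b = sd_b)

# Rough yardstick: sigma * sqrt(2 / n)
c(stage_a = 0.2 * sqrt(2 / 1000), stage_b = 2.0 * sqrt(2 / 8))
```

Comparing the two rows is a quick way to check your reading of the bottom histogram before you move the sliders.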

C — Same move on real time series, predictor = year

Five real datasets. For each one, fit y ~ α + β·year, shuffle the y values 1000 times, refit. Guess the 95% envelope of the resulting null histogram before revealing it.

Scenario

Each round: a short real time series. Fit y ~ α + β·x. Shuffle the y values to build the null histogram of shuffled β̂'s.

Guess the 95% interval of that null first. Then reveal.

Rounds 1–2 are the Grant lab's Galápagos Geospiza fortis — Darwin's same archipelago, beaks measured in actual millimeters across the 1977 drought. The shuffle null is the test Darwin's argument never had.

Round 1 of 5 — Darwin's archipelago — Geospiza fortis beak depth, 1973–1976 (pre-drought)

Null histogram of shuffled slopes

observed slope β̂:  |  empirical P (two-sided):

Prediction (required before the drill starts)

  Q1. The "shuffle the predictor" recipe (or its mirror, "shuffle the response") produces a null distribution that is:
  Q2. You see "empirical P = 0.02" next to an observed slope. Translate that to plain language for a colleague:

Complete all 5 rounds to unlock Stage D.

R code — one regression and a shuffle loop, continuous predictor

# Per-round: x = year (or generation), y = trait mean / fitness.
b_obs <- coef(lm(y ~ x))[2]
b_null <- replicate(1000, {
  coef(lm(sample(y) ~ x))[2]
})
quantile(b_null, c(0.025, 0.975))   # 95% null envelope
mean(abs(b_null) >= abs(b_obs))     # empirical two-sided P
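The per-round snippet assumes x and y already exist. For trying it outside the page, here is a self-contained sketch on a simulated series. The data below are made up for illustration (12 hypothetical years, an arbitrary trend and noise level) and are not any of the five real datasets.

```r
set.seed(7)
x <- 1:12                                       # hypothetical: 12 consecutive years
y <- 10 + 0.15 * x + rnorm(length(x), 0, 0.5)   # simulated trait means, NOT real data

b_obs <- unname(coef(lm(y ~ x))[2])

# Shuffle the response, refit, keep the slope; repeat 1000 times
b_null <- replicate(1000, unname(coef(lm(sample(y) ~ x))[2]))

envelope <- quantile(b_null, c(0.025, 0.975))   # 95% null envelope
p_emp    <- mean(abs(b_null) >= abs(b_obs))     # empirical two-sided P

b_obs
envelope
p_emp
```

Before running it, try the same exercise as in the drill: guess the two envelope endpoints from the spread of y and the span of x, then compare with the printed quantiles.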

The slope is a number. What causal story is it evidence for?

You just got an empirical P. The slope β̂(beak depth ~ year) for the 1973–1977 Grant fortis is far outside the shuffle null. The test says "year predicts beak depth." But "year" doesn't do anything to a finch — it's a label on the x-axis.

Build a quick causal model below. Add arrows. Watch the simulated data on the right change. Then ask: which model would produce the slope you just observed, and which would not?

The "year → beak" arrow is what the shuffle test directly evaluates. The "drought → year, drought → beak" pair is the story Darwin would have written. The test cannot distinguish them — it's not the test's job. The job of the test is to tell you the apparent slope isn't from random reshuffling; the job of the DAG is to tell you what to do next.
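A small simulation can make the "cannot distinguish" point concrete. The sketch below generates data under two hypothetical causal stories, a steady per-year trend and a one-time step after a drought year, and runs the same shuffle test on beak depth against year for each. All numbers (30 birds per year, baseline 9, effect sizes, noise SD 1) are invented for illustration and are not the Grant measurements. The helper shuffle_p is an illustrative name.

```r
set.seed(3)
years   <- rep(1973:1978, each = 30)   # hypothetical: 30 birds measured per year
drought <- as.numeric(years >= 1977)   # hypothetical indicator: 1 from 1977 onward

# Story 1: beak depth drifts upward every year; the drought plays no role
beak_trend <- 9 + 0.3 * (years - 1973) + rnorm(length(years), 0, 1)

# Story 2: no yearly drift; beak depth steps up once, after the drought
beak_step  <- 9 + 1.0 * drought + rnorm(length(years), 0, 1)

# Same recipe as the drill: slope of y on x against a shuffled-response null
shuffle_p <- function(y, x, n_shuffles = 1000) {
  b_obs  <- coef(lm(y ~ x))[2]
  b_null <- replicate(n_shuffles, coef(lm(sample(y) ~ x))[2])
  mean(abs(b_null) >= abs(b_obs))
}

p_trend <- shuffle_p(beak_trend, years)
p_step  <- shuffle_p(beak_step,  years)

c(trend_story = p_trend, step_story = p_step)
```

Under these assumed settings both stories put the slope on year far into the tail of the shuffle null, so the empirical P alone does not tell you which story generated the data. Separating them takes a different comparison, for example regressing on the drought indicator as well as on year.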

D — Three sliders, one regression-shuffle null

Three sliders: δ, n, σ. Set them to produce three target scenarios.

Scenario

Three challenges, in order:

  1. Tiny δ, empirical P < 0.05. δ ≤ 0.1·σ.
  2. Huge δ, empirical P > 0.05. δ ≥ 1.5·σ.
  3. Moderate δ, honest n. δ = 0.5σ, n = 30 per group.

Two groups (top) and shuffled-label null of δ̂ (bottom)

empirical P:  |  observed δ̂:  |  δ̂ / σ̂:
Challenge progress:

Prediction (required before sliders unlock)

  Q1. The width of the shuffled-label null on δ̂ depends most directly on:
  Q2. The phrase "clearly different" applied to a small empirical P tells you about:

Complete all 3 challenges to wrap up.

Controls (defaults)

n / group: 30  |  σ: 1.0  |  true δ: 0.50  |  seed: 42

R code — three sliders, one regression null

set.seed(42)
n <- 30
sigma <- 1.0
delta <- 0.5
y <- c(rnorm(n, 0, sigma), rnorm(n, delta, sigma))
g <- c(rep(0, n), rep(1, n))
d_obs <- coef(lm(y ~ g))[2]
d_null <- replicate(1000, coef(lm(y ~ sample(g)))[2])
mean(abs(d_null) >= abs(d_obs))

Stretch challenge (optional, recorded)

Take the "huge δ, small n, P > 0.05" scenario and ask: how many more samples per group would I need to make this difference clearly distinguishable from the shuffled-label null? Re-run the shuffle null at several n values and find the smallest n where the empirical P stays below 0.05 across, say, 10 different seeds. Report the required n. Hit "I tried it" after you have a number.
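One possible scaffold for the search, offered as a starting point and not as the intended solution. It assumes δ = 1.5·σ with σ = 1, seeds 1 to 10, and a coarse grid of candidate n values. All of these, along with the names shuffle_p, n_grid and n_req, are assumptions you should adapt. For speed it uses the difference in group means in place of lm(), which gives the same number for a 0/1 predictor.

```r
# Empirical two-sided P for one simulated data set at a given seed.
shuffle_p <- function(n, sigma, delta, seed, n_shuffles = 1000) {
  set.seed(seed)
  y <- c(rnorm(n, 0, sigma), rnorm(n, delta, sigma))
  g <- c(rep(0, n), rep(1, n))
  # Difference in group means under a given labelling
  diff_means <- function(labels) mean(y[labels == 1]) - mean(y[labels == 0])
  d_obs  <- diff_means(g)
  d_null <- replicate(n_shuffles, diff_means(sample(g)))
  mean(abs(d_null) >= abs(d_obs))
}

n_grid <- c(3, 4, 6, 8, 12, 16, 24, 32)  # candidate n per group (hypothetical grid)
seeds  <- 1:10
sigma  <- 1.0
delta  <- 1.5                            # "huge" effect: 1.5 * sigma

# First n on the grid where every seed gives empirical P < 0.05
n_req <- NA
for (n in n_grid) {
  p_vals <- sapply(seeds, function(s) shuffle_p(n, sigma, delta, s))
  if (all(p_vals < 0.05)) {
    n_req <- n
    break
  }
}
n_req
```

The grid is deliberately coarse, so treat the result as a bracket and refine between neighbouring grid values. It is also worth checking that larger n values keep passing, since the loop stops at the first success.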
