BIO 202 — Lesson 5: Shuffling the predictor to see what chance can do

What you'll do

Four scenarios. In each, fit the regression, shuffle the predictor, look at where the observed coefficient sits in the cloud of shuffled ones. The tail fraction is what you'll report.

The move Darwin couldn't make

When Darwin saw that tall-shrub Galápagos islands had long-necked tortoises and low-grass islands had short-necked ones, he was doing a regression in his head: shell shape on island vegetation. He could see the association but had no way to ask "would I see a slope this steep just by reshuffling which tortoise lives on which island?" That's the move you build here. In Stage C round 2 you do it on his actual birds: the Grant lab's Geospiza fortis beak depths straddling the 1977 drought. The deeper move: shuffling breaks the link between predictor and response, which is the same as asking "could this association have arisen without any channel of inheritance connecting them?" You will use this exact question at every later scale in the course.

A — Comparing two groups is a regression with a 0/1 predictor

Generate group A (g = 0) and group B (g = 1). Fit y ~ α + δ·g. Shuffle g, refit, build the null histogram of shuffled δ̂'s. Read the tail fraction.

Scenario

Two groups: A (g = 0, mean 0) and B (g = 1, mean δ). Both with within-group SD σ and n points each. Stack them and fit y ~ α + δ·g.

y_i ~ Normal(α + δ·g_i, σ)
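A quick check of the claim that a two-group comparison is a regression: with a 0/1 predictor, the slope fitted by lm() is the difference in group means, and the intercept is the mean of group A. The values below (n = 50, σ = 1, δ = 0.5, seed 1) are arbitrary illustration choices, not the lesson defaults.

```r
set.seed(1)
n <- 50; sigma <- 1; delta <- 0.5   # arbitrary illustration values
y <- c(rnorm(n, 0, sigma), rnorm(n, delta, sigma))
g <- c(rep(0, n), rep(1, n))

fit       <- lm(y ~ g)
slope     <- unname(coef(fit)[2])               # delta_hat from the regression
intercept <- unname(coef(fit)[1])               # alpha_hat from the regression
mean_diff <- mean(y[g == 1]) - mean(y[g == 0])  # plain difference in group means

all.equal(slope, mean_diff)            # TRUE
all.equal(intercept, mean(y[g == 0]))  # TRUE
```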

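To ask "is δ̂ clearly nonzero?", shuffle g and refit 1000 times, then build the bottom histogram from the shuffled δ̂'s. The empirical P is the tail fraction: the share of shuffled δ̂'s at least as far from zero as the observed δ̂.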

Two groups (top) and the shuffled-label null distribution of δ̂ (bottom)

n / group: 1000 | σ (pooled): 0.2 | true δ: 0.01 (= 0.05 σ)

observed δ̂: — | empirical P (1000 shuffles): — | δ̂ / σ̂: —

Prediction (required before sliders unlock)

Q1. With true δ = 0.05·σ and n = 1000 per group, the histogram of shuffled δ̂'s will be:
  (a) wide: large n doesn't tighten the null
  (b) narrow and tightly clustered near zero: large n makes shuffled δ̂'s small
  (c) centered well away from zero: shuffling shouldn't fix the effect
Q2. The observed δ̂ at this n will land:
  (a) well inside the null histogram: δ is small so observed and null can't be told apart
  (b) just outside the tightly clustered null histogram: small δ + large n is clearly distinguishable
  (c) randomly inside or outside

Move sliders through 5 combinations to unlock Stage B.

Controls

n / group: 1000 | within σ: 0.20 | true δ: 0.010 | seed: 42

R code — one regression and a shuffle loop

    set.seed(42)
    n <- 1000
    sigma <- 0.2
    delta <- 0.01
    y <- c(rnorm(n, 0, sigma), rnorm(n, delta, sigma))
    g <- c(rep(0, n), rep(1, n))

    # the model: y ~ N(alpha + delta*g, sigma). delta_hat = group difference.
    d_obs <- coef(lm(y ~ g))[2]

    # shuffle g to break the relationship; refit; collect null delta_hat's
    d_null <- replicate(1000, coef(lm(y ~ sample(g)))[2])

    # empirical two-sided P -- fraction at least as extreme as observed
    mean(abs(d_null) >= abs(d_obs))

B — Same regression, flipped inputs

A 0.5σ effect, moderate by any measure. New defaults: n = 8 per group, σ large. Rerun the same shuffle. Decide for yourself where δ̂ sits relative to the null.

Scenario

Same regression. Same shuffle. New inputs: n = 8 per group, σ large, δ moderate (0.5·σ).

Watch the null histogram in the bottom panel and decide for yourself whether the observed δ̂ sits inside or outside it.

Two groups (top) and the shuffled-label null distribution of δ̂ (bottom)

n / group: 8 | σ: 2.0 | true δ: 1.0 (= 0.5 σ)

observed δ̂: — | empirical P: — | δ̂ / σ̂: —

Prediction (required before sliders unlock)

Q1. With δ ≈ 0.5σ and n = 8 per group, the null histogram of shuffled δ̂'s will:
  (a) be wide: small n means shuffled δ̂'s span a large range
  (b) be narrow: the noise is independent of n
  (c) be bimodal
Q2. The most direct lever for shrinking the empirical P here is:
  (a) collect more data (raise n): the null narrows as you add data
  (b) rerun the same experiment many times until P comes out small
  (c) nothing: P is a property of the population, not the sample

Move sliders through 5 combinations to unlock Stage C.

Controls

n / group: 8 | within σ: 2.0 | true δ: 1.0 | seed: 42

R code — same model, much wider null

    set.seed(42)
    n <- 8
    sigma <- 2.0
    delta <- 1.0
    y <- c(rnorm(n, 0, sigma), rnorm(n, delta, sigma))
    g <- c(rep(0, n), rep(1, n))
    d_obs <- coef(lm(y ~ g))[2]
    d_null <- replicate(1000, coef(lm(y ~ sample(g)))[2])
    mean(abs(d_null) >= abs(d_obs))
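One way to see why this null is so much wider: for two groups of n points each, the shuffled differences have a standard deviation very close to sd(y)·sqrt(2/n). The sketch below compares that formula with the simulated null at the Stage A and Stage B defaults. The helper name null_sd is made up for this illustration, and it uses a direct difference in means in place of lm() for speed (the two give the same δ̂).

```r
# Hypothetical helper: simulate two groups, shuffle the labels, and compare
# the spread of the shuffled differences with sd(y) * sqrt(2 / n).
null_sd <- function(n, sigma, delta, shuffles = 1000, seed = 42) {
  set.seed(seed)
  y <- c(rnorm(n, 0, sigma), rnorm(n, delta, sigma))
  g <- c(rep(0, n), rep(1, n))
  d_null <- replicate(shuffles, {
    gs <- sample(g)                       # shuffled labels
    mean(y[gs == 1]) - mean(y[gs == 0])   # same value as coef(lm(y ~ gs))[2]
  })
  c(simulated = sd(d_null), formula = sd(y) * sqrt(2 / n))
}

stage_a <- null_sd(n = 1000, sigma = 0.2, delta = 0.01)
stage_b <- null_sd(n = 8,    sigma = 2.0, delta = 1.0)
stage_a
stage_b
```

Under these settings the Stage B null should come out on the order of a hundred times wider than the Stage A null, give or take sampling variation, because σ is ten times larger and sqrt(2/n) is roughly eleven times larger.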

C — Same move on real time series, predictor = year

Five real datasets. For each one, fit y ~ α + β·year, shuffle the y values 1000 times, refit. Guess the 95% envelope of the resulting null histogram before revealing it.

Scenario

Each round: a short real time series. Fit y ~ α + β·x. Shuffle the y values to build the null histogram of shuffled β̂'s.

Guess the 95% interval of that null first. Then reveal.

Rounds 1–2 are the Grant lab's Galápagos Geospiza fortis — Darwin's same archipelago, beaks measured in actual millimeters across the 1977 drought. The shuffle null is the test Darwin's argument never had.

Round 1 of 5 — Darwin's archipelago — Geospiza fortis beak depth, 1973–1976 (pre-drought)

Null histogram of shuffled slopes

observed slope β̂: — | empirical P (two-sided): —

Prediction (required before the drill starts)

Q1. The "shuffle the predictor" recipe (or its mirror, "shuffle the response") produces a null distribution that is:
  (a) wider for short windows / small n: fewer data points means each shuffle can have a more lopsided slope by chance
  (b) narrower for short windows: less data is easier to fit
  (c) a single spike at zero regardless of window
Q2. You see "empirical P = 0.02" next to an observed slope. Translate that to plain language for a colleague:
  (a) "There's a 2% chance the true slope is exactly zero." (interpretation as a probability about the world)
  (b) "If we'd shuffled the predictor to break any real relationship, 2% of those shuffled slopes would have been at least this far from zero, so the observed slope is unusual under a no-link world."
  (c) "P and R² are the same number: both measure how much variance is explained."

Complete all 5 rounds to unlock Stage D.

R code — one regression and a shuffle loop, continuous predictor

    # Per-round: x = year (or generation), y = trait mean / fitness.
    b_obs <- coef(lm(y ~ x))[2]
    b_null <- replicate(1000, {
      coef(lm(sample(y) ~ x))[2]
    })
    quantile(b_null, c(0.025, 0.975))   # 95% null envelope
    mean(abs(b_null) >= abs(b_obs))     # empirical two-sided P
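To build intuition for the envelope guess, the sketch below runs the same shuffle on simulated series of two different lengths. The series are pure noise generated with rnorm(); they are not the Grant data or any other round's dataset. The helper name envelope_width and the window lengths (4 and 20 years) are illustration choices.

```r
# Hypothetical helper: width of the 95% shuffle-null envelope for the slope
# of a simulated, trend-free series observed over n_years consecutive years.
envelope_width <- function(n_years, sd_y = 1, shuffles = 1000, seed = 42) {
  set.seed(seed)
  x <- seq_len(n_years)              # stand-in for year
  y <- rnorm(n_years, 0, sd_y)       # simulated noise, not real data
  b_null <- replicate(shuffles, unname(coef(lm(sample(y) ~ x))[2]))
  env <- quantile(b_null, c(0.025, 0.975))
  unname(env[2] - env[1])
}

w_short <- envelope_width(n_years = 4)
w_long  <- envelope_width(n_years = 20)
c(short_window = w_short, long_window = w_long)
```

With only four points there are just 24 possible orderings of y, so the null histogram is lumpy as well as wide; that is worth keeping in mind when guessing the envelope for a short round.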

The slope is a number. What causal story is it evidence for?

You just got an empirical P. The slope β̂(beak depth ~ year) for the 1973–1977 Grant fortis is far outside the shuffle null. The test says "year predicts beak depth." But "year" doesn't do anything to a finch — it's a label on the x-axis.

Build a quick causal model below. Add arrows. Watch the simulated data on the right change. Then ask: which model would produce the slope you just observed, and which would not?

The "year → beak" arrow is what the shuffle test directly evaluates. The "drought → year, drought → beak" pair is the story Darwin would have written. The test cannot distinguish them — it's not the test's job. The job of the test is to tell you the apparent slope isn't from random reshuffling; the job of the DAG is to tell you what to do next.
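The point that the shuffle test cannot separate causal stories can be sketched with simulated data. Below, one dataset has a direct x → y link, and the other has a hidden common cause z driving both x and y with no x → y link at all. All variables, effect sizes, and the helper name shuffle_p are invented for this illustration.

```r
# Hypothetical helper: two-sided empirical P for the slope of y on x.
shuffle_p <- function(x, y, shuffles = 1000) {
  b_obs  <- unname(coef(lm(y ~ x))[2])
  b_null <- replicate(shuffles, unname(coef(lm(sample(y) ~ x))[2]))
  mean(abs(b_null) >= abs(b_obs))
}

set.seed(42)
n <- 200

# Story 1: x directly affects y.
x1 <- rnorm(n)
y1 <- 0.5 * x1 + rnorm(n)

# Story 2: a hidden common cause z drives both; x has no effect on y.
z  <- rnorm(n)
x2 <- z + rnorm(n)
y2 <- z + rnorm(n)

p_direct   <- shuffle_p(x1, y1)
p_confound <- shuffle_p(x2, y2)
c(direct = p_direct, common_cause = p_confound)
```

Both P values should come out very small here, even though only the first dataset has a direct x → y arrow. The shuffle test flags the association in each case; deciding between the two stories takes the causal model.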

D — Three sliders, one regression-shuffle null

Three sliders: δ, n, σ. Set them to produce three target scenarios.

Scenario

Three challenges, in order:

Tiny δ, empirical P < 0.05. δ ≤ 0.1·σ.
Huge δ, empirical P > 0.05. δ ≥ 1.5·σ.
Moderate δ, honest n. δ = 0.5σ, n = 30 per group.

Two groups (top) and shuffled-label null of δ̂ (bottom)

empirical P: — | observed δ̂: — | δ̂ / σ̂: —

Prediction (required before sliders unlock)

Q1. The width of the shuffled-label null on δ̂ depends most directly on:
  (a) n and σ: the null narrows as you add data or as the within-group spread shrinks
  (b) the true δ: bigger true effects make a narrower null
  (c) the random seed only
Q2. The phrase "clearly different" applied to a small empirical P tells you about:
  (a) the clarity of the difference relative to the shuffled-label null
  (b) the absolute size of the difference
  (c) both at once: empirical P captures clarity AND importance

Complete all 3 challenges to wrap up.

Controls

n / group: 30 | within σ: 1.0 | true δ: 0.50 | seed: 42

R code — three sliders, one regression null

    set.seed(42)
    n <- 30
    sigma <- 1.0
    delta <- 0.5
    y <- c(rnorm(n, 0, sigma), rnorm(n, delta, sigma))
    g <- c(rep(0, n), rep(1, n))
    d_obs <- coef(lm(y ~ g))[2]
    d_null <- replicate(1000, coef(lm(y ~ sample(g)))[2])
    mean(abs(d_null) >= abs(d_obs))
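The first two challenges can be sketched in code as well. The helper name shuffle_p2 is made up, the specific n values are one possible choice rather than the intended answer, and the helper uses a difference in means in place of lm() for speed (the two give the same δ̂).

```r
# Hypothetical helper: simulate two groups and return the two-sided
# empirical P from a label-shuffle null on the difference in means.
shuffle_p2 <- function(n, sigma, delta, shuffles = 1000, seed = 42) {
  set.seed(seed)
  y <- c(rnorm(n, 0, sigma), rnorm(n, delta, sigma))
  g <- c(rep(0, n), rep(1, n))
  diff_means <- function(lab) mean(y[lab == 1]) - mean(y[lab == 0])
  d_obs  <- diff_means(g)
  d_null <- replicate(shuffles, diff_means(sample(g)))
  # small tolerance so exact ties with the observed value are counted
  mean(abs(d_null) >= abs(d_obs) - 1e-12)
}

# Challenge 1: tiny delta (0.1 * sigma) made clear by a very large n.
p_tiny <- shuffle_p2(n = 10000, sigma = 1, delta = 0.1)

# Challenge 2: huge delta (2 * sigma) left unclear by a very small n.
# With 3 points per group there are only choose(6, 3) = 20 distinct
# labelings, so the two-sided P cannot fall much below 2/20 = 0.1.
p_huge <- shuffle_p2(n = 3, sigma = 1, delta = 2)

c(tiny_delta = p_tiny, huge_delta = p_huge)
```

The second case shows a floor that has nothing to do with effect size: when the number of possible labelings is tiny, even perfectly separated groups cannot produce a small empirical P.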

Stretch challenge (optional, recorded)

Take the "huge δ, small n, P > 0.05" scenario and ask: how many more samples per group would I need to make this difference clearly distinguishable from the shuffled-label null? Re-run the shuffle null at several n values and find the smallest n where the empirical P stays below 0.05 across, say, 10 different seeds. Report the required n. Hit "I tried it" after you have a number.
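A minimal sketch of the search this challenge describes, assuming δ = 1.5σ, seeds 1 through 10, and a candidate range of n = 3 to 60 (all three are illustration choices; the challenge leaves them open). The helper names one_run_p and smallest_clear_n are made up, and the difference in means stands in for coef(lm(y ~ g))[2], which gives the same δ̂.

```r
# Hypothetical helper: two-sided empirical P for one simulated experiment.
one_run_p <- function(n, sigma, delta, seed, shuffles = 1000) {
  set.seed(seed)
  y <- c(rnorm(n, 0, sigma), rnorm(n, delta, sigma))
  g <- c(rep(0, n), rep(1, n))
  diff_means <- function(lab) mean(y[lab == 1]) - mean(y[lab == 0])
  d_obs  <- diff_means(g)
  d_null <- replicate(shuffles, diff_means(sample(g)))
  mean(abs(d_null) >= abs(d_obs) - 1e-12)   # tolerance counts exact ties
}

# Smallest candidate n at which every seed gives empirical P < 0.05.
smallest_clear_n <- function(sigma, delta, n_grid = 3:60, seeds = 1:10) {
  for (n in n_grid) {
    p <- vapply(seeds, function(s) one_run_p(n, sigma, delta, s), numeric(1))
    if (all(p < 0.05)) return(n)
  }
  NA_integer_   # no candidate n passed for every seed
}

n_needed <- smallest_clear_n(sigma = 1, delta = 1.5)
n_needed
```

Because each seed draws a fresh sample, the answer depends on which seeds are used; a larger seed set will usually push the required n upward.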
