BIO 202 — Lesson 1: Adding up coin flips until a bell appears

A — Flip a coin, 20 times. Three rounds.

Pick a number. Flip 20 coins. See how close you got. Do it three times.

What to do

Pick how many heads you expect, on the right. Then click Flip Coin until you have flipped 20. Three rounds; the running tally below tracks how close you were each time.

Number of heads

flips so far: 0 / 20 | heads: 0

A guess of 10 counts as ✓ only when exactly 10 heads come up, which happens in about 1 round in 6. Watch how your guesses themselves change across the three rounds, even when individual rounds come up ✗.
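Where the "1 round in 6" comes from can be checked with a short sketch, assuming each flip is fair and independent so the head count follows a binomial distribution. The name `p_exact` is introduced here for illustration only.

```r
# Chance of exactly 10 heads in 20 fair, independent flips
p_exact <- dbinom(10, size = 20, prob = 0.5)
p_exact            # about 0.176

1 / p_exact        # average number of rounds per exact hit

# Chance that a guess of 10 misses in all three rounds
(1 - p_exact)^3
```

So even the best single guess misses most rounds, and missing all three is more likely than not.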

Round 1 of 3

Prediction

How many heads do you expect to see in 20 single coin flips?

Controls

seed 42

R code

set.seed(42)
flips <- rbinom(20, size = 1, prob = 0.5)
sum(flips)   # count of heads

Pause — what your "error" looked like

How far off each round's prediction was from the actual heads count.

Your three rounds

Round	You predicted	Heads observed	Off by

Average error across 3 rounds: —
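The tally above can be rebuilt by hand. This is a minimal sketch, not the page's own code: the three predictions are made-up values, and the sign convention (prediction minus observed, so positive means you guessed too high) is an assumption borrowed from the Stage B code.

```r
set.seed(42)
predicted <- c(10, 10, 10)                      # hypothetical guesses, one per round
observed  <- rbinom(3, size = 20, prob = 0.5)   # heads in each round of 20 flips

rounds <- data.frame(
  round     = 1:3,
  predicted = predicted,
  observed  = observed,
  off_by    = predicted - observed              # signed: positive = guessed too high
)
rounds
mean(rounds$off_by)                             # average error across the 3 rounds
```

Drawing `rbinom(3, size = 20, ...)` gives three head counts directly, which is equivalent to summing 20 single flips per round.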

That last column, how far off each prediction was, is your error. Here the miss is sampling scatter: a finite batch of 20 flips lands off the long-run 50/50 by chance. Flip more coins per round and this scatter shrinks as a fraction of the flips (the raw head count wanders more, but the proportion of heads settles toward 0.5). It is a property of the sample size, not of you.
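To see the shrinking for yourself, here is a sketch that simulates many rounds at three round sizes and measures the scatter in the proportion of heads. The round sizes, the number of simulated rounds, and all variable names are choices made for this illustration.

```r
set.seed(1)
n_flips <- c(20, 200, 2000)    # flips per round
reps    <- 5000                # simulated rounds at each size

scatter <- sapply(n_flips, function(n) {
  prop_heads <- rbinom(reps, size = n, prob = 0.5) / n
  sd(prop_heads)               # spread of the proportion around 0.5
})

# Binomial standard error of a proportion, for comparison
theory <- sqrt(0.5 * 0.5 / n_flips)

data.frame(n_flips, simulated = scatter, theory = theory)
```

Each tenfold increase in flips cuts the scatter by roughly a factor of three (the square root of ten).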

A good predictor doesn't hit zero on every single try (the sampling scatter still bites). What it does is land with an average error near zero — too high about as often as too low. Stage B turns this into a running tracker — but watch for a second, different kind of miss there, one that does not shrink no matter how much data you collect.
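A quick way to see "average error near zero" is to compare two fixed strategies over many simulated rounds. This is a sketch under the fair-coin assumption; the guess of 12 is a hypothetical too-high predictor chosen only for contrast.

```r
set.seed(2)
heads <- rbinom(10000, size = 20, prob = 0.5)   # many rounds of 20 flips

err_centered <- 10 - heads    # always guess 10
err_biased   <- 12 - heads    # always guess 12 (hypothetical, too high)

mean(err_centered)            # close to 0
mean(err_biased)              # close to 2: a miss that more rounds will not fix
mean(err_centered == 0)       # share of rounds that were exact hits
c(too_high = mean(err_centered > 0), too_low = mean(err_centered < 0))
```

The guess of 10 still misses most rounds, but its misses cancel out; the guess of 12 is off in the same direction on average.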

B — Guess a person's height

Real human heights. Get the average error low across 30 or more draws to move on.

What to do

Pick a single guess (in centimeters) for the height of a random adult. Click Draw to pull one. Repeat. If your average error across 30 draws ends up low — close enough to zero — you move on. If not, the page resets.

Heights you have drawn so far

your guess: — | draws so far: 0 / 30 | overestimated: 0 | underestimated: 0 | average error: —

Controls

your guess (cm)

R code

# 30 random adult heights from real data.
nh <- read.csv("data/clean/nhanes_adults.csv")
set.seed(7)
truths <- sample(nh$Height, 30)
guess <- ___    # type a value; aim for an average error near zero
mean(guess - truths)
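If you want to try several guesses without the data file, here is a self-contained sketch. It swaps the NHANES sample for simulated heights: the mean of 168 cm and spread of 10 cm are stand-in values chosen for illustration, not estimates from the real data, and `avg_error` is a helper name introduced here.

```r
set.seed(7)
# Stand-in for the 30 drawn heights (cm); hypothetical values, not NHANES
truths <- rnorm(30, mean = 168, sd = 10)

avg_error <- function(guess) mean(guess - truths)

guesses <- c(150, 160, 170, 180)
sapply(guesses, avg_error)    # negative = too low on average, positive = too high

best <- mean(truths)          # the one guess that balances the misses
avg_error(best)               # zero, up to rounding
```

Because `mean(guess - truths)` equals `guess - mean(truths)`, the average error moves one-for-one with the guess and crosses zero exactly at the mean of the draws.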

C — Naming the pieces, on real data

Two numbers behind everything you have done so far.

What to do

Click Draw one adult. Each click pulls one real NHANES height and adds its distance from the population mean to the second plot.

NHANES adult heights — the whole population

μ (best constant guess): — cm | σ (typical distance from μ): — cm

Your draws — distance from μ

draws: 0 | average signed error: — | typical |error|: —

Prediction

In Stage B you found the height that made your average error fall toward zero. Pick the description of that number that fits what it just did for you:

The population mean, μ
The standard deviation, σ
The median height
The tallest height in the sample

Take 30 draws to wrap up. 0/30 draws


What the plots are showing

Top: the whole adult population. The solid red line is μ, the guess that pulled your Stage B average error toward zero. The shaded band is ± σ, a typical distance any single adult sits from μ.

Bottom: the miss for each draw (truth − μ) — the individual spread, or residual once a model is in play. After enough draws its distribution settles to width σ, centered at zero. Unlike Stage A's sampling scatter, this spread does not shrink as you draw more people — it is how different adults are from one another, a fixed property of the population. Keep the two misses apart: sampling scatter shrinks with n (this is what drift will be, in Lesson 10) and individual spread stays at σ (this is the residual the rest of the course subtracts, starting in Lesson 3).

y_i ~ Normal(μ, σ)
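The two kinds of miss can be put side by side in a sketch that draws from the model above. The values of `mu` and `sigma` are hypothetical stand-ins, not NHANES estimates, and `sim` is a helper written for this illustration.

```r
set.seed(3)
mu    <- 168   # hypothetical population mean (cm)
sigma <- 10    # hypothetical population SD (cm)

sim <- function(n, reps = 1000) {
  y     <- rnorm(n, mean = mu, sd = sigma)                        # one sample of n adults
  means <- replicate(reps, mean(rnorm(n, mean = mu, sd = sigma))) # many sample means
  c(n = n,
    individual_spread = sd(y),      # stays near sigma whatever n is
    scatter_of_mean   = sd(means))  # shrinks roughly like sigma / sqrt(n)
}

result <- as.data.frame(t(sapply(c(30, 300, 3000), sim)))
result
```

Reading down the table, the first spread column hovers around `sigma` while the second keeps falling as `n` grows.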

R code

nh <- read.csv("data/clean/nhanes_adults.csv")
y <- nh$Height
mean(y); sd(y)   # μ and σ from real data
hist(y, breaks = 40, freq = FALSE, col = "gray85", border = "white")
abline(v = mean(y), col = "#b23a48", lwd = 2)
abline(v = mean(y) + c(-1, 1) * sd(y), col = "#b23a48", lty = 3)

Stretch challenge (optional)

Open the downloaded .R file and change nh$Height to nh$Weight. Compute μ and σ on weights. Compute the median as well. Which one, mean or median, is closer to "typical weight"? Plot both as vertical lines on the histogram.


One more move — the same machine, but on two groups

NHANES adults are roughly half men and half women. Run the same μ-as-best-constant machine within each group. You get two μs and two σs instead of one.

men: μ = — cm, σ = — cm | women: μ = — cm, σ = — cm

between-group difference: — cm | within-group typical spread: — cm

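Notice: compare the between-group difference (men's μ − women's μ) with the within-group spread (σ inside either group). For adult height the difference usually comes out around twice the spread, not smaller than it, so read the two numbers above rather than assuming the gap is small.

Even so, a large share of the variation in adult height sits within men or within women, and the two distributions overlap. The same machine, told to operate inside two groups, gives you a number for each kind of variation.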

Lesson 6 returns to this — every two-group test is asking whether the between-group difference is bigger than the within-group spread would predict by chance.
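The two-group readout above can be sketched with `tapply`. The data here are simulated: the group means (175 and 162 cm), the shared spread (7 cm), and the group sizes are made-up inputs, so the numbers printed reflect only those inputs, not NHANES. Averaging the two group SDs for the within-group spread is also an assumption of this sketch; a pooled SD would be another reasonable choice.

```r
set.seed(4)
# Simulated stand-in for the two groups; hypothetical values, not NHANES
adults <- data.frame(
  sex    = rep(c("men", "women"), each = 500),
  height = c(rnorm(500, mean = 175, sd = 7),
             rnorm(500, mean = 162, sd = 7))
)

mu_by_group    <- tapply(adults$height, adults$sex, mean)   # one mean per group
sigma_by_group <- tapply(adults$height, adults$sex, sd)     # one SD per group

between <- unname(mu_by_group["men"] - mu_by_group["women"])
within  <- mean(sigma_by_group)    # simple average of the two group SDs

c(between = between, within = within)
```

On the real file, the same two `tapply` calls would run on the height column grouped by the sex column, whatever that column is named.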
