Lesson 9 — Building the population where nothing changes

BIO 202, Spring 2026, draft v1. Hardy-Weinberg as the null. Turn the assumptions off one at a time and watch the genotype frequencies leave it.

What you'll do

Four stages. Build the panmictic baseline (A). Turn off the assumptions one at a time and see which violation moves the genotypes most (B). Test sampled genotype counts against the HWE expectation with χ² (C). End on a real wild-population dataset and ask which assumption broke (D).

We're analogizing evolution to motion. Newton's first law: an object at rest tends to stay at rest unless acted on by an outside force. Hardy-Weinberg equilibrium is our object at rest. A gene frequency tends to stay at that frequency unless acted on by an outside force. The thing is, "outside force" here does not mean what you want it to mean. When you hear "force," you expect a thing that pushes — a separate thing. But the system not being infinite is itself enough. — 202_lec09_05

A — Infinite population, panmictic mating, no mutation

Slide the allele frequency p. Read the expected genotype frequencies p², 2pq, q². Watch the heterozygote curve peak at p = 0.5.

Scenario

One locus, two alleles. Allele A at frequency p. Allele a at frequency q = 1 − p. Under HWE: AA = p², Aa = 2pq, aa = q². Three numbers determined by one slider.

Hardy-Weinberg equilibrium is the fancy phrase we use for saying: this will be our baseline. Our expectation of no change. It is not real. It requires an infinite number of individuals alive at one time, and teleportation. Hardy-Weinberg never exists. But it's our baseline for comparison. — 202_lec08_05

Genotype frequencies vs p

p: 0.50  |  AA: 0.250  |  Aa: 0.500  |  aa: 0.250

Prediction

  1. Q1. At what allele frequency p is the heterozygote (Aa) frequency maximized?
Slide p to at least 4 different values to unlock Stage B.

Controls

p: 0.50

R code — HWE genotype frequencies

p <- 0.50
q <- 1 - p
c(AA = p^2, Aa = 2*p*q, aa = q^2)
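
The single-p calculation above can be repeated over a grid of p values, which is roughly what moving the slider does. This is a minimal sketch: the name hwe_table and the grid spacing of 0.25 are choices made here for illustration, not part of the lesson's code.

```r
# Tabulate HWE genotype frequencies over a grid of allele frequencies.
# hwe_table and the 0.25 spacing are illustrative assumptions.
p <- seq(0, 1, by = 0.25)
q <- 1 - p
hwe_table <- data.frame(p = p, AA = p^2, Aa = 2*p*q, aa = q^2)
print(hwe_table)
```

Each row sums to 1, since p² + 2pq + q² = (p + q)² and p + q = 1. A finer grid, for example by = 0.01, gives the smooth curves shown in the plot.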

B — Turn the assumptions off

Four toggles. Each violates one HWE assumption. Watch the genotype frequencies move away from p², 2pq, q² over generations.

Complete Stage A (submit prediction, slide p ≥ 4 times) to unlock this section.

Scenario

Start at HWE with p = 0.5. Run for 100 generations. Toggle finite N, mutation, selection, non-random mating. Each one peels the population off the equilibrium curve in a characteristic way.

Genotype frequency trajectories

gen: 100  |  final p:  |  final F:

Prediction

  1. Q1. Which violation moves p (the allele frequency) most, in a fixed run of 100 generations?
Try at least 5 different combinations of the four switches.

Controls

N: 100  |  s: 0.05  |  F: 0.30  |  seed: 42

R code — HWE violations

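set.seed(42)
N <- 100; gens <- 100; p <- 0.5
mu <- 1e-3; s <- 0.05; F <- 0.30   # F is the inbreeding coefficient; this masks R's shorthand F for FALSE
finite <- TRUE                     # finite-N toggle; must be defined before the loop
for (g in 1:gens) {
  q <- 1 - p
  # genotype frequencies under inbreeding coefficient F
  fAA <- p^2 + F*p*q;  fAa <- 2*p*q*(1-F);  faa <- q^2 + F*p*q
  w <- c(1, 1, 1 - s)            # aa is selected against
  fAA <- fAA*w[1]; fAa <- fAa*w[2]; faa <- faa*w[3]
  tot <- fAA + fAa + faa
  p <- (fAA + fAa/2) / tot          # allele frequency after selection
  p <- p*(1-mu) + (1-p)*mu          # symmetric mutation
  if (finite) p <- rbinom(1, 2*N, p) / (2*N)   # drift: resample 2N gene copies
}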
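
To compare toggle combinations side by side, the loop above can be wrapped in a function with one logical switch per violation. This is a sketch under assumptions: run_hwe and its argument names are hypothetical (not the lesson's own code), and the inbreeding coefficient is called Fval here so that it does not mask R's F.

```r
# Hypothetical wrapper around the Stage B loop: one switch per HWE violation.
# Returns the allele frequency p after `gens` generations.
run_hwe <- function(finite = FALSE, mutation = FALSE, selection = FALSE,
                    inbreeding = FALSE, N = 100, gens = 100, p0 = 0.5,
                    mu = 1e-3, s = 0.05, Fval = 0.30) {
  p <- p0
  f <- if (inbreeding) Fval else 0
  for (g in seq_len(gens)) {
    q <- 1 - p
    fAA <- p^2 + f*p*q; fAa <- 2*p*q*(1 - f); faa <- q^2 + f*p*q
    if (selection) faa <- faa * (1 - s)            # aa is selected against
    p <- (fAA + fAa/2) / (fAA + fAa + faa)
    if (mutation) p <- p*(1 - mu) + (1 - p)*mu     # symmetric mutation
    if (finite) p <- rbinom(1, 2*N, p) / (2*N)     # drift: resample 2N gene copies
  }
  p
}

set.seed(42)
p_none       <- run_hwe()
p_inbreeding <- run_hwe(inbreeding = TRUE)
p_mutation   <- run_hwe(mutation = TRUE)
p_selection  <- run_hwe(selection = TRUE)
p_drift      <- run_hwe(finite = TRUE)
```

With every switch off, p stays at its starting value, which is the baseline the other runs are compared against. The drift run depends on the seed, so repeating it with several seeds gives a fairer picture than a single trajectory.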

C — Chi-squared on genotype counts

Sample N individuals at a given p. Compute the χ² on AA / Aa / aa counts vs the HWE expectation. The 1-df test.

Complete Stage B to unlock this section.

Scenario

One locus. Sample N individuals. Compute observed AA / Aa / aa counts. The HWE-expected counts come from the observed allele frequency p̂. The χ² statistic has 1 degree of freedom: three genotype classes, minus 1 because the counts must sum to N, minus 1 because p̂ is estimated from the same data.

Observed vs HWE-expected genotype counts

N: 200  |  p̂:  |  F̂ (=1 − H_obs/H_exp):  |  χ²:  |  P:

Prediction

  1. Q1. With F = 0.3 (moderate inbreeding) and N = 200, the χ² test on HWE will:
Try at least 3 (F, N) combinations to unlock Stage D.

Controls

N: 200  |  p: 0.50  |  F: 0.30  |  seed: 42

R code — HWE chi-squared on observed counts

set.seed(42)
N <- 200; p <- 0.5; F <- 0.30
q <- 1 - p
probs <- c(p^2 + F*p*q, 2*p*q*(1-F), q^2 + F*p*q)
obs <- rmultinom(1, N, probs)[,1]
phat <- (2*obs[1] + obs[2]) / (2*N)
exp_HWE <- N * c(phat^2, 2*phat*(1-phat), (1-phat)^2)
sum((obs - exp_HWE)^2 / exp_HWE)   # χ² with 1 df
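
The readout also reports F̂ and P, which the snippet above does not compute. The sketch below adds both for a fixed set of counts. The helper name hwe_test and the example counts (30, 40, 30) are made up for illustration; they are not lesson data.

```r
# Hypothetical helper: HWE chi-squared test from three genotype counts.
# Returns p-hat, F-hat (= 1 - H_obs/H_exp), chi-squared, and the 1-df P value.
hwe_test <- function(obs) {
  obs <- unname(obs)
  n <- sum(obs)
  phat <- (2*obs[1] + obs[2]) / (2*n)
  expected <- n * c(phat^2, 2*phat*(1 - phat), (1 - phat)^2)
  chi2 <- sum((obs - expected)^2 / expected)
  F_hat <- 1 - (obs[2] / n) / (2*phat*(1 - phat))   # observed vs expected heterozygosity
  P <- pchisq(chi2, df = 1, lower.tail = FALSE)
  c(phat = phat, F_hat = F_hat, chi2 = chi2, P = P)
}

# Made-up counts: 30 AA, 40 Aa, 30 aa
res <- hwe_test(c(AA = 30, Aa = 40, aa = 30))
print(res)
```

For these counts p̂ = 0.5, the expected counts are 25, 50, 25, and χ² = 1 + 2 + 1 = 4 with F̂ = 0.2. With two alleles, χ² equals N·F̂² (here 100 × 0.2² = 4), so the same F̂ gives a larger χ² as N grows.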

D — A wild locus — four candidate culprits

Italian sparrow loci. Some loci sit at HWE; some don't. For the ones that don't, you have four suspects: drift, mutation, selection, non-random mating. The test tells you only that something is off — not which.

Complete Stage C to unlock this section.

Scenario

Genotype counts across loci in data/clean/italian_sparrow_loci.csv (or fallback synthetic). For each locus: estimate p̂, compute HWE-expected genotype counts, compute χ². Rank loci by departure from HWE.

Per-locus χ² values

loci:  |  in HWE (P>0.05):  |  out of HWE (P≤0.05):

Prediction

  1. Q1. Across many loci in a wild population, the fraction that fails an HWE test will be:
  2. Q2. For a locus that does fail HWE in this population, which of the four candidate forces are plausibly operating? Check every one you'd want to investigate. (Italian sparrows are a hybrid lineage of house and Spanish sparrows.)
Click a locus to see its per-genotype breakdown. Do this at least 2 times.

Controls

seed: 42

R code — per-locus HWE test

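loci <- read.csv("data/clean/italian_sparrow_loci.csv")
apply(loci, 1, function(r) {
  # as.numeric guards against apply() coercing rows to character
  # when the file also holds a non-numeric column (e.g. a locus label)
  obs <- as.numeric(r[c("AA", "Aa", "aa")])
  N <- sum(obs)
  phat <- (2*obs[1] + obs[2]) / (2*N)
  e <- N * c(phat^2, 2*phat*(1-phat), (1-phat)^2)
  sum((obs - e)^2 / e)   # χ² with 1 df, one value per locus
})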
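
The scenario also asks for a ranking and an in/out-of-HWE count, and mentions a synthetic fallback. The sketch below shows one way that could look. Everything about the synthetic data is an assumption made here (20 loci, 200 individuals each, half the loci simulated with a heterozygote deficit of F = 0.3); it stands in for the CSV and says nothing about the real sparrow loci. Only the column names AA, Aa, aa come from the code above.

```r
# Synthetic stand-in for the CSV: all sizes and parameters are assumptions.
set.seed(42)
n_loci <- 20   # assumed number of loci
N <- 200       # assumed individuals sampled per locus
p_true <- runif(n_loci, 0.2, 0.8)
F_true <- rep(c(0, 0.3), each = n_loci / 2)   # half the loci get a heterozygote deficit
counts <- t(sapply(seq_len(n_loci), function(i) {
  p <- p_true[i]; q <- 1 - p; f <- F_true[i]
  rmultinom(1, N, c(p^2 + f*p*q, 2*p*q*(1 - f), q^2 + f*p*q))[, 1]
}))
loci <- data.frame(locus = paste0("locus", seq_len(n_loci)),
                   AA = counts[, 1], Aa = counts[, 2], aa = counts[, 3])

# Per-locus HWE test, vectorized over rows
n <- loci$AA + loci$Aa + loci$aa
phat <- (2*loci$AA + loci$Aa) / (2*n)
expected <- n * cbind(phat^2, 2*phat*(1 - phat), (1 - phat)^2)
observed <- as.matrix(loci[, c("AA", "Aa", "aa")])
loci$chi2 <- rowSums((observed - expected)^2 / expected)
loci$P <- pchisq(loci$chi2, df = 1, lower.tail = FALSE)

# Rank by departure from HWE and count loci on each side of P = 0.05
ranked <- loci[order(loci$chi2, decreasing = TRUE), ]
n_out <- sum(loci$P <= 0.05)
n_in  <- sum(loci$P > 0.05)
print(head(ranked))
```

Because each locus is tested at the 0.05 level, some loci simulated at F = 0 can still land in the out-of-HWE group by chance, which is worth keeping in mind when reading the counts.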