BIO 202 — Lesson 9: Building the population where nothing changes

What you'll do

Four stages. Build the panmictic baseline, then turn off the assumptions one at a time and see which violation moves the genotypes most. End on a real wild-population dataset and ask which assumption broke.

We're analogizing evolution to motion. Newton's first law: an object at rest tends to stay at rest unless acted on by an outside force. Hardy-Weinberg equilibrium is our object at rest. A gene frequency tends to stay at that frequency unless acted on by an outside force. The thing is, "outside force" here does not mean what you want it to mean. When you hear "force," you expect a thing that pushes — a separate thing. But the system not being infinite is itself enough. — 202_lec09_05

A — Infinite population, panmictic mating, no mutation

Slide the allele frequency p. Read the expected genotype frequencies p², 2pq, q². Watch the heterozygote curve peak at p = 0.5.

Scenario

One locus, two alleles. Allele A at frequency p. Allele a at frequency q = 1 − p. Under HWE: AA = p², Aa = 2pq, aa = q². Three numbers determined by one slider.

Hardy-Weinberg equilibrium is the fancy phrase we use for saying: this will be our baseline. Our expectation of no change. It is not real. It requires an infinite number of individuals alive at one time, and teleportation. Hardy-Weinberg never exists. But it's our baseline for comparison. — 202_lec08_05

Genotype frequencies vs p

p: 0.50 | AA: 0.250 | Aa: 0.500 | aa: 0.250

Prediction

Q1. At what allele frequency p is the heterozygote (Aa) frequency maximized?
p = 0 | p = 0.5 | p = 1 | Depends on the population size

Slide p to at least 4 different values to unlock Stage B.

Controls

p (freq of A): 0.50

R code — HWE genotype frequencies

p <- 0.50                               # frequency of allele A
q <- 1 - p                              # frequency of allele a
c(AA = p^2, Aa = 2*p*q, aa = q^2)       # HWE genotype frequencies
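To check the peak of the heterozygote curve without the slider, here is a minimal sketch that sweeps p over a grid. The names p_grid, het and p_at_max are illustrative and not part of the lesson code. The same answer follows from calculus: the derivative of 2p(1 − p) is 2 − 4p, which is zero at p = 0.5.

```r
# Sweep p over a grid and locate the maximum of the heterozygote frequency 2pq.
p_grid <- seq(0, 1, by = 0.01)          # illustrative grid of allele frequencies
het <- 2 * p_grid * (1 - p_grid)        # Aa frequency under HWE at each p
p_at_max <- p_grid[which.max(het)]      # p where Aa is most common
c(p_at_max = p_at_max, max_het = max(het))
```

The grid spacing is arbitrary; a finer grid gives the same location for the maximum.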

B — Turn the assumptions off

Four toggles. Each violates one HWE assumption. Watch the genotype frequencies move away from p², 2pq, q² over generations.

Scenario

Start at HWE with p = 0.5. Run for 100 generations. Toggle finite N, mutation, selection, non-random mating. Each one peels the population off the equilibrium curve in a characteristic way.

Genotype frequency trajectories

gen: 100 | final p: — | final F: —

Prediction

Q1. Which violation moves p (the allele frequency) most, in a fixed run of 100 generations?
Finite population (drift) | Mutation | Selection | Non-random mating

Toggle the four switches in different combinations at least 5 times.

Controls

finite N (drift)

N: 100

mutation (μ = 1e-3)

selection (s)

s (cost of aa): 0.05

non-random mating (F)

F: 0.30

seed: 42

R code — HWE violations

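set.seed(42)
N <- 100; gens <- 100; p <- 0.5
mu <- 1e-3; s <- 0.05; F <- 0.30    # note: F here shadows R's shorthand for FALSE
finite <- TRUE                      # drift toggle: TRUE resamples 2N gametes each generation
for (g in 1:gens) {
  q <- 1 - p
  # genotype frequencies with inbreeding coefficient F
  fAA <- p^2 + F*p*q;  fAa <- 2*p*q*(1-F);  faa <- q^2 + F*p*q
  w <- c(1, 1, 1 - s)               # aa is selected against
  fAA <- fAA*w[1]; fAa <- fAa*w[2]; faa <- faa*w[3]
  tot <- fAA + fAa + faa            # renormalize after selection
  p <- (fAA + fAa/2) / tot
  p <- p*(1-mu) + (1-p)*mu          # symmetric mutation
  if (finite) p <- rbinom(1, 2*N, p) / (2*N)
}
p                                   # final allele frequency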
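The block above applies every violation at once. To compare them one at a time, as the toggles do, the same update can be wrapped in a function with one switch per violation. This is a sketch under stated assumptions: simulate_p and its argument names are hypothetical, F_coef stands in for F to avoid shadowing R's FALSE shorthand, and the defaults copy the control values shown above.

```r
# Hypothetical helper: the same per-generation update, with each violation behind a switch.
simulate_p <- function(finite = FALSE, mutation = FALSE, selection = FALSE,
                       inbreeding = FALSE, N = 100, gens = 100, p0 = 0.5,
                       mu = 1e-3, s = 0.05, F_coef = 0.30, seed = 42) {
  set.seed(seed)
  p <- p0
  f <- if (inbreeding) F_coef else 0            # f = 0 means random mating
  for (g in seq_len(gens)) {
    q <- 1 - p
    geno <- c(AA = p^2 + f*p*q, Aa = 2*p*q*(1 - f), aa = q^2 + f*p*q)
    if (selection) geno <- geno * c(1, 1, 1 - s)  # aa pays cost s
    p <- unname(geno["AA"] + geno["Aa"] / 2) / sum(geno)
    if (mutation) p <- p*(1 - mu) + (1 - p)*mu    # symmetric mutation
    if (finite) p <- rbinom(1, 2*N, p) / (2*N)    # binomial sampling of 2N gametes
  }
  p
}

# Final p after 100 generations with one violation switched on at a time.
c(none      = simulate_p(),
  mutation  = simulate_p(mutation = TRUE),
  nonrandom = simulate_p(inbreeding = TRUE),
  selection = simulate_p(selection = TRUE),
  drift     = simulate_p(finite = TRUE))
```

Keep in mind that final p and the genotype frequencies are different readouts. In this model the inbreeding term redistributes genotypes (fewer Aa, more AA and aa) without changing p by itself, since p² + Fpq + pq(1 − F) = p.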

C — Chi-squared on genotype counts

Sample N individuals at a given p. Compute the χ² on AA / Aa / aa counts vs the HWE expectation. The 1-df test.

Scenario

One locus. Sample N individuals. Compute observed AA / Aa / aa counts. The HWE-expected counts come from the observed allele frequency p̂. The χ² has 1 degree of freedom: three genotype classes, minus one because the counts must sum to N, minus one more because p̂ is estimated from the same counts.

Observed vs HWE-expected genotype counts

N: 200 | p̂: — | F̂ (=1 − H_obs/H_exp): — | χ²: — | P: —

Prediction

Q1. With F = 0.3 (moderate inbreeding) and N = 200, the χ² test on HWE will:
Almost always reject HWE | Rarely reject (the test is weak at this N) | Depends entirely on the seed

Try at least 3 (F, N) combinations to unlock Stage D.

Controls

N: 200

true p: 0.50

true F: 0.30

seed: 42

R code — HWE chi-squared on observed counts

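set.seed(42)
N <- 200; p <- 0.5; F <- 0.30
q <- 1 - p
probs <- c(p^2 + F*p*q, 2*p*q*(1-F), q^2 + F*p*q)       # true genotype probabilities with inbreeding F
obs <- rmultinom(1, N, probs)[,1]                       # sampled AA, Aa, aa counts
phat <- (2*obs[1] + obs[2]) / (2*N)                     # allele frequency estimated from the sample
exp_HWE <- N * c(phat^2, 2*phat*(1-phat), (1-phat)^2)   # expected counts under HWE
sum((obs - exp_HWE)^2 / exp_HWE)   # χ² with 1 df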
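The readout for this stage also reports F̂ and P, which the block above does not compute. A minimal sketch of how they could be obtained from a set of counts follows. The helper name hwe_test is hypothetical, and the counts in the usage line are not sampled data: they are the exact expected counts for p = 0.5, F = 0.30, N = 200, chosen so the arithmetic is easy to follow by hand.

```r
# Hypothetical helper: HWE test summary from genotype counts in the order AA, Aa, aa.
hwe_test <- function(obs) {
  obs <- as.numeric(obs)                          # drop names so the output labels stay clean
  N <- sum(obs)
  phat <- (2*obs[1] + obs[2]) / (2*N)             # estimated frequency of allele A
  expected <- N * c(phat^2, 2*phat*(1 - phat), (1 - phat)^2)
  chi2 <- sum((obs - expected)^2 / expected)
  c(phat = phat,
    Fhat = 1 - obs[2] / expected[2],              # 1 - H_obs / H_exp
    chi2 = chi2,
    P = pchisq(chi2, df = 1, lower.tail = FALSE)) # upper-tail P at 1 df
}

# Illustrative counts: 200 * c(0.325, 0.35, 0.325)
hwe_test(c(65, 70, 65))
```

For these counts p̂ = 0.5, the expected counts are 50 / 100 / 50, F̂ = 0.3 and χ² = 18. With two alleles and F̂ computed from the same p̂, χ² equals N·F̂² exactly, which is a quick way to see how sample size and inbreeding trade off in this test.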

D — A wild locus — four candidate culprits

Italian sparrow loci. Some loci sit at HWE; some don't. For the ones that don't, you have four suspects: drift, mutation, selection, non-random mating. The test tells you only that something is off — not which.

Scenario

Genotype counts across loci in data/clean/italian_sparrow_loci.csv (or fallback synthetic). For each locus: estimate p̂, compute HWE-expected genotype counts, compute χ². Rank loci by departure from HWE.

Per-locus χ² values

loci: — | in HWE (P>0.05): — | out of HWE (P≤0.05): —

Prediction

Q1. Across many loci in a wild population, the fraction that fails an HWE test will be:
About 5% (the false-positive rate) | More than 5%: wild populations rarely sit exactly at HWE | Essentially zero: HWE is robust
Q2. For a locus that does fail HWE in this population, which of the four candidate forces are plausibly operating? Check every one you'd want to investigate. (Italian sparrows are a hybrid lineage of house and Spanish sparrows.)
Drift (finite Nₑ) | Selection at or near the locus | Mutation | Non-random mating (assortative or inbreeding) | Population structure / Wahlund effect from the parent-species mixture

Click a locus to see its per-genotype breakdown. Inspect at least 2 loci.
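The Wahlund effect named in Q2 can be illustrated with a few lines of arithmetic. In this sketch two source populations are each at HWE internally, and pooling them still produces a heterozygote deficit relative to the pooled allele frequency. The allele frequencies and mixing weights are illustrative assumptions, not sparrow data.

```r
# Two hypothetical source populations, each at HWE internally, pooled in equal parts.
p_sub   <- c(0.2, 0.8)                            # illustrative allele frequencies
weights <- c(0.5, 0.5)                            # illustrative mixing proportions
p_bar <- sum(weights * p_sub)                     # pooled allele frequency
H_obs <- sum(weights * 2 * p_sub * (1 - p_sub))   # heterozygotes actually present in the pool
H_exp <- 2 * p_bar * (1 - p_bar)                  # heterozygotes HWE predicts from pooled p
F_wahlund <- 1 - H_obs / H_exp                    # apparent inbreeding coefficient
var_p <- sum(weights * (p_sub - p_bar)^2)         # variance of p among subpopulations
c(H_obs = H_obs, H_exp = H_exp, F = F_wahlund,
  var_ratio = var_p / (p_bar * (1 - p_bar)))
```

The apparent F matches Var(p) / (p̄ q̄), so a deficit of heterozygotes at a locus can come from mixture alone, with random mating inside each source population.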

Controls

seed: 42

R code — per-locus HWE test

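loci <- read.csv("data/clean/italian_sparrow_loci.csv")
# keep only the count columns so apply() sees a numeric matrix
counts <- as.matrix(loci[, c("AA", "Aa", "aa")])
apply(counts, 1, function(obs) {
  N <- sum(obs)
  phat <- (2*obs[1] + obs[2]) / (2*N)                 # per-locus allele frequency
  e <- N * c(phat^2, 2*phat*(1-phat), (1-phat)^2)     # HWE-expected counts
  sum((obs - e)^2 / e)                                # χ² with 1 df
})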
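The block above returns only χ² per locus, while the readout also ranks loci and tallies how many fall in or out of HWE. A vectorized sketch of those steps follows, run on a small synthetic stand-in so it does not depend on the CSV. The data frame loci_demo, its counts and the locus column are assumptions for illustration; only the column names AA, Aa, aa come from the code above.

```r
# Synthetic stand-in for the loci table; counts are made up for illustration.
loci_demo <- data.frame(
  locus = c("loc1", "loc2", "loc3"),
  AA = c(50, 65, 28),
  Aa = c(100, 70, 44),
  aa = c(50, 65, 28)
)
counts <- as.matrix(loci_demo[, c("AA", "Aa", "aa")])
N <- rowSums(counts)
phat <- (2*counts[, "AA"] + counts[, "Aa"]) / (2*N)              # per-locus allele frequency
expected <- N * cbind(phat^2, 2*phat*(1 - phat), (1 - phat)^2)   # HWE-expected counts, row by row
chi2 <- rowSums((counts - expected)^2 / expected)
P <- pchisq(chi2, df = 1, lower.tail = FALSE)                    # upper-tail P at 1 df

result <- data.frame(locus = loci_demo$locus, chi2 = chi2, P = P)
result[order(-result$chi2), ]        # loci ranked by departure from HWE
table(out_of_HWE = P <= 0.05)        # tally at the 0.05 threshold
```

When many loci are tested at the 0.05 threshold, roughly 5% are expected to fail by chance even if every locus is truly at HWE, so the tally is best read against that baseline.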