BIO 202, Spring 2026, draft v1. Hardy-Weinberg as the null. Turn the assumptions off one at a time and watch the genotype frequencies leave it.
Four stages. Build the panmictic baseline, then turn off the assumptions one at a time and see which violation moves the genotypes most. End on a real wild-population dataset and ask which assumption broke.
Slide the allele frequency p. Read the expected genotype frequencies p², 2pq, q². Watch the heterozygote curve peak at p = 0.5.
One locus, two alleles. Allele A at frequency p. Allele a at frequency q = 1 − p. Under HWE: AA = p², Aa = 2pq, aa = q². Three numbers determined by one slider.
p <- 0.50q <- 1 - pc(AA = p^2, Aa = 2*p*q, aa = q^2)
Four toggles. Each violates one HWE assumption. Watch the genotype frequencies drift away from p², 2pq, q² over generations.
Start at HWE with p = 0.5. Run for 100 generations. Toggle finite N, mutation, selection, non-random mating. Each one peels the population off the equilibrium curve in a characteristic way.
set.seed(42)N <- 100; gens <- 100; p <- 0.5mu <- 1e-3; s <- 0.05; F <- 0.30for (g in 1:gens) { q <- 1 - p fAA <- p^2 + F*p*q; fAa <- 2*p*q*(1-F); faa <- q^2 + F*p*q w <- c(1, 1, 1 - s) # aa is selected against fAA <- fAA*w[1]; fAa <- fAa*w[2]; faa <- faa*w[3] tot <- fAA + fAa + faa p <- (fAA + fAa/2) / tot p <- p*(1-mu) + (1-p)*mu # symmetric mutation if (finite) p <- rbinom(1, 2*N, p) / (2*N)}
Sample N individuals at a given p. Compute the χ² on AA / Aa / aa counts vs the HWE expectation. The 1-df test.
One locus. Sample N individuals. Compute observed AA / Aa / aa counts. The HWE-expected counts come from the observed allele frequency p̂. χ² with 1 degree of freedom (the AA, Aa, aa fractions are constrained to sum to 1 AND to give the observed p̂).
set.seed(42)N <- 200; p <- 0.5; F <- 0.30q <- 1 - pprobs <- c(p^2 + F*p*q, 2*p*q*(1-F), q^2 + F*p*q)obs <- rmultinom(1, N, probs)[,1]phat <- (2*obs[1] + obs[2]) / (2*N)exp_HWE <- N * c(phat^2, 2*phat*(1-phat), (1-phat)^2)sum((obs - exp_HWE)^2 / exp_HWE) # χ² with 1 df
Italian sparrow loci. Some loci sit at HWE; some don't. For the ones that don't, you have four suspects: drift, mutation, selection, non-random mating. The test tells you only that something is off — not which.
Genotype counts across loci in data/clean/italian_sparrow_loci.csv (or fallback synthetic). For each locus: estimate p̂, compute HWE-expected genotype counts, compute χ². Rank loci by departure from HWE.
loci <- read.csv("data/clean/italian_sparrow_loci.csv")apply(loci, 1, function(r) { obs <- c(r["AA"], r["Aa"], r["aa"]) N <- sum(obs) phat <- (2*obs[1] + obs[2]) / (2*N) e <- N * c(phat^2, 2*phat*(1-phat), (1-phat)^2) sum((obs - e)^2 / e)})