BIO 202 — Lesson 8: Counting the ratios that breed true

What you'll do

Four stages. Run Mendelian crosses in simulation, predicting each ratio before you sample. End on Mendel's own data and the chi-squared statistic that has bothered geneticists since R. A. Fisher pointed it out in 1936.

Did Mendel have any idea about genes? Did he even know what part of the cell was inheritance? When he came up with dominant and recessive — these words you all are learning now in his rules — what did he know about molecular biology? Nothing. Literally nothing. He probably knows more than Mendel did about genetics. Now, Mendel was very smart. He set up really careful experiments. — 202_lec01_06

A — One gene, two alleles, a 3:1 ratio

Simulate Mendel's monohybrid cross. Slide the offspring count. Watch how close the observed ratio is to 3:1.

Scenario

Two heterozygous pea plants (Aa × Aa). Each parent passes on one of its two alleles at random (Mendel's law of segregation), so offspring are AA, Aa and aa in a 1:2:1 ratio. Because A is dominant, AA and Aa look alike: three quarters of offspring should show the dominant phenotype and one quarter the recessive phenotype (aa). The expected ratio is 3:1. The observed ratio is whatever you sample.
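To see where 3/4 comes from, this short R sketch enumerates the Punnett square for Aa × Aa. It assumes only what the scenario states: each parent passes on A or a with probability 1/2, independently of the other parent. The variable names are illustrative, not part of the simulator.

```r
# Law of segregation: each Aa parent passes on A or a with probability 1/2,
# so the four gamete pairings in the Punnett square are equally likely.
gametes <- c("A", "a")
punnett <- expand.grid(mother = gametes, father = gametes,
                       stringsAsFactors = FALSE)

# Number of dominant alleles each offspring carries (0, 1 or 2)
n_A <- (punnett$mother == "A") + (punnett$father == "A")
genotype <- c("aa", "Aa", "AA")[n_A + 1]

# Genotype frequencies: AA 1/4, Aa 1/2, aa 1/4
geno_freq <- table(genotype) / nrow(punnett)
geno_freq

# Dominant phenotype = at least one A allele
p_dominant <- mean(n_A >= 1)
p_dominant
```

The 1:2:1 genotype ratio collapses to 3:1 only because AA and Aa share a phenotype.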

Observed counts vs expected 3:1

Phenotype	Observed	Expected (3:1)	(O−E)²/E
Dominant	—	—	—
Recessive	—	—	—
χ² (1 df)	—		P = —
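Each column of this table can be computed by hand. The sketch below uses made-up counts (32 dominant, 8 recessive out of 40; hypothetical, not simulator output) and checks the hand calculation against `chisq.test()`, which applies no continuity correction in the one-sample goodness-of-fit case.

```r
# Hypothetical counts for illustration: 32 dominant, 8 recessive out of 40
obs      <- c(dominant = 32, recessive = 8)
n        <- sum(obs)
expected <- n * c(dominant = 0.75, recessive = 0.25)

contrib <- (obs - expected)^2 / expected   # the (O-E)^2/E column
chi2    <- sum(contrib)                    # chi-squared, 1 df (2 categories - 1)
p_value <- pchisq(chi2, df = 1, lower.tail = FALSE)  # upper-tail P

contrib
chi2
p_value

# chisq.test() gives the same statistic for this goodness-of-fit test
chisq.test(obs, p = c(0.75, 0.25))$statistic
```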

Prediction

Q1. With n = 40 offspring, the χ² test against 3:1 will:
Almost always reject (the sample is small, but chi-squared is sensitive)
Rarely reject (most samples land near 3:1)
Depends entirely on which seed

Try at least 5 (n, seed) combinations to unlock Stage B.

Controls

n offspring: 40

seed: 42

R code — monohybrid cross

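set.seed(42)
n <- 40
# Each offspring is dominant with prob 3/4 (Aa x Aa)
offspring <- rbinom(1, n, 0.75)   # count of dominant-phenotype offspring
obs <- c(dominant = offspring, recessive = n - offspring)
# Named "expected" rather than "exp" so base::exp() is not masked
expected <- c(dominant = n * 0.75, recessive = n * 0.25)
(obs - expected)^2 / expected     # the (O-E)^2/E column of the table
chisq.test(obs, p = c(0.75, 0.25))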

B — Two genes, independent assortment, a 9:3:3:1

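Add a second locus: four phenotype categories instead of two. Watch what happens to the test as n falls and the rarest category (wrinkled-green, expected in 1 of every 16 offspring) thins out.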

Scenario

Two heterozygous loci, segregating independently. The four phenotype categories are expected at 9:3:3:1 — round-yellow : round-green : wrinkled-yellow : wrinkled-green. Same chi-squared machinery as Stage A, three degrees of freedom instead of one.
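Where 9:3:3:1 comes from: under independent assortment the two loci behave like independent coin flips, so each phenotype's probability is the product of the single-locus 3/4 and 1/4. A minimal R sketch (the second locus is labelled B here purely for illustration):

```r
# Independent assortment: multiply the single-locus probabilities.
shape <- c(round = 3/4, wrinkled = 1/4)   # Aa x Aa at the shape locus
color <- c(yellow = 3/4, green = 1/4)     # Bb x Bb at the color locus (label assumed)

probs <- outer(shape, color)   # 2 x 2 matrix: rows = shape, columns = color
probs * 16                     # 9 3 / 3 1

# Flatten row by row into the order round-yellow, round-green,
# wrinkled-yellow, wrinkled-green
p <- as.vector(t(probs))
p * 16

df <- length(p) - 1   # 4 categories - 1 = 3 degrees of freedom
```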

Observed counts vs expected 9:3:3:1

Phenotype	Observed	Expected	(O−E)²/E
Round, yellow	—	—	—
Round, green	—	—	—
Wrinkled, yellow	—	—	—
Wrinkled, green	—	—	—
χ² (3 df)	—		P = —
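The χ² P-value relies on a large-sample approximation, and a common rule of thumb asks for an expected count of at least 5 in every category. With 9:3:3:1 the binding cell is wrinkled-green at n/16. This sketch tabulates that cell for a few n values and, for n = 50, shows the Monte Carlo option that `chisq.test()` offers (`simulate.p.value`) as one way to avoid the approximation; the n values and B are illustrative choices.

```r
p <- c(9, 3, 3, 1) / 16
n_values <- c(16, 50, 80, 160)

# The smallest expected count is always the wrinkled-green cell, n/16.
min_expected <- n_values * min(p)
data.frame(n = n_values, min_expected, meets_rule = min_expected >= 5)

# When an expected count falls below 5, chisq.test() warns that the
# chi-squared approximation may be incorrect. A Monte Carlo P-value
# sidesteps the approximation by resampling from the 9:3:3:1 multinomial.
set.seed(42)
obs <- rmultinom(1, 50, p)[, 1]
res <- chisq.test(obs, p = p, simulate.p.value = TRUE, B = 10000)
res$p.value
```

At n = 80 the wrinkled-green cell sits exactly at 5, which is why the default slider value is the smallest n that clears the rule.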

Prediction

Q1. With n = 50 offspring across four categories, how often will the χ² reject 9:3:3:1 if the cross really is dihybrid?
About 5% of the time (that's what α = 0.05 means)
About 50% of the time (small samples reject everything)
Almost never (the test is robust)

Try at least 5 (n, seed) combinations to unlock Stage C.

Controls

n offspring: 80

seed: 42

R code — dihybrid cross

set.seed(42)
n <- 80
p <- c(9, 3, 3, 1) / 16          # 9:3:3:1 expected
obs <- rmultinom(1, n, p)[, 1]
chisq.test(obs, p = p)

C — When the chi-squared rejects, and when it shouldn't

Run many simulated 3:1 crosses and plot the distribution of their χ² statistics. The P-value from a single experiment is the fraction of this distribution at or above that experiment's χ².

Scenario

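Generate 1,000 simulated monohybrid crosses, each with n offspring, and compute χ² for each. Because every simulated cross truly segregates 3:1, the result is the null distribution of χ² (approximately χ² with 1 df). Its upper 5% tail, χ² > 3.84, is the rejection region. If the real cross is 3:1, your one observed experiment is a single draw from this distribution.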
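A sketch of "P-value = tail fraction", assuming n = 80 as in the controls and a made-up single experiment of 54 dominant and 26 recessive offspring (hypothetical counts). It compares the empirical tail fraction with the textbook χ² (1 df) approximation and computes the 3.84 critical value rather than hard-coding it.

```r
set.seed(42)
n    <- 80
reps <- 1000
e    <- c(0.75, 0.25) * n

# Null distribution: chi-squared from crosses that truly segregate 3:1
chi2_null <- replicate(reps, {
  d <- rbinom(1, n, 0.75)
  sum((c(d, n - d) - e)^2 / e)
})

# One hypothetical experiment: 54 dominant, 26 recessive (made-up counts)
x_obs <- sum((c(54, 26) - e)^2 / e)

emp_p <- mean(chi2_null >= x_obs)            # empirical P-value (tail fraction)
emp_p
pchisq(x_obs, df = 1, lower.tail = FALSE)    # chi-squared (1 df) approximation
qchisq(0.95, df = 1)                         # critical value, about 3.84
```

Expect the two P-values to be close but not identical: offspring counts are whole numbers, so the simulated χ² values form a discrete set while the χ² curve is continuous.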

The null hypothesis is set up to be wrong. It is an incorrect hypothesis. The question is, how incorrect is it? Which means that for any null-hypothesis test, if your sample size is sufficiently large, you will reject the null — because the null is set up to be wrong. — 145_lec01_07
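The quote's claim can be checked directly. This sketch assumes, hypothetically, that the true dominant fraction is 0.73 rather than exactly 0.75, then estimates how often the 3:1 test rejects at several sample sizes. The 0.73, the n values and the replicate count are illustrative choices, not data.

```r
# Hypothetical: the true ratio is slightly off 3:1 (73% dominant, not 75%).
set.seed(42)
p_true   <- 0.73
n_values <- c(100, 1000, 5000, 20000)
reps     <- 500
crit     <- qchisq(0.95, df = 1)

reject_rate <- vapply(n_values, function(n) {
  e <- c(0.75, 0.25) * n
  chi2 <- replicate(reps, {
    d <- rbinom(1, n, p_true)
    sum((c(d, n - d) - e)^2 / e)
  })
  mean(chi2 > crit)          # fraction of crosses where 3:1 is rejected
}, numeric(1))

data.frame(n = n_values, reject_rate)
```

A two-percentage-point departure is nearly invisible at n = 100 and rejected almost every time at n = 20,000.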

χ² null distribution under perfect 3:1

n / cross: 80 | replicates: 1000 | % χ² > 3.84: — (theoretical: 5.0%)

Prediction

Q1. Across 1,000 simulated 3:1 crosses, the fraction with χ² > 3.84 (the 0.05 critical value) will be:
About 5%
About 10%
Effectively 0% if the cross is truly 3:1

Try at least 3 different n values to unlock Stage D.

Controls

n / cross: 80

seed: 42

R code — χ² null distribution

set.seed(42)
n <- 80
reps <- 1000
chi2 <- replicate(reps, {
  d <- rbinom(1, n, 0.75)
  o <- c(d, n - d); e <- c(0.75*n, 0.25*n)
  sum((o - e)^2 / e)
})
hist(chi2, breaks = 40)      # the simulated null distribution
mean(chi2 > 3.84)            # empirical rejection rate; 3.84 is qchisq(0.95, 1), rounded

D — Mendel's actual data — and Fisher's complaint

Mendel's published pea ratios fit 3:1 unusually well. So well, in fact, that the combined χ² across his crosses is in the lower 1% tail. Either an assistant who knew the expected ratios shaded the counts (Fisher's own suggestion), or Mendel set aside crosses that didn't look right, or the universe was unusually kind. You decide.

Scenario

Mendel's 1865 paper reports several monohybrid F2 ratios: round/wrinkled seed, yellow/green seed, and so on. Load his counts and compute χ² for each trait, then sum the per-trait values into one combined χ². Where does Mendel's combined χ² sit in the distribution of 1,000 simulated Mendel-style experimenters?
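Summing works because, if the traits come from independent experiments, the sum of k independent χ² (1 df) statistics is approximately χ² with k df. That gives an analytic cross-check on the simulation. The counts below are hypothetical, chosen only to make the arithmetic easy; they are not Mendel's data.

```r
# Hypothetical per-trait F2 counts, for illustration only (NOT Mendel's data)
dominant  <- c(300, 152, 446)
recessive <- c(100,  48, 154)
n <- dominant + recessive

# Per-trait chi-squared against 3:1 (1 df each)
chi <- (dominant - 0.75 * n)^2 / (0.75 * n) +
       (recessive - 0.25 * n)^2 / (0.25 * n)

# Independent traits: the sum of k chi-squared(1) statistics is chi-squared(k)
total <- sum(chi)
k     <- length(chi)

# Lower tail = chance an honest experimenter fits 3:1 at least this well
p_lower <- pchisq(total, df = k)
p_lower
```

Fisher's complaint is about this lower tail: a small value means the fit is better than chance alone usually delivers.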

Mendel's combined χ² vs simulated honest experimenters

Mendel's combined χ²: — | P(χ² ≤ Mendel) under honest 3:1: —

Per-trait breakdown

Trait	n	Dominant	Recessive	χ² (1 df)	P

Prediction

Q1. Combined across his published F2 ratios, Mendel's χ² sits where in the distribution of honest experimenters' χ² totals?
In the upper tail: Mendel's ratios are unusually far from 3:1
In the middle: exactly what an honest experimenter would produce
In the lower tail: Mendel's ratios fit 3:1 unusually well

Run the simulated experimenter test at least 2 times to wrap up.

Controls

seed: 42

R code — Mendel vs honest experimenters

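peas <- read.csv("data/clean/mendel_pea.csv")
# Use the count columns directly: apply() on a data frame converts it to a
# matrix, which becomes character if the file has any text column (e.g. a
# trait name), and sum() then fails.
n <- peas$dominant + peas$recessive

# Chi-squared against 3:1, vectorised over traits
chi2_31 <- function(d, n) {
  e_dom <- 0.75 * n
  e_rec <- 0.25 * n
  (d - e_dom)^2 / e_dom + ((n - d) - e_rec)^2 / e_rec
}

# Per-trait chi-squared and Mendel's combined total
chi_obs   <- chi2_31(peas$dominant, n)
total_obs <- sum(chi_obs)

# Simulate 1000 honest experimenters: same n per trait, true 3:1
set.seed(42)
null_totals <- replicate(1000, {
  d <- rbinom(length(n), n, 0.75)   # one draw per trait
  sum(chi2_31(d, n))
})
mean(null_totals <= total_obs)   # Fisher's complaint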