Lesson 3 — Mendel, linkage, and gene dropping

BIO 202, Spring 2026 — draft v1. Mendelian transmission is fair coins. Fair coins make distributions, not exact ratios. Everything popgen will do from here — drift, linkage decay, founder loss — starts here.

What Lesson 3 is actually asking

Three scales of Mendelian sampling. One locus, two loci, and a whole pedigree — each one asks how far random transmission pushes allele frequencies away from their expected values.

The math is not harder than Lesson 0. The shift is that the "noise" now has a biological name: meiosis.

A — Mendel's laws are fair coins. A 3:1 ratio is an expectation, not a promise.

Two heterozygous parents (Aa × Aa). Each child gets one allele from each parent by a fair coin. The expected frequency of A in the kids is 0.5. In a family of 4, the parents transmit 8 alleles, so the actual frequency lands on one of 0, 1/8, 2/8, …, 7/8, 1, with Binomial(8, 0.5) probabilities 1/256, 8/256, 28/256, 56/256, 70/256, 56/256, 28/256, 8/256, 1/256.

For this stage: read the spread of p̂ across small families as the same thing you called "sampling variance" in Lesson 0. The drift variance p(1−p)/(2n) in popgen is a Mendelian restatement of the binomial variance.

Scenario

Two parents, both Aa. Each produces gametes: half A, half a. A child receives one gamete from each parent, independently. Four possible combinations per child, each with probability 1/4: AA, Aa, Aa, aa. The Punnett square.

p(A in one gamete) = 0.5   expected frequency of A in a child = 0.5

In a family of n children, count the number of A alleles across all 2n gametes the parents transmit. That count is Binomial(2n, 0.5). Divide by 2n to get the family's allele frequency p̂:

2n · p̂ ~ Binomial(2n, 0.5)   Var(p̂) = 0.5 · 0.5 / (2n) = p(1−p)/(2n)

That second formula is the one popgen texts call the "drift variance" per generation. It is not a new idea. It is the variance of a binomial proportion, applied to alleles instead of coin flips.
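A quick check of that claim, as a minimal sketch in base R: enumerate the exact Binomial(2n, 0.5) distribution for a family of n = 4 and compute the variance of p̂ directly from it. The variable names (`counts`, `probs`, `var_exact`, `var_formula`) are illustrative and are not part of the lesson's widget code.

```r
# Exact distribution of the family allele frequency p_hat for Aa x Aa.
n <- 4                      # kids per family
p <- 0.5                    # P(A) in each transmitted gamete
counts <- 0:(2 * n)         # possible numbers of A alleles among 2n gametes
probs  <- dbinom(counts, size = 2 * n, prob = p)
p_hat  <- counts / (2 * n)  # possible family-level allele frequencies

# Mean and variance computed from the exact distribution
mean_exact <- sum(p_hat * probs)
var_exact  <- sum((p_hat - mean_exact)^2 * probs)

# Drift-variance formula from the text
var_formula <- p * (1 - p) / (2 * n)

print(data.frame(p_hat = p_hat, prob = probs))
print(c(mean = mean_exact, var_exact = var_exact, var_formula = var_formula))
```

The two variances should agree up to floating-point error, which is the whole point: no simulation is needed to get the drift variance.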

Simulate many families of size n. The expected allele frequency in every one is 0.5. The realized allele frequency in each one is somewhere in that binomial distribution. Small n, wide distribution — a lot of families land at 0.25 or 0.75 by luck. Large n, narrow distribution — almost every family lands near 0.5.
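The narrowing can be sketched numerically. This example takes a shortcut that is an assumption of the sketch, not the widget's method: it draws each family's count of A alleles directly as Binomial(2n, 0.5) instead of simulating kid by kid. The family sizes 4 and 40 and the 1000 replicates are illustrative choices.

```r
# Compare the spread of p_hat for small and large families.
set.seed(42)
fams  <- 1000                         # families simulated per family size
sizes <- c(4, 40)                     # family sizes to compare

sim_sd <- sapply(sizes, function(n) {
  # Number of A alleles among the 2n transmitted gametes, one draw per family
  a_count <- rbinom(fams, size = 2 * n, prob = 0.5)
  sd(a_count / (2 * n))
})
theory_sd <- sqrt(0.5 * 0.5 / (2 * sizes))

print(data.frame(n = sizes, sim_sd = sim_sd, theory_sd = theory_sd))
```

Multiplying n by 10 should shrink the standard deviation by a factor of about √10, not 10.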

Note that "the 3:1 ratio" taught in an intro genetics course is the expected AA+Aa : aa split across infinitely many children. For a family of 4, the ratio is fixed to 4:0, 3:1, 2:2, 1:3, or 0:4 — and each of those has nonzero probability.
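Those five splits can be given exact probabilities. A minimal sketch, assuming each child is aa with probability 1/4 independently of its siblings (the Punnett square above); the names `n_aa` and `probs` are illustrative.

```r
# Dominant : recessive phenotype splits in a family of 4 from Aa x Aa.
kids <- 4
p_aa <- 1 / 4                         # P(child is aa), from the Punnett square
n_aa <- 0:kids                        # number of aa children in the family
probs <- dbinom(n_aa, size = kids, prob = p_aa)
names(probs) <- paste0(kids - n_aa, ":", n_aa)   # label as dominant:recessive
print(round(probs, 4))
```

Under these assumptions the textbook 3:1 split is the single most likely outcome, yet it occurs in fewer than half of such families.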

Allele frequency p̂ across families (histogram)

Readouts: n (family size)  |  families  |  mean p̂  |  sd(p̂)  |  theory: sd(p̂) = √(p(1−p)/(2n))

Prediction (required before sliders unlock)

  Q1. You simulate 1000 families of 4 children from Aa × Aa. What will the histogram of family-level p̂ look like?
  Q2. You raise the family size n from 4 to 40, keeping the number of families fixed. The histogram of p̂…

Transfer question — new scenario

A pair of carriers for a recessive disease (Aa × Aa) has three children. Zero of them are affected (aa). Which reading is best?

Controls

n (family size) = 4  |  families = 1000  |  seed = 42

R code (base R)

# Stage A: one locus, Aa x Aa, family size n.
set.seed(42)
n     <- 4     # kids per family
fams  <- 1000  # number of families to simulate

# Each kid gets one allele from each of two Aa parents.
# A = 1, a = 0. Count A alleles in each family, divide by 2n.
p_hat <- replicate(fams, {
  g1 <- rbinom(n, 1, 0.5)   # allele from parent 1, per kid
  g2 <- rbinom(n, 1, 0.5)   # allele from parent 2, per kid
  sum(g1 + g2) / (2 * n)
})

# realized vs expected
mean(p_hat); sd(p_hat)
sqrt(0.5 * 0.5 / (2 * n))   # theoretical sd of p_hat

# One bin per attainable value j/(2n), so no value sits on a bin edge.
hist(p_hat, breaks = (seq(0, 2 * n + 1) - 0.5) / (2 * n),
     col = "gray80", border = "white",
     xlab = "realized p in family", main = "")
abline(v = 0.5, col = "#b23a48", lwd = 2, lty = 2)

B — Two loci, one recombination rate. How tightly do A and B travel together?

A parent's two chromosomes carry AB on one and ab on the other. A gamete is either a parental haplotype (AB, ab) or a recombinant one (Ab, aB). r is the probability of a recombinant gamete. r = 0 means full linkage; r = 0.5 means the two loci are effectively independent.

For this stage: read linkage disequilibrium D = p(AB)·p(ab) − p(Ab)·p(aB) as "how much more often do A and B show up together than they would if gametes were independent draws at the two loci?" r shrinks D toward 0 across generations.

Scenario

One double-heterozygous parent, coupling phase: AB on one chromosome, ab on the other. In meiosis a crossover between the two loci swaps the tail, and the parent produces a recombinant gamete (Ab or aB) instead of a parental one (AB or ab). r is the probability that a given gamete is recombinant:

p(AB) = p(ab) = (1 − r)/2   p(Ab) = p(aB) = r/2

At r = 0, the two loci are locked: only AB and ab gametes, 50/50. A and B always travel together. At r = 0.5, all four haplotypes are equally frequent (1/4 each). A and B are effectively independent — the probability of a B allele is 1/2 regardless of whether the A locus shows A or a.
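The two endpoints can be checked with a small helper. This is a sketch under the formulas above; `hap_freqs` and `p_B_given_A` are hypothetical helper names introduced here for illustration.

```r
# Expected gamete haplotype frequencies from a coupling-phase AB/ab parent.
hap_freqs <- function(r) {
  stopifnot(r >= 0, r <= 0.5)
  c(AB = (1 - r) / 2, Ab = r / 2, aB = r / 2, ab = (1 - r) / 2)
}

# P(B | A): among gametes carrying A, what fraction also carry B?
p_B_given_A <- function(r) {
  f <- hap_freqs(r)
  unname(f["AB"] / (f["AB"] + f["Ab"]))
}

for (r in c(0, 0.1, 0.5)) {
  cat("r =", r, " freqs:", hap_freqs(r), " P(B|A) =", p_B_given_A(r), "\n")
}
```

Algebraically P(B | A) simplifies to 1 − r, so it slides from 1 (locked together) down to 1/2 (independent) as r goes from 0 to 0.5.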

Linkage disequilibrium D measures the departure from independence at the two loci:

D = p(AB) · p(ab) − p(Ab) · p(aB)

If gametes were independent draws at the two loci, D would be 0 (p(AB) would equal p(A)·p(B)). When A and B travel together, D is positive; when they avoid each other, D is negative. The starting AB/ab parent has maximum coupling-phase D: D0 = 1/4. Across generations in a large, randomly mating population, recombination erodes D by a factor of (1 − r) every generation.
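The per-generation erosion can be tabulated. A minimal sketch, assuming the standard setting in which the (1 − r) factor applies: a large, randomly mating population with no selection. That setting is an assumption of this sketch. The value r = 0.10 and the 10-generation horizon are illustrative.

```r
# Decay of linkage disequilibrium across generations: D_t = D0 * (1 - r)^t.
D0   <- 1 / 4                # maximum coupling-phase D from the text
r    <- 0.10                 # recombination rate (illustrative value)
gens <- 0:10
D_t  <- D0 * (1 - r)^gens
print(round(data.frame(generation = gens, D = D_t), 4))

# Generations needed for D to fall to half its starting value
half_life <- log(0.5) / log(1 - r)
print(half_life)
```

The half-life depends only on r, not on D0, which is why tightly linked loci can stay associated for many generations.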

Note that r is bounded at 0.5. You cannot have more than 50% recombinant gametes from a coupling-phase parent, because a gamete with zero crossovers between the two loci is parental by definition, and a gamete with many crossovers is recombinant with probability 1/2. r = 0.5 is "effectively on different chromosomes".

The trick is that D = 0 at r = 0.5 because the four haplotypes all come out at 1/4. Plug 1/4 into the formula: (1/4)(1/4) − (1/4)(1/4) = 0. Independence is the same statement as "no D".
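The same plug-in works for any r. Substituting the gamete frequencies into the definition gives ((1 − r)/2)² − (r/2)² = (1 − 2r)/4 for the pool of gametes from this one parent. A short sketch that checks the algebra numerically; `D_of_r` is a hypothetical helper name.

```r
# D of the gamete pool from one AB/ab parent, as a function of r.
D_of_r <- function(r) {
  pAB <- (1 - r) / 2; pab <- (1 - r) / 2   # parental haplotypes
  pAb <- r / 2;       paB <- r / 2         # recombinant haplotypes
  pAB * pab - pAb * paB
}

r_grid <- seq(0, 0.5, by = 0.1)
print(data.frame(r = r_grid,
                 D = D_of_r(r_grid),
                 closed_form = (1 - 2 * r_grid) / 4))
```

D falls linearly from 1/4 at r = 0 to 0 at r = 0.5.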

Gamete haplotype frequencies from AB/ab parent

Readouts: p̂(AB)  |  p̂(Ab)  |  p̂(aB)  |  p̂(ab)  |  observed D  |  expected D  |  r

Prediction (required before sliders unlock)

  Q1. You set r = 0.5 (the two loci are on different chromosomes, effectively). The four haplotype frequencies in the gamete pool will be…
  Q2. You start at r = 0 and slowly raise r toward 0.5. The observed D…

Transfer question — new scenario

A study of a human population finds two SNPs where the "risk" alleles co-occur on the same haplotype much more often than chance — observed D is about 80% of its maximum possible value. The two SNPs are 50 kb apart. Which reading is best?

Controls

r = 0.10  |  gametes = 2000  |  seed = 42

R code (base R)

# Stage B: AB/ab parent, recombination rate r.
set.seed(42)
r    <- 0.10    # recombination rate between the two loci
ngam <- 2000    # gametes to simulate

# Each gamete: is it recombinant? (prob r) and which of the two possibilities?
recomb <- rbinom(ngam, 1, r)
flip   <- rbinom(ngam, 1, 0.5)

# Parental gametes are AB ("AB") or ab ("ab"). Recombinant gametes are Ab or aB.
hap <- ifelse(recomb == 0,
              ifelse(flip == 0, "AB", "ab"),
              ifelse(flip == 0, "Ab", "aB"))
tbl <- table(factor(hap, levels = c("AB", "Ab", "aB", "ab"))) / ngam

# Linkage disequilibrium at the gamete pool
D <- tbl["AB"] * tbl["ab"] - tbl["Ab"] * tbl["aB"]
print(tbl); print(D)

barplot(tbl, col = c("#66a61e", "#7570b3", "#e6ab02", "#a6761d"),
        ylim = c(0, 0.55), ylab = "gamete frequency")
abline(h = c((1-r)/2, r/2), lty = 2, col = "#b23a48")

C — Gene dropping. How many founder alleles make it to generation F2?

Four unrelated founders, each homozygous for a unique allele (1, 2, 3, 4). One generation of crosses gives two F1 heterozygotes. The F1 couple produces k F2 offspring. How many of the four founder alleles survive to F2, and how does that depend on k?

For this stage: name "gene dropping" as the technique conservation geneticists use to predict how much founder diversity a captive breeding plan will retain. Read founder loss as fair-coin attrition: each F1 parent transmits one of its two alleles per child, so an allele that fails to be transmitted in any of k kids is gone.

Scenario — a four-founder pedigree

Four unrelated founders, drawn from a captive breeding program. Each is homozygous for a unique allele — easy bookkeeping, nothing else:

G1 = (1, 1)   G2 = (2, 2)   G3 = (3, 3)   G4 = (4, 4)

Two couples, two F1 offspring:

G1 × G2  →  F1a = (1, 2)     G3 × G4  →  F1b = (3, 4)

F1a × F1b produce k children. For each child, F1a transmits either allele 1 or allele 2 (fair coin), and F1b transmits either allele 3 or allele 4 (fair coin). The child is one of (1,3), (1,4), (2,3), (2,4) — each with probability 1/4.
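The four outcomes can be listed mechanically. A tiny sketch; the column names `from_F1a` and `from_F1b` are illustrative.

```r
# The four equally likely F2 genotypes from F1a = (1, 2) x F1b = (3, 4).
f2 <- expand.grid(from_F1a = c(1, 2), from_F1b = c(3, 4))
f2$prob <- 1 / nrow(f2)      # two independent fair coins, so 1/4 each
print(f2)
```

No child can carry both alleles 1 and 2, or both 3 and 4, which is what makes founder loss possible in small families.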

An allele is retained to F2 if it appears in at least one of the k kids. An allele is lost if every one of the k F1 transmissions from its carrier parent went the other way. For allele 1:

P(allele 1 lost) = (1/2)^k   P(allele 1 retained) = 1 − (1/2)^k

Same calculation for alleles 2, 3, 4. So the expected number of founder alleles retained, summed over all four, is 4 · (1 − (1/2)^k). k = 1: expect 2. k = 4: expect 3.75. k = 8: expect 3.98.
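The expectation is easy to tabulate for any family size. A minimal sketch of the formula above; the set of k values is an illustrative choice.

```r
# Expected number of founder alleles retained in F2, for several family sizes.
k <- c(1, 2, 4, 8)
p_retained <- 1 - (1 / 2)^k          # per-allele retention probability
expected_n <- 4 * p_retained         # summed over the four founder alleles
print(data.frame(k = k, p_retained = p_retained, expected_n = expected_n))
```

Each extra child halves the remaining per-allele loss probability, so most of the gain comes from the first few offspring.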

Note that allele 1 and allele 2 are not independently retained: if F1a transmits allele 1 to every kid (so allele 2 is lost), then allele 1 is certainly kept. Formally, the two loss events cannot both happen when k ≥ 1, so their overlap has probability 0 and P(F1a loses either 1 or 2) = 2 · (1/2)^k − 0 = 2^(1−k). For k = 1 this is 1: you cannot keep both of F1a's alleles in a single child.
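Since F1a keeps both of its alleles with probability 1 − 2^(1−k), and F1b transmits independently of F1a, the probability that all four founder alleles survive is (1 − 2^(1−k))². A sketch that confirms this by brute-force enumeration of every transmission sequence for small k; `p_keeps_both` is a hypothetical helper name.

```r
# P(all four founder alleles retained), by enumerating one parent's transmissions.
p_keeps_both <- function(k) {
  # All 2^k equally likely sequences of alleles (1 or 2) passed by one F1 parent
  seqs <- expand.grid(rep(list(c(1, 2)), k))
  both <- apply(seqs, 1, function(s) all(c(1, 2) %in% s))
  mean(both)
}

k_vals <- 1:6
enumerated <- sapply(k_vals, p_keeps_both)^2   # F1a and F1b are independent
closed     <- (1 - 2^(1 - k_vals))^2
print(data.frame(k = k_vals, enumerated = enumerated, closed_form = closed))
```

Enumeration costs 2^k rows per parent, so it is only practical for small k. The simulation approach scales better.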

Why this matters. This is the engine behind captive-breeding family-size recommendations. A program that produces one offspring per generation per breeding pair loses on average half of each pair's genetic contribution per generation. The Florida panther, the California condor, and Przewalski's horse programs all run gene-dropping simulations on their real pedigrees to plan pairings. This stage is the toy version.

Founder-allele retention probability across replicates

Readouts: k (F2 kids)  |  replicates  |  mean # retained  |  expected  |  P(all 4 retained)  |  theory: (1 − 2^(1−k))² for k ≥ 1

Retention probability vs family size k

dashed red: theoretical 1 − (1/2)^k  |  blue dots: simulated P(one founder allele retained) per k

Prediction (required before sliders unlock)

  Q1. With k = 1 F2 child, how many of the 4 founder alleles are retained?
  Q2. You raise k from 1 to 6, keeping the pedigree and the seed fixed. P(all 4 founder alleles retained)…

Transfer question — new scenario

A captive-breeding coordinator tells you: "This founder pair has only produced one offspring in ten years. We need to rebreed them before we lose their genetic contribution." Which reading makes sense?

Controls

k (F2 kids) = 2  |  replicates = 1000  |  seed = 42

R code (base R) — gene-drop one founder pedigree

# Stage C: drop alleles through a 2-couple, 1-generation pedigree.
set.seed(42)
k    <- 2      # F2 kids per couple
reps <- 1000   # replicate simulations of the pedigree

# Founders: G1=(1,1), G2=(2,2), G3=(3,3), G4=(4,4)
# F1a from G1xG2 is (1,2); F1b from G3xG4 is (3,4).
# Each F2 kid: one allele from F1a (1 or 2), one from F1b (3 or 4).
gene_drop <- function(k) {
  from_a <- sample(c(1, 2), k, replace = TRUE)
  from_b <- sample(c(3, 4), k, replace = TRUE)
  alleles_seen <- unique(c(from_a, from_b))
  as.integer(c(1, 2, 3, 4) %in% alleles_seen)
}

ret <- t(replicate(reps, gene_drop(k)))
colnames(ret) <- paste0("a", 1:4)
colMeans(ret)                   # per-allele retention probability
mean(rowSums(ret) == 4)         # P(all four retained)

barplot(colMeans(ret), ylim = c(0, 1),
        col = c("#66a61e", "#7570b3", "#e6ab02", "#a6761d"),
        ylab = "P(founder allele retained)")
abline(h = 1 - (1/2)^k, col = "#b23a48", lwd = 2, lty = 2)