BIO 202 — Lesson 13: Where deleterious alleles get held in place

What you'll do

Mutation-only model. Mutation + selection equilibrium at q ≈ μ/(hs). Back-calculate s. Cystic fibrosis as the case where μ/(hs) breaks.

The frequency of an allele in the population goes back to what frequencies always are: how often you get it yourself, plus how often you inherit it. The two ways to get it. The frequency of a bad mutation works out to μ/(H·S): the mutation rate divided by how bad it is and how heritable that badness is. (202_lec17_02) [In this lesson's notation, S is the selection coefficient s and H is the dominance coefficient h.]

A — Mutation alone — q drifts up at rate μ

Without selection, new mutations arise at rate μ per allele per generation. In a finite population, each new mutation is usually lost to drift; rarely, it drifts all the way to fixation.
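The "usually lost, rarely fixed" claim can be checked directly. The sketch below follows many independent new neutral mutations, each starting as one copy, under pure Wright-Fisher drift with no further mutation. The values of N and reps are illustrative choices, not values from the lesson; N is kept small so that fixation events are frequent enough to count. The standard neutral expectation is a fixation probability of 1/(2N).

```r
# Sketch: fraction of new neutral mutations that fix under pure drift.
# N and reps are illustrative choices, not values from the lesson.
set.seed(1)
N <- 50          # small diploid population so fixation is observable
reps <- 20000    # independent new mutations
copies <- rep(1L, reps)   # each starts as a single copy among 2N alleles
gen <- 0
while (any(copies > 0 & copies < 2 * N) && gen < 1e5) {
  # Wright-Fisher resampling; lost (0) and fixed (2N) replicates stay put
  copies <- rbinom(reps, 2 * N, copies / (2 * N))
  gen <- gen + 1
}
frac_fixed <- mean(copies == 2 * N)
c(simulated = frac_fixed, neutral_expectation = 1 / (2 * N))
```

Nearly every replicate ends at zero copies, which is why a mutation-only model needs recurrent input at rate μ to move the mean frequency.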

Scenario

Start with q = 0. Each generation, each non-mutant allele mutates to "a" at rate μ. (The R code for this panel also lets "a" mutate back at the same rate, for simplicity.) Plot q over time. Without selection, this is a Wright-Fisher process with a mutational input of μ per allele per generation.

[Interactive plot: q over time (mutation only). Readout: μ = 1e-4, N = 1000, mean q at t = 2000.]

Prediction

Q1. With mutation rate μ = 10⁻⁴ and no selection in N = 1000, after 2000 generations the mean q across replicates will be:
(a) Essentially 0 (drift wipes new mutations)
(b) About μ × t (mutations accumulate linearly)
(c) About 0.5 (drift averages)
(d) Mostly low but rising slowly (some mutants drift up)

Try at least 4 (μ, N) combos.

Controls

μ (mutation rate): 1e-4

N: 1000

seed: 42

R code — mutation only

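set.seed(42)
mu <- 1e-4   # mutation rate per allele per generation
N <- 1000    # diploid population size (2N allele copies)
q <- 0       # starting frequency of allele a
traj <- q    # trajectory, beginning at generation 0
for (g in 1:2000) {
  q_mut <- (1 - q) * mu + q * (1 - mu)     # symmetric for simplicity
  q <- rbinom(1, 2 * N, q_mut) / (2 * N)   # Wright-Fisher sampling (drift)
  traj <- c(traj, q)
}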
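A single trajectory is noisy, so the prediction question is easier to judge from many replicates. The sketch below runs the same symmetric-mutation Wright-Fisher step across replicate populations at once and compares the simulated mean with two reference values: the linear guess μ × t, and the deterministic expectation E[q_t] = 0.5 × (1 − (1 − 2μ)^t), which follows because drift does not change the expected frequency. The number of replicates (reps) is an illustrative choice, not part of the lesson.

```r
# Sketch: mean q across replicate populations, mutation + drift only.
set.seed(42)
mu <- 1e-4; N <- 1000; gens <- 2000
reps <- 500                 # illustrative number of replicate populations
q <- rep(0, reps)           # every replicate starts at q = 0
for (g in seq_len(gens)) {
  q_mut <- (1 - q) * mu + q * (1 - mu)        # symmetric mutation, as in the lesson code
  q <- rbinom(reps, 2 * N, q_mut) / (2 * N)   # Wright-Fisher sampling of 2N alleles
}
mean_q_sim <- mean(q)
# Drift leaves E[q] unchanged, so with q_0 = 0:
mean_q_expected <- 0.5 * (1 - (1 - 2 * mu)^gens)   # about 0.165
c(simulated = mean_q_sim, expected = mean_q_expected, linear = mu * gens)
```

Individual replicates scatter widely around the mean; summary(q) or hist(q) shows how much drift spreads them even though the average follows the deterministic curve.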

B — Mutation + selection — equilibrium at μ/(hs)

Add selection: AA has fitness 1; Aa has fitness 1 − hs (h is the dominance coefficient); aa has fitness 1 − s. The deleterious allele settles at q ≈ μ/(hs) when it is at least partially dominant (h > 0), and at q ≈ √(μ/s) when it is fully recessive (h = 0).

Scenario

q stabilizes when the mutational influx (μ × frequency of A, which is ≈ μ while a is rare) equals the loss to selection (≈ hs × pq ≈ hs × q for a partially dominant allele, ≈ s × q² for a fully recessive one). For partially dominant (h > 0): q ≈ μ/(hs). For recessive (h = 0): q ≈ √(μ/s).
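The balance argument can be written out step by step. The derivation below is a sketch under two stated assumptions: the allele is rare (q ≪ 1, and q ≪ h in the partially dominant case), and mutation is one-way from A to a. It uses the lesson's fitnesses 1, 1 − hs, and 1 − s.

```latex
\begin{align*}
\Delta q_{\text{sel}} &= -\frac{s\,p\,q\,[\,h\,p + (1-h)\,q\,]}{\bar{w}},
  \qquad \bar{w} = 1 - 2pq\,hs - q^{2}s \\
\Delta q_{\text{sel}} &\approx -hs\,q \quad (h > 0,\ q \ll h), \qquad
\Delta q_{\text{sel}} \approx -s\,q^{2} \quad (h = 0) \\
\Delta q_{\text{mut}} &\approx \mu\,p \approx \mu \\
\mu = hs\,\hat{q} &\;\Rightarrow\; \hat{q} \approx \frac{\mu}{hs}, \qquad
\mu = s\,\hat{q}^{2} \;\Rightarrow\; \hat{q} \approx \sqrt{\mu/s}
\end{align*}
```

The two formulas are limiting cases of the same balance: which one applies depends on whether selection acts mainly on heterozygotes (h > 0) or only on homozygotes (h = 0).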

[Interactive plot: q trajectory and theoretical equilibrium. Readout: μ = 1e-5, s = 0.1, h = 0.5, theory q̂, observed q at generation 5000.]

Prediction

Q1. For h = 0.5, μ = 10⁻⁵, s = 0.1, the equilibrium frequency q̂ will be approximately:
(a) 10⁻⁵ / 0.1 ≈ 10⁻⁴
(b) 2 × 10⁻⁴ (μ/(hs) = 10⁻⁵/0.05)
(c) Effectively 0 (selection wins)
(d) Around 10⁻² (drift dominates)

Try at least 4 (μ, s, h) combos.

Controls

μ (log₁₀ scale): 1e-5

s: 0.100

h (dominance): 0.50

R code — mutation-selection balance

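mu <- 1e-5; s <- 0.1; h <- 0.5
q <- 0; gens <- 5000
traj <- numeric(gens + 1)                      # traj[1] is generation 0 (q = 0)
w_AA <- 1; w_Aa <- 1 - h * s; w_aa <- 1 - s    # genotype fitnesses
for (g in 1:gens) {
  p <- 1 - q
  wbar <- p^2 * w_AA + 2 * p * q * w_Aa + q^2 * w_aa   # mean fitness
  q <- (p * q * w_Aa + q^2 * w_aa) / wbar              # selection
  q <- q * (1 - mu) + (1 - q) * mu                     # symmetric mutation
  traj[g + 1] <- q
}
tail(traj, 1)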
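To try several (μ, s, h) combinations without the sliders, the same deterministic recursion can be wrapped in a function and compared with the approximate theory. This is a sketch: run_balance and theory are hypothetical helper names, and the four combinations are illustrative choices, not values prescribed by the lesson.

```r
# Hypothetical helper: the lesson's deterministic recursion as a function.
run_balance <- function(mu, s, h, gens = 5000) {
  q <- 0
  for (g in seq_len(gens)) {
    p <- 1 - q
    wbar <- p^2 + 2 * p * q * (1 - h * s) + q^2 * (1 - s)
    q <- (p * q * (1 - h * s) + q^2 * (1 - s)) / wbar   # selection
    q <- q * (1 - mu) + (1 - q) * mu                     # symmetric mutation
  }
  q
}
# Approximate theory: mu/(hs) for h > 0, sqrt(mu/s) for h = 0
theory <- function(mu, s, h) if (h > 0) mu / (h * s) else sqrt(mu / s)

# Illustrative parameter combinations (the last one is fully recessive)
combos <- data.frame(mu = c(1e-5, 1e-5, 1e-6, 1e-5),
                     s  = c(0.1, 0.5, 0.1, 0.1),
                     h  = c(0.5, 0.5, 1.0, 0.0))
combos$simulated <- mapply(run_balance, combos$mu, combos$s, combos$h)
combos$theory    <- mapply(theory, combos$mu, combos$s, combos$h)
combos$ratio     <- combos$simulated / combos$theory
combos
```

A ratio close to 1 means the approximation holds for that combination. Small departures are expected, because the recursion includes back-mutation and the exact fitness terms that the approximation drops.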

C — Estimate s from observed q

If you can measure q (population frequency of the allele), μ (per-generation mutation rate), and h (dominance, with h > 0), you can rearrange q ≈ μ/(hs) to solve for s: s ≈ μ/(h·q). This is the pink-katydid back-calculation.

Scenario

Pink katydid example. Population frequency q ≈ 5 × 10⁻⁴. Mutation rate μ ≈ 5 × 10⁻⁵. Dominance h ≈ 1 (visible heterozygote). Solve: s ≈ μ/(h·q) = (5 × 10⁻⁵)/(1 × 5 × 10⁻⁴) = 0.1. The selection coefficient is 10%, a very strong effect.

[Interactive readout: inferred s from (μ, h, q). Defaults: μ = 5e-5, h = 1.0, observed q = 5e-4.]

Prediction

Q1. The pink katydid sits at q ≈ 5 × 10⁻⁴ with μ ≈ 5 × 10⁻⁵. The inferred selection coefficient is closest to:
(a) ~0.1 (very strong)
(b) ~0.001 (weak)
(c) 0 (neutral)
(d) ~1 (lethal)

Try at least 3 (q, μ) combos.

Controls

μ (log₁₀ scale): 5e-5

h: 1.00

observed q (log₁₀ scale): 5e-4

R code — back-calculate s

mu <- 5e-5; h <- 1; q <- 5e-4
# Mutation-selection balance: q = mu/(h*s) for h > 0
s_inferred <- mu / (h * q)
s_inferred
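Because the inferred s depends on the ratio μ/q, it helps to see how it moves across a grid of plausible inputs. The sketch below uses a hypothetical helper, infer_s, with grid values chosen purely for illustration. It returns NA when the implied s exceeds 1, since a selection coefficient above 1 would mean the inputs are inconsistent with this balance model.

```r
# Hypothetical helper: back-calculate s from mu, q, and h (requires h > 0).
infer_s <- function(mu, q, h = 1) {
  stopifnot(h > 0, q > 0)
  s <- mu / (h * q)
  ifelse(s > 1, NA_real_, s)   # s > 1 is not a valid selection coefficient
}

# Illustrative grid of mutation rates and observed frequencies
grid <- expand.grid(mu = c(1e-5, 5e-5, 1e-4), q = c(1e-4, 5e-4, 1e-3))
grid$s <- infer_s(grid$mu, grid$q)
grid
```

A tenfold error in either μ or q changes the inferred s tenfold, so the estimate is only as reliable as the less certain of the two measurements.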

D — Cystic fibrosis — when the formula breaks

CF allele frequency in Northern European populations is ~0.022. If h = 0 and s = 1 (recessive lethal), μ/(hs) is undefined (division by zero), so the recessive formula q ≈ √(μ/s) applies instead. With μ ≈ 10⁻⁶ and s ≈ 1, q ≈ 10⁻³. The observed frequency is 22× higher. Something else must be holding the allele up.
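One way to gauge the size of that gap is to turn the recessive formula around and ask what mutation rate it would need to produce the observed frequency. The sketch below uses only the numbers quoted above; the variable names are illustrative.

```r
# Sketch: how large would mu have to be for sqrt(mu/s) to reach q = 0.022?
q_obs <- 0.022; s_aa <- 1; mu_assumed <- 1e-6
q_pred    <- sqrt(mu_assumed / s_aa)   # recessive-lethal prediction, 0.001
fold_gap  <- q_obs / q_pred            # observed / predicted, 22
mu_needed <- s_aa * q_obs^2            # invert q = sqrt(mu/s): 4.84e-4
c(q_pred = q_pred, fold_gap = fold_gap,
  mu_needed = mu_needed, fold_mu = mu_needed / mu_assumed)
```

Because q scales with the square root of μ, closing a 22-fold gap in q takes a 484-fold higher mutation rate than the 10⁻⁶ assumed here.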

Scenario

Test three models against q_observed = 0.022: (1) recessive-lethal mutation-selection balance, q = √(μ/s); (2) partially dominant model, q = μ/(hs); (3) heterozygote advantage, with AA fitness 1 − s_AA, Aa fitness 1, and aa fitness 1 − s_aa. Only the third reaches the observed q.

[Interactive readout: predicted q under three models. Observed q = 0.022, compared with recessive q (√(μ/s)), partial dominance q, and het advantage q.]

Prediction

The basic mutation-selection balance model assumes two causal arrows: μ → q and s_aa → q. It predicts q ≈ 0.001. The observed q is 22× higher. Which additional arrows in the causal model could be doing the work? Check every plausible one.

Q1. What's keeping CF at q ≈ 0.022 above mutation-selection balance? (Check any that you think contribute.)

(a) Carriers (Aa) have a fitness advantage (heterozygote advantage)
(b) The CFTR mutation rate at this locus is much higher than the 1×10⁻⁶ assumed
(c) A historical bottleneck / founder event drifted CF up; selection hasn't yet brought it down
(d) Pure ongoing drift
(e) The observed q ≈ 0.022 is a measurement error

Adjust the het-advantage slider and find a configuration that matches 0.022.

Controls

μ (log₁₀ scale): 1e-6

s_aa (cost of homozygous CF): 1.00

s_AA (fitness cost to AA non-carriers relative to Aa carriers): 0.020

R code — three CF models

# Recessive lethal balance
mu <- 1e-6; s_aa <- 1
q_rec <- sqrt(mu / s_aa)
# Partial dominance balance
h <- 0.1
q_pd <- mu / (h * s_aa)
# Heterozygote advantage equilibrium: q_eq = s_AA / (s_AA + s_aa)
s_AA <- 0.02
q_ha <- s_AA / (s_AA + s_aa)
c(rec = q_rec, partial = q_pd, hetadv = q_ha, observed = 0.022)
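The heterozygote-advantage formula can also be solved in the other direction, to find the s_AA that would put the equilibrium exactly at the observed frequency. The sketch below inverts q_eq = s_AA / (s_AA + s_aa) and then checks the result by iterating selection alone. Ignoring mutation and the starting frequency of 0.001 are simplifying assumptions of this sketch, not part of the lesson's model.

```r
# Sketch: which s_AA puts the het-advantage equilibrium at q = 0.022?
q_obs <- 0.022; s_aa <- 1
# Invert q_eq = s_AA / (s_AA + s_aa) for s_AA
s_AA <- q_obs * s_aa / (1 - q_obs)

# Check by iterating selection alone (mutation ignored; assumed negligible here)
q <- 0.001   # arbitrary starting frequency
for (g in 1:2000) {
  p <- 1 - q
  wbar <- p^2 * (1 - s_AA) + 2 * p * q + q^2 * (1 - s_aa)   # mean fitness
  q <- (p * q + q^2 * (1 - s_aa)) / wbar                     # Aa fitness is 1
}
c(s_AA = s_AA, q_after_selection = q, target = q_obs)
```

Under these assumptions, a carrier advantage of only about 2% relative to AA is enough to hold a recessive lethal at the observed frequency.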