BIO 202 — Lesson 12: Pushing the allele frequency with selection

What you'll do

Drive a beneficial allele to fixation deterministically. Add drift. Estimate s from observed Δp. End at the LTEE.

Pine trees — do they want their seeds to be eaten? No. The trees that made pinecone seeds that were easy to eat — what happened to them? They died. The trees that survived are the ones that had pinecone seeds that weren't easy to eat. That's more or less all selection is doing — the ones that aren't good enough to make more of themselves don't make more of themselves.— 202_lec16_03

A — Deterministic selection — the s-curve

Infinite population, no drift. A favorable allele with coefficient s sweeps to fixation along a logistic curve.

Scenario

Allele A has fitness 1 + s relative to a. With multiplicative genotype fitnesses (AA, Aa, aa = (1+s)², 1+s, 1), the diploid model reduces to the haploid recursion p' = p(1+s) / (p(1+s) + (1−p)). Plot the trajectory from p₀.

Frequency trajectory under selection

s: 0.05 | p₀: 0.01 | time to p=0.5: — gen | time to p=0.99: — gen

Prediction

Q1. Halving s (from 0.05 to 0.025) changes the time to fixation by approximately:
(a) Half
(b) Double
(c) 4×
(d) Unchanged (sweeps are mostly about p₀)

Try at least 4 (s, p₀) combos.

Controls

s: 0.050 | p₀: 0.010

R code — deterministic selection

# Deterministic selection: iterate p' = p(1+s) / (p(1+s) + (1-p))
s <- 0.05
p <- 0.01
gens <- 500
traj <- numeric(gens + 1)
traj[1] <- p
for (g in 1:gens) {
  p <- p * (1 + s) / (p * (1 + s) + (1 - p))
  traj[g + 1] <- p
}
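Because the recursion multiplies the odds p/(1−p) by (1+s) every generation, the time to reach a target frequency has a closed form. The sketch below is one way to check a simulated trajectory against that expression. `gens_to_reach` is a helper name introduced here for illustration; it is not part of the lesson's code.

```r
# Odds after t generations: p_t/(1-p_t) = (1+s)^t * p0/(1-p0).
# Solving for t gives the number of generations to reach p_target.
gens_to_reach <- function(p_target, p0, s) {
  odds_target <- p_target / (1 - p_target)
  odds_start <- p0 / (1 - p0)
  log(odds_target / odds_start) / log(1 + s)
}

# Defaults from the controls above: s = 0.05, p0 = 0.01
t_half <- gens_to_reach(0.50, p0 = 0.01, s = 0.05)  # about 94 generations
t_99   <- gens_to_reach(0.99, p0 = 0.01, s = 0.05)  # twice t_half, by symmetry of the odds

# Effect of halving s on the time to p = 0.99
ratio_half_s <- gens_to_reach(0.99, p0 = 0.01, s = 0.025) / t_99
```

Since log(1 + s) is close to s for small s, sweep time scales roughly as 1/s, and it depends on p₀ only through the logarithm of the starting odds.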

B — Selection + drift

Real populations are finite. Same selection coefficient, finite N. Some replicates sweep; some lose the beneficial allele to drift anyway.

Scenario

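50 replicate populations of N = 100. Same s = 0.05. Same p₀ = 0.05. Watch the spread. Some fix the allele; some lose it. For a single new beneficial copy (p₀ = 1/(2N)), the probability of fixation is ≈ 2s (Haldane). Starting at p₀ = 0.05 means 10 copies among 2N = 200, so noticeably more than 2s of the replicates should fix.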

50 replicate trajectories

N: 100 | s: 0.05 | p₀: 0.05 | % replicates fixed: — | Haldane prediction (2s): 10%

Prediction

Q1. A new beneficial allele (1 copy, p₀ = 1/(2N)) with s = 0.05 in a population of N = 100 will fix about:
(a) 10% of the time
(b) 50% of the time
(c) 100% of the time (it's beneficial)
(d) Essentially 0% (it's just one copy)

Try at least 5 (N, s, p₀) combos.

Controls

N: 100 | s: 0.050 | p₀: 0.050 | seed: 42

R code — selection + drift

set.seed(42)
N <- 100; s <- 0.05; p0 <- 0.05; reps <- 50
# Each generation: selection shifts p, then sampling 2N gene copies adds drift
trajs <- replicate(reps, {
  p <- p0
  traj <- p
  for (g in 1:300) {
    p_sel <- p * (1 + s) / (p * (1 + s) + (1 - p))
    p <- rbinom(1, 2 * N, p_sel) / (2 * N)
    traj <- c(traj, p)
  }
  traj
})
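The 2s rule is for a single copy. To see how the fixation probability grows with the starting frequency, the sketch below compares simulated fixation fractions with Kimura's diffusion approximation. It assumes that approximation fits this simulator (2N gene copies, advantage s per copy of A). `p_fix` and `sim_fix` are names made up for this example.

```r
# Kimura's diffusion approximation for the fixation probability of an allele
# at starting frequency p0 in a diploid population of size N (assumed Ne = N).
p_fix <- function(p0, N, s) {
  if (s == 0) return(p0)  # neutral case: fixation probability equals p0
  (1 - exp(-4 * N * s * p0)) / (1 - exp(-4 * N * s))
}

# Fraction of Wright-Fisher replicates that end in fixation.
sim_fix <- function(p0, N, s, reps = 1000, max_gens = 5000) {
  fixed <- 0
  for (r in seq_len(reps)) {
    p <- p0
    for (g in seq_len(max_gens)) {
      p_sel <- p * (1 + s) / (p * (1 + s) + (1 - p))
      p <- rbinom(1, 2 * N, p_sel) / (2 * N)
      if (p == 0 || p == 1) break  # allele lost or fixed
    }
    fixed <- fixed + (p == 1)
  }
  fixed / reps
}

set.seed(42)
# One new copy: p0 = 1/(2N)
single_copy <- c(theory = p_fix(1 / 200, 100, 0.05), sim = sim_fix(1 / 200, 100, 0.05))
# Ten copies: p0 = 0.05, as in the scenario above
ten_copies <- c(theory = p_fix(0.05, 100, 0.05), sim = sim_fix(0.05, 100, 0.05))
```

With these settings the single-copy theory value is close to 2s, while the ten-copy value is several times larger, which is why the 50 replicates above fix far more often than 10% of the time.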

C — Estimate s from observed Δp

Given an observed allele-frequency time course, infer s by fitting the deterministic logistic trajectory to it. The confidence interval is wide when Δp is small relative to drift noise.

Scenario

Simulate a sweep at known true s. Observe p every generation from t = 0 to t = T. Fit s by least squares against the deterministic trajectory. The confidence interval depends on N (drift noise), T (signal time), and the magnitude of s.

SSR profile over candidate s

true s: 0.05 | best-fit s: — | 95% CI: —

Prediction

Q1. With N = 200, T = 50 generations, and true s = 0.02 (weak), the fit will:
(a) Pin s to within ±0.005
(b) Have a wide CI because drift dominates
(c) Fit s = 0 (drift indistinguishable from weak selection)

Try at least 3 (s, N) combos.

Controls

true s: 0.050 | N: 200 | T (gens observed): 50 | seed: 42

R code — fit s

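set.seed(42)
# T_obs is T in the text; naming it T would mask R's shorthand for TRUE
N <- 200; s_true <- 0.05; T_obs <- 50; p0 <- 0.05
# Simulate the observed trajectory: selection, then drift
p <- p0
traj <- p
for (g in 1:T_obs) {
  p_sel <- p * (1 + s_true) / (p * (1 + s_true) + (1 - p))
  p <- rbinom(1, 2 * N, p_sel) / (2 * N)
  traj <- c(traj, p)
}
# Sum of squared residuals against the deterministic curve, per candidate s
s_grid <- seq(-0.05, 0.2, 0.001)
ssr <- sapply(s_grid, function(s) {
  pp <- p0
  pred <- pp
  for (g in 1:T_obs) {
    pp <- pp * (1 + s) / (pp * (1 + s) + (1 - pp))
    pred <- c(pred, pp)
  }
  sum((traj - pred)^2)
})
s_grid[which.min(ssr)]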
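When only the first and last frequencies are available, the same recursion gives a closed-form estimate, because the odds p/(1−p) grow by a factor of (1+s) per generation. The sketch below shows that two-point estimator and how its spread across replicate simulations shrinks as N grows. The helper names (`est_s_two_point`, `sim_end_freq`, `s_spread`) are introduced here for illustration, and the quantiles describe the estimator's sampling spread under a known true s, not a confidence interval for one data set.

```r
# Two-point estimate: (odds_end / odds_start)^(1/gens) - 1.
# Undefined when p_end is 0 or 1 (allele lost or fixed).
est_s_two_point <- function(p_start, p_end, gens) {
  odds_start <- p_start / (1 - p_start)
  odds_end <- p_end / (1 - p_end)
  (odds_end / odds_start)^(1 / gens) - 1
}

# Sanity check on a drift-free trajectory: the estimator should return s exactly.
p <- 0.05
for (g in 1:50) p <- p * 1.05 / (p * 1.05 + (1 - p))
s_check <- est_s_two_point(0.05, p, 50)

# Final frequency of one Wright-Fisher replicate with selection.
sim_end_freq <- function(N, s, gens, p0) {
  p <- p0
  for (g in seq_len(gens)) {
    p_sel <- p * (1 + s) / (p * (1 + s) + (1 - p))
    p <- rbinom(1, 2 * N, p_sel) / (2 * N)
  }
  p
}

# Spread of the estimate across replicates, dropping lost or fixed alleles.
s_spread <- function(N, s, gens, p0, reps = 500) {
  p_end <- replicate(reps, sim_end_freq(N, s, gens, p0))
  keep <- p_end > 0 & p_end < 1
  quantile(est_s_two_point(p0, p_end[keep], gens), c(0.025, 0.5, 0.975))
}

set.seed(42)
spread_small <- s_spread(N = 200, s = 0.05, gens = 50, p0 = 0.05)
spread_large <- s_spread(N = 2000, s = 0.05, gens = 50, p0 = 0.05)
```

Comparing `spread_small` with `spread_large` shows the same true s estimated with very different precision, purely because drift noise scales down with population size.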

D — Lenski's long-term E. coli experiment

Real allele-frequency time courses from data/clean/ltee_allele_freqs.csv. Fit s to each rising allele. Some sweep cleanly; some get displaced.

Scenario

LTEE allele frequencies plotted over generations. Each rising trajectory is a beneficial mutation. Fit s per trajectory. Watch for clonal interference — alleles that rise then crash as a fitter mutation appears in the population.

LTEE allele frequencies

trajectories shown: — | median fit s: — | range: —

Prediction

Q1. Across LTEE beneficial mutations that successfully fix, the typical selection coefficient is around:
(a) 0.001 (very weak)
(b) 0.01 (small but measurable)
(c) 0.05 (moderate)
(d) 0.5 (large)

Inspect at least 2 trajectories.

Controls

subset seed: 42

R code — fit s per LTEE trajectory

ltee <- read.csv("data/clean/ltee_allele_freqs.csv")
# For each mutation, fit s by least squares to the logistic curve
muts <- split(ltee, ltee$mutation_id)
fits <- sapply(muts, function(m) {
  m <- m[order(m$generation), ]  # ensure pobs[1] is the earliest sample
  t <- m$generation - min(m$generation)
  pobs <- m$freq
  p0 <- pobs[1]
  s_grid <- seq(0, 0.3, 0.005)
  ssr <- sapply(s_grid, function(s) sum((pobs - 1 / (1 + (1 / p0 - 1) * exp(-s * t)))^2))
  s_grid[which.min(ssr)]
})
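A logistic fit only makes sense for trajectories that keep rising, so it can help to flag rise-then-crash trajectories before fitting. The sketch below is one simple heuristic, assuming the same column names as the code above (`mutation_id`, `generation`, `freq`). The function name and both thresholds are hypothetical choices for illustration, and the `toy` data frame is made-up example data, not LTEE measurements.

```r
# Flag a trajectory that reached a noticeable peak and then fell well below it.
# min_peak and drop_frac are arbitrary illustrative cutoffs.
flag_rise_then_crash <- function(m, min_peak = 0.2, drop_frac = 0.5) {
  m <- m[order(m$generation), ]
  peak <- max(m$freq)
  last <- m$freq[nrow(m)]
  peak >= min_peak && last <= drop_frac * peak
}

# Synthetic example: one clean sweep, one allele displaced after rising.
toy <- data.frame(
  mutation_id = rep(c("sweep", "displaced"), each = 6),
  generation = rep(seq(0, 2500, by = 500), times = 2),
  freq = c(0.01, 0.05, 0.30, 0.80, 0.98, 1.00,
           0.01, 0.10, 0.45, 0.30, 0.05, 0.00)
)

flags <- sapply(split(toy, toy$mutation_id), flag_rise_then_crash)
```

Trajectories flagged this way are candidates for clonal interference and are better inspected by eye than summarized with a single fitted s.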

E — Drift, selection, or both? Five trajectories, five labels.

Same Wright-Fisher + selection simulator. Each round shows one trajectory. You decide which forces produced it before the truth is revealed.

Scenario

Five trajectories, generated under known (N, s, p₀) combinations. Some are drift-only (s = 0). Some are selection-only-feeling (s much larger than 1/N). Some are the boundary case (s ≈ 1/N) where you can't tell from a single trace.

For each trajectory, pick the best label. The next round won't unlock until you commit.

Classify this trajectory

(a) Drift only (no selection, s ≈ 0)
(b) Selection dominates (s ≫ 1/N, so the trajectory is mostly deterministic)
(c) Both: selection nudges, drift jitters (s comparable to 1/N)

Finish all 5 rounds.

Why this drill?

The boundary |s| ≈ 1/N is the most important single number in this lesson and you'll rely on it through Units 3 and 4. Below the boundary, selection is invisible against drift; above it, selection dominates. This drill makes you confront the boundary five times, on trajectories with known truth.
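One way to make the boundary concrete is to compute N|s| for a parameter combination and see which side of 1 it falls on. The sketch below does that for a few made-up (N, s) pairs. `classify_regime` and its cutoffs (0.1 and 10) are hypothetical, chosen only to separate "well below", "near", and "well above" the boundary; the drill itself may draw the lines elsewhere.

```r
# Label a parameter combination by the size of N * |s| relative to 1.
# low and high are arbitrary illustrative cutoffs around the boundary.
classify_regime <- function(N, s, low = 0.1, high = 10) {
  Ns <- N * abs(s)
  ifelse(Ns < low, "drift only",
         ifelse(Ns > high, "selection dominates", "both"))
}

# Example combinations (illustrative values only)
combos <- data.frame(
  N = c(100, 100, 10000, 1000),
  s = c(0, 0.01, 0.05, 0.001)
)
combos$Ns <- combos$N * abs(combos$s)
combos$regime <- classify_regime(combos$N, combos$s)
combos
```

Note that the same s = 0.01 can land in any of the three regimes depending on N, which is why a label needs both numbers.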