BIO 202, Spring 2026, draft v1. Selection coefficients, deterministic trajectories, and the cloud of drift around them.
Drive a beneficial allele to fixation deterministically. Add drift. Estimate s from observed Δp. End at the LTEE.
Infinite population, no drift. A favorable allele with coefficient s sweeps to fixation along a logistic curve.
Allele A has fitness 1 + s relative to a. With multiplicative fitnesses, genotypes AA / Aa / aa have fitnesses (1+s)², (1+s), 1, and the diploid model reduces exactly to the haploid recursion. The frequency p evolves as p' = p(1+s)/(p(1+s)+(1−p)). Plot the trajectory from p₀.
s <- 0.05
p <- 0.01
gens <- 500
traj <- numeric(gens + 1); traj[1] <- p
for (g in 1:gens) {
  p <- p * (1 + s) / (p * (1 + s) + (1 - p))
  traj[g + 1] <- p
}
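The recursion has a closed form, because each generation multiplies the odds p/(1−p) by exactly (1+s). A minimal sketch that checks the closed form against the loop and uses it to estimate how long a sweep takes. The helper names `logistic_p` and `gens_to_reach` are illustrative and not part of the lesson code.

```r
# Closed form of p' = p(1+s) / (p(1+s) + (1-p)):
# the odds p/(1-p) are multiplied by (1+s) every generation.
logistic_p <- function(p0, s, t) {
  odds <- p0 / (1 - p0) * (1 + s)^t
  odds / (1 + odds)
}

# Generations needed to go from p0 to a target frequency (hypothetical helper).
gens_to_reach <- function(p0, s, target) {
  log((target / (1 - target)) / (p0 / (1 - p0))) / log(1 + s)
}

s <- 0.05; p0 <- 0.01; gens <- 500

# Iterate the recursion, as in the loop above.
p <- p0
iter <- numeric(gens + 1); iter[1] <- p
for (g in 1:gens) {
  p <- p * (1 + s) / (p * (1 + s) + (1 - p))
  iter[g + 1] <- p
}

closed <- logistic_p(p0, s, 0:gens)
max_diff <- max(abs(iter - closed))   # expected to be floating-point noise
t_half <- gens_to_reach(p0, s, 0.5)   # generations until the allele reaches 50%
t_99 <- gens_to_reach(p0, s, 0.99)    # generations until it reaches 99%
```

With s = 0.05 and p₀ = 0.01, `t_half` comes out at about 94 generations, so most of the 500-generation window is spent near fixation.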
Real populations are finite. Same selection coefficient, finite N. Some replicates sweep; some lose the beneficial allele to drift anyway.
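50 replicate populations of N = 100 diploids (2N = 200 allele copies). Same s = 0.05; starting frequency p₀ = 0.05, i.e. 10 copies. Watch the spread. Some fix the allele; some lose it. The probability of fixation for a single new beneficial copy is ≈ 2s (Haldane); starting from 10 copies it is roughly 1 − e^(−4Nsp₀) ≈ 0.63, so expect about a third of the replicates to lose the allele.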
set.seed(42)
N <- 100; s <- 0.05; p0 <- 0.05; reps <- 50
trajs <- replicate(reps, {
  p <- p0; traj <- p
  for (g in 1:300) {
    p_sel <- p * (1 + s) / (p * (1 + s) + (1 - p))   # selection
    p <- rbinom(1, 2 * N, p_sel) / (2 * N)           # drift: sample 2N copies
    traj <- c(traj, p)
  }
  traj
})
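To put a number on "some fix, some lose it", the simulated fixation fraction can be compared with Kimura's diffusion approximation, u(p₀) = (1 − e^(−4Nsp₀)) / (1 − e^(−4Ns)). The sketch below assumes the same Wright-Fisher step as the simulation above. The function names, the seed, and the choice of 2000 replicates are illustrative assumptions.

```r
# Diffusion approximation for the fixation probability from frequency p0.
fix_prob <- function(N, s, p0) {
  if (s == 0) return(p0)               # neutral limit: u(p0) = p0
  (1 - exp(-4 * N * s * p0)) / (1 - exp(-4 * N * s))
}

# Run one Wright-Fisher population with selection until the allele is
# lost (p = 0) or fixed (p = 1). Returns TRUE if it fixed.
run_to_absorption <- function(N, s, p0) {
  p <- p0
  while (p > 0 && p < 1) {
    p_sel <- p * (1 + s) / (p * (1 + s) + (1 - p))
    p <- rbinom(1, 2 * N, p_sel) / (2 * N)
  }
  p == 1
}

set.seed(1)                            # arbitrary seed for this sketch
N <- 100; s <- 0.05; p0 <- 0.05
reps <- 2000                           # more than the 50 above, to reduce noise
fixed <- replicate(reps, run_to_absorption(N, s, p0))

frac_fixed <- mean(fixed)                # simulated fixation fraction
u_theory <- fix_prob(N, s, p0)           # diffusion prediction from 10 copies
u_single <- fix_prob(N, s, 1 / (2 * N))  # single new copy, close to 2s
```

Comparing `frac_fixed` with `u_theory` shows how well the approximation holds at this N and s. `u_single` is the quantity that Haldane's 2s approximates.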
Given allele frequencies at two or more time points, infer s. This is the same problem as fitting a logistic curve. Expect a wide CI when Δp is small relative to drift noise.
Simulate a sweep at known true s. Observe p every generation from t = 0 to t = T. Fit s by least squares over a grid of candidate values. The confidence interval depends on N (drift noise), T (signal time), and the magnitude of s.
set.seed(42)
N <- 200; s_true <- 0.05; T <- 50; p0 <- 0.05

# Simulate one noisy sweep at the true s
p <- p0; traj <- p
for (g in 1:T) {
  p_sel <- p * (1 + s_true) / (p * (1 + s_true) + (1 - p))
  p <- rbinom(1, 2 * N, p_sel) / (2 * N)
  traj <- c(traj, p)
}

# Grid search: sum of squared residuals against the deterministic trajectory
s_grid <- seq(-0.05, 0.2, 0.001)
ssr <- sapply(s_grid, function(s) {
  p2 <- p0; pp <- p2
  for (g in 1:T) {
    pp <- pp * (1 + s) / (pp * (1 + s) + (1 - pp))
    p2 <- c(p2, pp)
  }
  sum((traj - p2)^2)
})
s_grid[which.min(ssr)]
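When only the first and last frequencies are available, the recursion can be inverted directly: the odds grow by a factor (1+s)^T, so s = (odds_T / odds_0)^(1/T) − 1. The sketch below checks that estimator on a drift-free trajectory, then repeats the simulation at two population sizes to show how the spread of the estimate depends on N. The function names, the seed, the replicate count, and the larger population size are assumptions made for illustration.

```r
# Two-point estimate of s: invert odds_T = odds_0 * (1+s)^T.
two_point_s <- function(p_start, p_end, n_gens) {
  odds_ratio <- (p_end / (1 - p_end)) / (p_start / (1 - p_start))
  odds_ratio^(1 / n_gens) - 1
}

# One Wright-Fisher run with selection; returns the frequency after n_gens.
wf_end_freq <- function(N, s, p0, n_gens) {
  p <- p0
  for (g in 1:n_gens) {
    p_sel <- p * (1 + s) / (p * (1 + s) + (1 - p))
    p <- rbinom(1, 2 * N, p_sel) / (2 * N)
  }
  p
}

# Sanity check without drift: the estimator recovers s.
s_true <- 0.05; p0 <- 0.05; n_gens <- 50
odds_end <- p0 / (1 - p0) * (1 + s_true)^n_gens
s_check <- two_point_s(p0, odds_end / (1 + odds_end), n_gens)

# Spread of the estimate across replicates. Replicates where the allele was
# lost or fixed are dropped, because their odds are 0 or Inf.
s_hat_spread <- function(N, reps = 500) {
  p_end <- replicate(reps, wf_end_freq(N, s_true, p0, n_gens))
  seg <- p_end > 0 & p_end < 1
  sd(two_point_s(p0, p_end[seg], n_gens))
}

set.seed(1)                           # arbitrary seed for this sketch
sd_small <- s_hat_spread(N = 200)     # same N as the example above
sd_large <- s_hat_spread(N = 20000)   # drift noise shrinks as N grows
```

Dropping the lost replicates conditions on survival, which biases the remaining estimates upward at small N. Treat `sd_small` as a rough guide to the noise rather than a calibrated confidence interval.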
Real allele-frequency time courses from data/clean/ltee_allele_freqs.csv. Fit s to each rising allele. Some sweep cleanly; some get displaced.
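Allele frequencies from the long-term evolution experiment (LTEE), plotted over generations. Each rising trajectory is a candidate beneficial mutation, though in an asexual population a neutral mutation can also rise by hitchhiking with one. Fit s per trajectory. Watch for clonal interference: alleles that rise, then crash as a fitter mutation appears in the population.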
ltee <- read.csv("data/clean/ltee_allele_freqs.csv")

# For each mutation, fit s by least squares to a logistic curve
muts <- split(ltee, ltee$mutation_id)
fits <- sapply(muts, function(m) {
  m <- m[order(m$generation), ]   # p0 below must be the earliest observation
  t <- m$generation - min(m$generation)
  pobs <- m$freq
  p0 <- pobs[1]
  s_grid <- seq(0, 0.3, 0.005)
  ssr <- sapply(s_grid, function(s) sum((pobs - 1 / (1 + (1 / p0 - 1) * exp(-s * t)))^2))
  s_grid[which.min(ssr)]
})
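A monotone logistic curve cannot describe a trajectory that rises and then crashes, so it helps to flag those before interpreting the fitted s. A minimal sketch, run on made-up toy data that reuses the column names from the CSV above (`mutation_id`, `generation`, `freq`). The function `flag_crash` and its two thresholds are assumptions for illustration, not an established criterion for clonal interference.

```r
# Hypothetical toy data; the values are invented for illustration only.
gen <- seq(0, 1000, by = 100)
toy <- rbind(
  data.frame(mutation_id = "sweep",
             generation = gen,
             freq = 1 / (1 + 99 * exp(-0.02 * gen))),
  data.frame(mutation_id = "displaced",
             generation = gen,
             freq = c(0.01, 0.05, 0.20, 0.45, 0.60, 0.50,
                      0.30, 0.12, 0.04, 0.01, 0.00))
)

# Flag a trajectory as "rise then crash" when it peaked at or above `min_peak`
# and ended below `drop_frac` of that peak. Both thresholds are assumptions.
flag_crash <- function(m, min_peak = 0.2, drop_frac = 0.5) {
  m <- m[order(m$generation), ]
  peak <- max(m$freq)
  last <- m$freq[nrow(m)]
  peak >= min_peak && last < drop_frac * peak
}

# Named logical vector, one entry per mutation_id.
crashed <- sapply(split(toy, toy$mutation_id), flag_crash)
```

For a flagged trajectory, one option is to fit s only to the points up to the peak, since the later decline reflects the competing lineage rather than the mutation's own advantage.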
Same Wright-Fisher + selection simulator. Each round shows one trajectory. You decide which forces produced it before the truth is revealed.
Five trajectories, generated under known (N, s, p₀) combinations. Some are drift-only (s = 0). Some look selection-dominated (s much larger than 1/N). Some are the boundary case (s ≈ 1/N), where you can't tell from a single trace.
For each trajectory, pick the best label. The next round won't unlock until you commit.
The boundary |s| ≈ 1/N is the most important single idea in this lesson, and you'll rely on it through Units 3 and 4. Well below the boundary, selection is effectively invisible against drift; well above it, selection dominates. This drill makes you confront the boundary across five trajectories with known truth.
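One way to see the boundary numerically is to compare the fixation probability of a single new copy with its neutral value 1/(2N), using Kimura's diffusion approximation. The sketch below is an illustration under that approximation. The function name `rel_fix_prob`, the choice N = 1000, and the grid of N·s values are assumptions.

```r
# Fixation probability of a single new copy (p0 = 1/(2N)) under the diffusion
# approximation, divided by the neutral value 1/(2N).
rel_fix_prob <- function(N, s) {
  p0 <- 1 / (2 * N)
  u <- if (s == 0) p0 else (1 - exp(-4 * N * s * p0)) / (1 - exp(-4 * N * s))
  u / p0
}

N <- 1000
Ns_values <- c(-10, -1, -0.1, 0, 0.1, 1, 10)   # N*s, spanning |N*s| = 1
ratio <- sapply(Ns_values / N, function(s) rel_fix_prob(N, s))
boundary_table <- data.frame(Ns = Ns_values, rel_fix_prob = signif(ratio, 3))
```

In `boundary_table`, a ratio near 1 means the allele behaves almost as if neutral. At |N·s| = 0.1 the ratio stays within roughly 20% of 1, while at N·s = 10 it is about 40 and at N·s = −10 it is effectively zero.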