BIO 202, Spring 2026, draft v1. F = 1 − H_obs / H_exp. The deeper the deficit, the more your "population" isn't one.
Build a panmictic baseline. Split the population. Watch F grow. Decompose F_IT into F_IS (within-deme) and F_ST (between-deme). Test on FSJ data.
Sample N individuals from an HWE population. Compute F = 1 − H_obs/H_exp. F is centered on 0 with sampling noise.
True allele frequency p. Sample N individuals at HWE genotype probabilities (p², 2pq, q²). Compute F. Across 1000 replicate samples, F is approximately centered on 0, with a standard deviation of roughly 1/√N (about 0.1 at N = 100), so the noise shrinks as N grows.
set.seed(42)
N <- 100; p <- 0.5; reps <- 1000
Fs <- replicate(reps, {
  obs  <- rmultinom(1, N, c(p^2, 2*p*(1-p), (1-p)^2))[,1]  # counts of AA, Aa, aa
  phat <- (2*obs[1] + obs[2]) / (2*N)                       # sample allele frequency
  Hexp <- 2*phat*(1-phat); Hobs <- obs[2]/N
  1 - Hobs/Hexp
})
mean(Fs); sd(Fs)
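To see the "variance shrinks as N grows" claim directly, the same recipe can be repeated at several sample sizes. This is a minimal sketch, not part of the course code: `sim_F()` is a hypothetical helper name, and the sample sizes and replicate count are arbitrary choices. The 1/√N column is a rule of thumb for a biallelic locus at HWE, not an exact result.

```r
# Sketch: how the spread of F shrinks with sample size N under HWE.
# sim_F() is a hypothetical helper that repeats the single-sample recipe.
sim_F <- function(N, p = 0.5, reps = 2000) {
  replicate(reps, {
    obs  <- rmultinom(1, N, c(p^2, 2 * p * (1 - p), (1 - p)^2))[, 1]
    phat <- (2 * obs[1] + obs[2]) / (2 * N)
    1 - (obs[2] / N) / (2 * phat * (1 - phat))
  })
}

set.seed(1)
Ns  <- c(25, 100, 400)                       # arbitrary illustrative sample sizes
sds <- sapply(Ns, function(N) sd(sim_F(N)))  # simulated spread of F at each N

# Compare the simulated spread with the 1/sqrt(N) rule of thumb
data.frame(N = Ns, sd_F = sds, one_over_sqrtN = 1 / sqrt(Ns))
```

Quadrupling N should roughly halve the spread of F, which is why a per-locus F near 0.1 is unremarkable at N = 100 but worth a second look at N = 400.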
Two subpopulations with different allele frequencies. Sample from both as if they were one. The pooled sample has a heterozygote deficit even though each subpop is at HWE. This is the Wahlund effect.
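Subpop 1 at p₁ = 0.7. Subpop 2 at p₂ = 0.3. Both at HWE within themselves. Pool a mixture (fraction w from 1, 1−w from 2). In expectation, the pooled F equals var(p)/(p̄(1−p̄)), where p̄ = w·p₁ + (1−w)·p₂ and var(p) = w(p₁−p̄)² + (1−w)(p₂−p̄)² is the weighted variance of subpop frequencies. With w = 0.5 this gives p̄ = 0.5, var(p) = 0.04, and F = 0.04/0.25 = 0.16; a single simulated sample will scatter around that value.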
set.seed(42)
p1 <- 0.7; p2 <- 0.3; w <- 0.5; N <- 500
N1 <- round(N*w); N2 <- N - N1
o1 <- rmultinom(1, N1, c(p1^2, 2*p1*(1-p1), (1-p1)^2))[,1]  # subpop 1 genotype counts
o2 <- rmultinom(1, N2, c(p2^2, 2*p2*(1-p2), (1-p2)^2))[,1]  # subpop 2 genotype counts
obs <- o1 + o2                                               # pool as if one population
phat <- (2*obs[1] + obs[2]) / (2*N)
1 - (obs[2]/N) / (2*phat*(1-phat))
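The simulated value can be checked against the expectation without any sampling. The sketch below computes the expected pooled F two ways, from heterozygosities and from the variance of subpopulation frequencies, and the two should agree. `wahlund_F()` is a hypothetical helper name introduced here for illustration.

```r
# Expected pooled F for a mixture of HWE subpopulations (Wahlund effect).
# wahlund_F() is a hypothetical helper name, not part of the course code.
wahlund_F <- function(p, w) {
  w     <- w / sum(w)                # normalise mixing weights
  pbar  <- sum(w * p)                # pooled allele frequency
  var_p <- sum(w * (p - pbar)^2)     # weighted (population) variance, not var()
  Hobs  <- sum(w * 2 * p * (1 - p))  # expected heterozygosity of the pooled sample
  Hexp  <- 2 * pbar * (1 - pbar)     # what HWE predicts from pbar alone
  c(from_H = 1 - Hobs / Hexp, from_var = var_p / (pbar * (1 - pbar)))
}

res <- wahlund_F(p = c(0.7, 0.3), w = c(0.5, 0.5))
res
```

For the two-subpopulation setup in the text, both entries come out to 0.16. The function also accepts more than two subpopulations and unequal weights.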
F_IS measures within-deme inbreeding. F_ST measures between-deme differentiation. F_IT combines them: (1−F_IT) = (1−F_IS)(1−F_ST). This decomposes the total heterozygote deficit into its two components.
Multiple demes, each with its own allele frequency and its own internal inbreeding coefficient. Compute F_IS (avg within-deme), F_ST (between-deme variance / total variance), and F_IT (overall heterozygote deficit). The identity: (1−F_IT) = (1−F_IS)(1−F_ST).
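set.seed(42)
demes <- 5; F_IS <- 0.1
# Deme allele frequencies, clamped to [0.05, 0.95]
ps <- pmin(pmax(rnorm(demes, 0.5, sqrt(0.05)), 0.05), 0.95)
pbar <- mean(ps)
# Population variance (divide by demes) to match the Wahlund formula;
# var() would divide by demes - 1 and inflate F_ST.
FST <- mean((ps - pbar)^2) / (pbar*(1-pbar))
FIT <- 1 - (1-F_IS)*(1-FST)
c(FIS = F_IS, FST = FST, FIT = FIT)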
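The identity can also be checked from the other direction: compute the three heterozygosities first, derive each F from them, and confirm that the product rule holds. This is a sketch under two stated assumptions, equal-sized demes and the same F_IS in every deme; the deme frequencies are illustrative values, not data.

```r
# Sketch: recover F_IS, F_ST, F_IT from heterozygosities and check the identity.
# Assumes equal-sized demes and one shared F_IS (both are assumptions).
ps   <- c(0.2, 0.35, 0.5, 0.65, 0.8)  # illustrative deme allele frequencies
F_IS <- 0.1

H_I  <- mean(2 * ps * (1 - ps) * (1 - F_IS))  # observed heterozygosity within demes
H_S  <- mean(2 * ps * (1 - ps))               # HWE expectation within demes
pbar <- mean(ps)
H_T  <- 2 * pbar * (1 - pbar)                 # HWE expectation for the pooled population

FIS <- 1 - H_I / H_S
FST <- 1 - H_S / H_T
FIT <- 1 - H_I / H_T

c(FIS = FIS, FST = FST, FIT = FIT)
all.equal(1 - FIT, (1 - FIS) * (1 - FST))     # the product identity
```

Written this way the identity is just H_I/H_T = (H_I/H_S)(H_S/H_T), which is why it holds for any set of deme frequencies.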
The bar chart above shows F_IS, F_ST, and F_IT as three numbers. They all come from one underlying quantity: the variance in allelic state (A vs. a) across the gene copies carried by individuals.
Drag the slider below from pooled to split. The single bar of total variance peels apart into within-deme variance and between-deme variance. F_ST is just the between-piece, divided by the total. Every variance-decomposition idea in the rest of the course (heritability, kin selection, Price equation in L30) is a version of this picture.
When the slider is at 0, you see the variance the way you'd compute it pooling all individuals. As you drag toward 1, the same variance is re-attributed to deme membership: the dark portion is what would still be there if all demes had the same mean allele frequency (within-deme variance); the light portion is the part that exists only because demes differ from each other (between-deme variance).
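The split the slider animates can be reproduced numerically. The sketch below codes each sampled gene copy as 1 (allele A) or 0 (allele a), then partitions the pooled variance into a within-deme and a between-deme piece. The deme count, deme sizes, and frequencies are made-up illustrative values, and `pv()` is a hypothetical helper for the divide-by-n variance.

```r
# Sketch of the variance split behind the slider, with equal-sized demes.
set.seed(7)
p_true <- c(0.2, 0.4, 0.6, 0.8)               # illustrative deme allele frequencies
demes  <- rep(1:4, each = 200)                # deme label for each sampled gene copy
x      <- rbinom(length(demes), 1, p_true[demes])  # 1 = allele A, 0 = allele a

pv <- function(v) mean((v - mean(v))^2)       # population variance (divide by n)

total   <- pv(x)                              # slider at 0: pool everything
within  <- mean(tapply(x, demes, pv))         # dark portion: average within-deme variance
between <- pv(tapply(x, demes, mean))         # light portion: variance of deme means

c(total = total, within = within, between = between)
all.equal(total, within + between)            # law of total variance
between / total                               # F_ST as the between share
```

With equal deme sizes the two pieces add up to the total exactly, not just on average, so moving the slider only relabels variance that was already there.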
FSJ individual genotypes from data/clean/fsj_individuals.csv. Per-locus F. Find the outliers — they tell you something is happening beyond drift.
For each locus in the FSJ dataset, compute F = 1 − H_obs/H_exp. Across many loci, per-locus F values cluster around a shared baseline, with locus-to-locus scatter from drift and sampling noise. Outliers (loci with unusually high F) are candidates for population structure or recent inbreeding.
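fsj <- read.csv("data/clean/fsj_individuals.csv")
# Assumes one row per individual per locus, with genotype coded "AA", "Aa", or "aa"
# (the "aa" label is an assumption; adjust the levels to match the file).
Fs <- sapply(unique(fsj$locus), function(L) {
  rows <- fsj[fsj$locus == L, ]
  # Fixed levels keep absent genotype classes as 0 counts instead of NA
  obs  <- table(factor(rows$genotype, levels = c("AA", "Aa", "aa")))
  N    <- sum(obs)
  phat <- (2*obs[["AA"]] + obs[["Aa"]]) / (2*N)
  1 - (obs[["Aa"]]/N) / (2*phat*(1-phat))   # NaN if the locus is monomorphic
})
hist(Fs); mean(Fs > 0.2, na.rm = TRUE)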
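A fixed cutoff such as F > 0.2 ignores sample size. One common alternative, swapped in here for illustration and not necessarily the course's method, is the chi-square test of HWE: for a biallelic locus the Pearson statistic equals N·F², with 1 degree of freedom. The sketch below uses made-up genotype counts rather than FSJ data, and `hwe_test()` is a hypothetical helper name.

```r
# Sketch: turn per-locus F into a chi-square test of HWE (biallelic locus).
# Toy genotype counts, made up for illustration; not FSJ data.
counts <- rbind(
  locus1 = c(AA = 25, Aa = 50, aa = 25),  # exactly at HWE proportions
  locus2 = c(AA = 40, Aa = 20, aa = 40)   # strong heterozygote deficit
)

hwe_test <- function(n) {
  N    <- sum(n)
  phat <- (2 * n[["AA"]] + n[["Aa"]]) / (2 * N)
  Fhat <- 1 - (n[["Aa"]] / N) / (2 * phat * (1 - phat))
  X2   <- N * Fhat^2                      # equals the Pearson chi-square here
  c(F = Fhat, X2 = X2, p_value = pchisq(X2, df = 1, lower.tail = FALSE))
}

res <- t(apply(counts, 1, hwe_test))      # one row per locus
res
```

When many loci are tested at once, the p-values would need a multiple-testing correction (for example with `p.adjust()`) before any locus is called an outlier.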