Lesson 14 — Counting heterozygote deficits

BIO 202, Spring 2026, draft v1. F = 1 − H_obs / H_exp. The deeper the deficit, the more your "population" isn't one.

What you'll do

Build a panmictic baseline. Split the population. Watch F grow. Decompose F_IT into F_IS (within-deme) and F_ST (between-deme). Test on Florida Scrub Jay (FSJ) data.

An F of 1 — perfect inbreeding — means I don't have a population. With 50 AA, 50 aa, and zero heterozygotes, I have two populations. A big-A population and a little-a population that aren't interbreeding. Any value of F greater than zero is essentially saying my population is not one.— 202_lec10_07

A — Panmictic population — F fluctuates around 0

Sample N individuals from a population in Hardy-Weinberg equilibrium (HWE). Compute F = 1 − H_obs/H_exp. F is centered on 0, with sampling noise.


Scenario

True allele frequency p. Sample N individuals at HWE genotype probabilities (p², 2pq, q²). Compute F. Across 1000 replicate samples, F is approximately centered on 0 with variance that shrinks as N grows.
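Why should the spread of F shrink with N? For a two-allele locus there is an exact link to the usual test for HWE: the Pearson chi-square statistic, with expected counts built from the sample allele frequency, equals N·F². Under panmixia that statistic is approximately chi-square with 1 degree of freedom (mean 1), so F² averages roughly 1/N. The R sketch below checks the identity on one made-up set of counts; `hwe_F` and `hwe_chisq` are hypothetical helper names, not functions from the course materials.

```r
# Hypothetical helpers (not part of the course code).
# obs = genotype counts in the order (AA, Aa, aa).
hwe_F <- function(obs) {
  N <- sum(obs)
  phat <- (2 * obs[1] + obs[2]) / (2 * N)     # sample frequency of A
  1 - (obs[2] / N) / (2 * phat * (1 - phat))  # F = 1 - H_obs / H_exp
}

hwe_chisq <- function(obs) {
  N <- sum(obs)
  phat <- (2 * obs[1] + obs[2]) / (2 * N)
  expected <- N * c(phat^2, 2 * phat * (1 - phat), (1 - phat)^2)
  sum((obs - expected)^2 / expected)          # Pearson chi-square for HWE
}

obs <- c(30, 40, 30)                          # made-up counts
N <- sum(obs)
c(N_times_F2 = N * hwe_F(obs)^2, chisq = hwe_chisq(obs))  # both print as 4
```

Other counts with 0 < p̂ < 1 should also give two equal numbers, up to rounding.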

Figure: F across 1000 panmictic samples (readouts: N, p, mean F, SD F).

Prediction

  Q1. In a panmictic population with N = 100 individuals sampled, the standard deviation of F across replicate samples is closest to:
Try at least 4 (N, p) combinations.

Controls

Defaults: N = 100, p = 0.50, seed = 42.

R code — sampling F under panmixia

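set.seed(42)
N <- 100; p <- 0.5; reps <- 1000
Fs <- replicate(reps, {
  # one HWE sample: genotype counts (AA, Aa, aa)
  obs <- rmultinom(1, N, c(p^2, 2*p*(1-p), (1-p)^2))[,1]
  phat <- (2*obs[1]+obs[2])/(2*N)            # sample frequency of A
  Hexp <- 2*phat*(1-phat); Hobs <- obs[2]/N
  1 - Hobs/Hexp                               # F for this sample
})
# na.rm drops samples that happen to be monomorphic (Hexp = 0, F undefined)
mean(Fs, na.rm = TRUE); sd(Fs, na.rm = TRUE)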
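To see the shrinking spread directly, the sketch below reruns the same estimator at several sample sizes and sets each SD next to 1/√N, the scale suggested by the chi-square link (N·F² approximately chi-square with 1 df). The sample sizes and the helper name `sample_F` are illustrative choices, not part of the stage controls.

```r
set.seed(42)
p <- 0.5; reps <- 1000
Ns <- c(25, 100, 400)                  # illustrative sample sizes

# F for one HWE sample of N individuals (same estimator as the stage code)
sample_F <- function(N, p) {
  obs <- rmultinom(1, N, c(p^2, 2 * p * (1 - p), (1 - p)^2))[, 1]
  phat <- (2 * obs[1] + obs[2]) / (2 * N)
  1 - (obs[2] / N) / (2 * phat * (1 - phat))
}

# SD of F across replicates at each N; na.rm guards against monomorphic draws
sds <- sapply(Ns, function(N) sd(replicate(reps, sample_F(N, p)), na.rm = TRUE))
data.frame(N = Ns, sd_F = sds, one_over_sqrt_N = 1 / sqrt(Ns))
```

At moderate p the `sd_F` and `one_over_sqrt_N` columns should track each other. Pushing `p` toward 0 or 1 at small N makes monomorphic samples (and NaN values of F) more common, which is why `na.rm = TRUE` is there.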

B — Split the population — F climbs

Two subpopulations with different allele frequencies. Sample from both as if they were one. The pooled sample has a heterozygote deficit even though each subpop is at HWE. This is the Wahlund effect.


Scenario

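Subpop 1 at p₁ = 0.7. Subpop 2 at p₂ = 0.3. Both at HWE within themselves. Pool a mixture (fraction w from 1, 1−w from 2). The pooled F is expected to equal var(p)/(p̄(1−p̄)), where p̄ = w·p₁ + (1−w)·p₂ and var(p) is the variance of the two subpopulation frequencies weighted by w and 1−w: the between-subpopulation variance, scaled by p̄(1−p̄).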
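The formula can be checked without any sampling. Pooling two HWE subpopulations gives a heterozygote frequency of w·2p₁q₁ + (1−w)·2p₂q₂, while HWE at the pooled frequency p̄ predicts 2p̄q̄; one minus their ratio should equal the weighted variance of p divided by p̄q̄. The helpers `wahlund_F` and `wahlund_F_direct` are hypothetical names for a minimal sketch of that comparison, using illustrative frequencies 0.6 and 0.4.

```r
# Hypothetical helper: expected pooled F from the variance formula
wahlund_F <- function(p1, p2, w) {
  pbar <- w * p1 + (1 - w) * p2
  varp <- w * (p1 - pbar)^2 + (1 - w) * (p2 - pbar)^2  # mixing-weighted variance
  varp / (pbar * (1 - pbar))
}

# Same quantity computed directly from expected heterozygote frequencies
wahlund_F_direct <- function(p1, p2, w) {
  pbar  <- w * p1 + (1 - w) * p2
  H_obs <- w * 2 * p1 * (1 - p1) + (1 - w) * 2 * p2 * (1 - p2)  # pooled heterozygotes
  H_exp <- 2 * pbar * (1 - pbar)                                # HWE at pooled frequency
  1 - H_obs / H_exp
}

c(formula = wahlund_F(0.6, 0.4, 0.5), direct = wahlund_F_direct(0.6, 0.4, 0.5))
```

Both entries print 0.04 for this example. Setting p₁ = p₂ drives both to 0: no difference between subpopulations, no Wahlund deficit.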

Figure: F as a function of the difference between subpopulations (readouts: p₁, p₂, predicted F, observed F).

Prediction

  Q1. With p₁ = 0.7, p₂ = 0.3, and 50:50 mixing, the pooled F is approximately:
Try at least 4 combinations of (p₁ − p₂, mixing fraction).

Controls

Defaults: p₁ = 0.70, p₂ = 0.30, w = 0.50, N = 500, seed = 42.

R code — Wahlund effect

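set.seed(42)
p1 <- 0.7; p2 <- 0.3; w <- 0.5; N <- 500
N1 <- round(N*w); N2 <- N - N1               # individuals drawn from each subpop
# each subpop is sampled at its own HWE proportions
o1 <- rmultinom(1,N1,c(p1^2,2*p1*(1-p1),(1-p1)^2))[,1]
o2 <- rmultinom(1,N2,c(p2^2,2*p2*(1-p2),(1-p2)^2))[,1]
obs <- o1 + o2                                # pooled counts, treated as one population
phat <- (2*obs[1]+obs[2])/(2*N)
1 - (obs[2]/N) / (2*phat*(1-phat))            # pooled F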
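A single pooled sample is noisy, so its F will not match the prediction exactly. The sketch below repeats the pooled draw 1000 times at the stage defaults and compares the average observed F with w(1−w)(p₁−p₂)²/(p̄(1−p̄)), which is the weighted variance of the two subpopulation frequencies scaled by p̄(1−p̄). The replicate count and the helper `hwe_probs` are illustrative assumptions.

```r
set.seed(42)
p1 <- 0.7; p2 <- 0.3; w <- 0.5; N <- 500; reps <- 1000
pbar <- w * p1 + (1 - w) * p2
# w(1-w)(p1-p2)^2 is the mixing-weighted variance of p1 and p2
F_pred <- w * (1 - w) * (p1 - p2)^2 / (pbar * (1 - pbar))

hwe_probs <- function(p) c(p^2, 2 * p * (1 - p), (1 - p)^2)  # (AA, Aa, aa)

F_obs <- replicate(reps, {
  N1 <- round(N * w)
  obs <- rmultinom(1, N1, hwe_probs(p1))[, 1] +
         rmultinom(1, N - N1, hwe_probs(p2))[, 1]           # pooled counts
  phat <- (2 * obs[1] + obs[2]) / (2 * N)
  1 - (obs[2] / N) / (2 * phat * (1 - phat))
})

c(F_pred = F_pred, mean_F_obs = mean(F_obs), sd_F_obs = sd(F_obs))
```

The mean of `F_obs` should sit close to `F_pred`, and `sd_F_obs` shows how far any one sample can stray from it.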

C — F_IS, F_ST, F_IT — three measurements, three sources

F_IS measures within-deme inbreeding. F_ST measures between-deme differentiation. F_IT combines them: (1−F_IT) = (1−F_IS)(1−F_ST). This decomposes the total heterozygote deficit into its two components.
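Writing each coefficient as a ratio of heterozygosities turns the identity into a one-line cancellation. The symbols here are the conventional ones and are not defined elsewhere in this lesson: H_I is the observed heterozygosity of individuals (averaged over demes), H_S the HWE-expected heterozygosity within demes (averaged over demes), and H_T the HWE-expected heterozygosity of the pooled population.

```latex
\begin{aligned}
F_{IS} &= 1 - \frac{H_I}{H_S}, \qquad F_{ST} = 1 - \frac{H_S}{H_T}, \qquad F_{IT} = 1 - \frac{H_I}{H_T} \\
(1 - F_{IS})(1 - F_{ST}) &= \frac{H_I}{H_S} \cdot \frac{H_S}{H_T} = \frac{H_I}{H_T} = 1 - F_{IT}
\end{aligned}
```

With equal-sized demes, H_T = 2p̄(1−p̄) and H_S is the average of 2pᵢ(1−pᵢ), so F_ST = 1 − H_S/H_T works out to the between-deme variance of the pᵢ divided by p̄(1−p̄).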


Scenario

Multiple demes, each with its own allele frequency and its own internal inbreeding coefficient. Compute F_IS (the average within-deme inbreeding), F_ST (between-deme variance divided by total variance), and F_IT (the overall heterozygote deficit). The identity: (1−F_IT) = (1−F_IS)(1−F_ST).

Figure: F_IS, F_ST, F_IT bar chart (readouts: F_IS, F_ST, F_IT, and (1−F_IS)(1−F_ST)).

Prediction

  Q1. With F_IS = 0.1 (mild inbreeding within demes) and F_ST = 0.2 (moderate differentiation between demes), F_IT will be approximately:
Try at least 3 (F_IS, F_ST) combinations.

Controls

Defaults: F_IS = 0.10, demes = 5, variance of deme allele frequencies = 0.050, seed = 42.

R code — F decomposition

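set.seed(42)
demes <- 5; F_IS <- 0.1
# deme allele frequencies, clipped to [0.05, 0.95]
ps <- pmin(pmax(rnorm(demes, 0.5, sqrt(0.05)), 0.05), 0.95)
# F_ST uses the variance among these demes (divide by demes, not demes - 1; equal deme sizes)
pbar <- mean(ps); FST <- mean((ps - pbar)^2)/(pbar*(1-pbar))
# F_IT here follows from the identity rather than from genotype counts
FIT <- 1 - (1-F_IS)*(1-FST)
c(FIS = F_IS, FST = FST, FIT = FIT)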
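The stage code sets F_IT from the identity rather than measuring it, so it cannot catch a mistake. A more convincing check simulates genotypes and computes all three coefficients from counts. The sketch below assumes equal deme sizes (200 individuals each, an illustrative choice), within-deme genotype probabilities p² + Fpq, 2pq(1−F), q² + Fpq for an inbreeding coefficient F, and the conventional heterozygosities H_I, H_S, H_T (symbols not used elsewhere in this lesson). `inbred_probs` is a hypothetical helper.

```r
set.seed(42)
demes <- 5; F_IS <- 0.1; n_per_deme <- 200
ps <- pmin(pmax(rnorm(demes, 0.5, sqrt(0.05)), 0.05), 0.95)

# Genotype probabilities (AA, Aa, aa) in a deme with inbreeding coefficient f
inbred_probs <- function(p, f) {
  q <- 1 - p
  c(p^2 + f * p * q, 2 * p * q * (1 - f), q^2 + f * p * q)
}

# 3 x demes matrix of counts: rows AA, Aa, aa
counts <- sapply(ps, function(p) rmultinom(1, n_per_deme, inbred_probs(p, F_IS))[, 1])

p_hat <- (2 * counts[1, ] + counts[2, ]) / (2 * n_per_deme)  # per-deme frequency of A
p_bar <- mean(p_hat)                                          # valid for equal deme sizes

H_I <- mean(counts[2, ] / n_per_deme)   # observed heterozygosity
H_S <- mean(2 * p_hat * (1 - p_hat))    # HWE-expected, within demes
H_T <- 2 * p_bar * (1 - p_bar)          # HWE-expected, pooled

est <- c(FIS = 1 - H_I / H_S, FST = 1 - H_S / H_T, FIT = 1 - H_I / H_T)
est
c(lhs = 1 - est[["FIT"]], rhs = (1 - est[["FIS"]]) * (1 - est[["FST"]]))
```

`lhs` and `rhs` should agree to rounding error, and `FIS` should land near the 0.1 used to simulate, give or take sampling noise.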

What F_ST is, in one picture

The bar chart above shows F_IS, F_ST, and F_IT as three numbers. They all come from one underlying quantity: the variance in allelic state across the gene copies in the sample (score each copy 1 for A and 0 for a; the total variance is then p̄(1−p̄)).

Drag the slider below from pooled to split. The single bar of total variance peels apart into within-deme variance and between-deme variance. F_ST is just the between-deme piece divided by the total. Every variance-decomposition idea in the rest of the course (heritability, kin selection, the Price equation in L30) is a version of this picture.

Figure: Total allele-frequency variance, decomposing (readouts: total variance, within-deme, between-deme, F_ST = between/total; slider runs from 0 = pooled to 1 = split).

When the slider is at 0, you see the variance the way you'd compute it pooling all individuals. As you drag toward 1, the same variance is re-attributed to deme membership: the dark portion is what would still be there if all demes had the same mean allele frequency (within-deme variance); the light portion is the part that exists only because demes differ from each other (between-deme variance).
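The picture rests on a one-line identity. Score each gene copy 1 for A and 0 for a. In the pooled sample this variable has variance p̄(1−p̄); inside deme i it has variance pᵢ(1−pᵢ). With equal-sized demes, the total splits exactly into the average within-deme variance plus the variance of the deme frequencies (dividing by the number of demes). The frequencies below are made up for illustration.

```r
# Made-up deme allele frequencies; equal deme sizes assumed
ps <- c(0.2, 0.4, 0.5, 0.7, 0.9)
pbar <- mean(ps)

total   <- pbar * (1 - pbar)       # variance of allelic state, pooled
within  <- mean(ps * (1 - ps))     # average variance inside each deme
between <- mean((ps - pbar)^2)     # variance of deme frequencies (divide by demes)

c(total = total, within = within, between = between,
  within_plus_between = within + between, F_ST = between / total)
```

Here within (0.19) plus between (0.0584) recovers the total (0.2484), and F_ST = between/total comes out near 0.235.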

D — Florida Scrub Jay — per-locus F

FSJ individual genotypes from data/clean/fsj_individuals.csv. Compute F for each locus, then find the outliers: they hint that something beyond drift is going on.


Scenario

For each locus in the FSJ dataset, compute F = 1 − H_obs/H_exp. Across many loci, F is centered on the typical drift baseline. Outliers (loci with unusually high F) are candidates for population structure or recent inbreeding.
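A fixed cutoff such as F > 0.2 means different things at different sample sizes, because the sampling spread of F shrinks as N grows (Stage A). One way to put a locus's F in context, sketched here as an assumption rather than the course's method, is to simulate F under HWE for the same N and allele frequency and read off an upper quantile. `null_F_quantile` is a hypothetical helper; the N values and p = 0.3 are illustrative.

```r
# Hypothetical helper: simulate F under HWE for a locus with N genotyped
# individuals and allele frequency p, and return the requested quantile.
null_F_quantile <- function(N, p, prob = 0.95, reps = 2000) {
  probs <- c(p^2, 2 * p * (1 - p), (1 - p)^2)
  Fs <- replicate(reps, {
    obs <- rmultinom(1, N, probs)[, 1]
    phat <- (2 * obs[1] + obs[2]) / (2 * N)
    hexp <- 2 * phat * (1 - phat)
    if (hexp == 0) NA_real_ else 1 - (obs[2] / N) / hexp  # monomorphic draw: F undefined
  })
  unname(quantile(Fs, prob, na.rm = TRUE))
}

set.seed(42)
qs <- sapply(c(20, 50, 200), function(N) null_F_quantile(N, p = 0.3))
qs
```

The 95th percentile should fall as N rises; a locus whose observed F sits well above the value for its own N and p̂ stands out more clearly than one that merely clears 0.2.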

Figure: Distribution of per-locus F values (readouts: number of loci, mean F, 95th-percentile F, outlier loci with F > 0.2).

Prediction

  Q1. The fraction of FSJ loci with F > 0.2 will be:
Resample at least 2 times.

Controls

Default: seed = 42.

R code — per-locus F

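fsj <- read.csv("data/clean/fsj_individuals.csv")  # columns: locus, genotype ("AA"/"Aa"/"aa")
Fs <- sapply(unique(fsj$locus), function(L) {
  rows <- fsj[fsj$locus == L, ]
  # factor() keeps empty genotype classes as 0 instead of NA
  obs <- table(factor(rows$genotype, levels = c("AA", "Aa", "aa")))
  N <- sum(obs); phat <- (2*obs[["AA"]] + obs[["Aa"]]) / (2*N)
  Hexp <- 2*phat*(1-phat)
  if (Hexp == 0) return(NA_real_)   # monomorphic locus: F undefined
  1 - (obs[["Aa"]]/N) / Hexp
})
hist(Fs); mean(Fs > 0.2, na.rm = TRUE)   # fraction of loci with F > 0.2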
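Before pointing the per-locus code at the real file, a dry run on a tiny invented table can catch coding mismatches. The sketch assumes the same layout the stage code implies (one row per individual per locus, columns `locus` and `genotype` coded "AA"/"Aa"/"aa"). The data frame is made up for testing, not FSJ data, and `per_locus_F` and `make_locus` are hypothetical helpers.

```r
# Hypothetical wrapper around the per-locus F calculation
per_locus_F <- function(df) {
  sapply(unique(df$locus), function(L) {
    g <- factor(df$genotype[df$locus == L], levels = c("AA", "Aa", "aa"))
    obs <- as.vector(table(g))                # counts of AA, Aa, aa (zeros kept)
    N <- sum(obs)
    phat <- (2 * obs[1] + obs[2]) / (2 * N)
    hexp <- 2 * phat * (1 - phat)
    if (hexp == 0) NA_real_ else 1 - (obs[2] / N) / hexp
  })
}

# Build invented rows for one locus from genotype counts
make_locus <- function(name, nAA, nAa, naa) {
  data.frame(locus = name, genotype = rep(c("AA", "Aa", "aa"), c(nAA, nAa, naa)))
}

# L1 at HWE, L2 with no heterozygotes, L3 with a partial deficit
toy <- rbind(make_locus("L1", 25, 50, 25),
             make_locus("L2", 50, 0, 50),
             make_locus("L3", 35, 30, 35))

Fs <- per_locus_F(toy)
Fs                              # named by locus
mean(Fs > 0.2, na.rm = TRUE)    # fraction of loci with F > 0.2
```

L1 should give F = 0, L2 (the 50 AA / 50 aa case from the lecture quote) F = 1, and L3 F = 0.4, so two of the three loci exceed 0.2.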