BIO 202 — Lesson 14: Counting heterozygote deficits

A — Panmictic population — F fluctuates around 0

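Sample N individuals from a population in Hardy-Weinberg equilibrium (HWE). Compute F = 1 − H_obs/H_exp, where H_obs is the observed fraction of heterozygotes and H_exp = 2p̂(1−p̂) is the fraction expected from the sample allele frequency p̂. F is centered on 0 with sampling noise.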

Scenario

True allele frequency p, with q = 1 − p. Sample N individuals at HWE genotype probabilities (p², 2pq, q²). Compute F. Across 1000 replicate samples, F is approximately centered on 0, with a sampling variance that shrinks roughly as 1/N.
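The shrinking spread can be checked directly. The sketch below repeats the panmictic sampling for several sample sizes and compares the SD of F with 1/√N, the large-sample approximation for a biallelic locus (N·F² is the one-degree-of-freedom HWE chi-square statistic). The helper name sim_F and the particular values of N are illustrative assumptions, not part of the lesson's own code.

```r
# Hypothetical helper: F for `reps` replicate samples of N individuals
# drawn at HWE genotype probabilities with true allele frequency p.
sim_F <- function(N, p, reps = 1000) {
  probs <- c(p^2, 2 * p * (1 - p), (1 - p)^2)   # AA, Aa, aa
  obs   <- rmultinom(reps, N, probs)            # 3 x reps matrix of genotype counts
  phat  <- (2 * obs[1, ] + obs[2, ]) / (2 * N)  # sample allele frequency per replicate
  Hexp  <- 2 * phat * (1 - phat)
  Hobs  <- obs[2, ] / N
  1 - Hobs / Hexp
}

set.seed(42)
Ns  <- c(25, 100, 400, 1600)                    # illustrative sample sizes
sds <- sapply(Ns, function(N) sd(sim_F(N, p = 0.5)))
data.frame(N = Ns, sd_F = round(sds, 3), one_over_sqrt_N = round(1 / sqrt(Ns), 3))
```

Quadrupling N should roughly halve the SD. At allele frequencies near 0 or 1 and small N, a sample can be monomorphic, which makes H_exp zero and F undefined; this sketch does not handle that case.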

F across 1000 panmictic samples

N: 100 | p: 0.50 | mean F: — | SD F: —

Prediction

Q1. In a panmictic population, sampled N = 100 individuals, the standard deviation of F across replicate samples is closest to:
0 (F should be exactly 0 under panmixia) | ~0.1 (~1/√N) | ~0.3 | ~1

Try at least 4 (N, p) combos.

Controls

N: 100 | true p: 0.50 | seed: 42

R code — sampling F under panmixia

set.seed(42)
N <- 100; p <- 0.5; reps <- 1000
Fs <- replicate(reps, {
  # Genotype counts (AA, Aa, aa) for one sample at HWE proportions
  obs <- rmultinom(1, N, c(p^2, 2*p*(1-p), (1-p)^2))[,1]
  phat <- (2*obs[1]+obs[2])/(2*N)
  Hexp <- 2*phat*(1-phat); Hobs <- obs[2]/N
  1 - Hobs/Hexp
})
mean(Fs); sd(Fs)

B — Split the population — F climbs

Two subpopulations with different allele frequencies. Sample from both as if they were one. The pooled sample has a heterozygote deficit even though each subpop is at HWE. This is the Wahlund effect.

Scenario

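Subpop 1 at p₁ = 0.7. Subpop 2 at p₂ = 0.3. Both at HWE within themselves. Pool a mixture (fraction w from 1, 1−w from 2). The expected pooled F equals var(p)/(p̄(1−p̄)), where p̄ = w·p₁ + (1−w)·p₂ is the pooled allele frequency and var(p) = w(1−w)(p₁−p₂)² is the mixture-weighted variance of the subpop frequencies.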
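The prediction can be worked out without any sampling. The sketch below computes the expected pooled F two ways: from the variance formula, and from the expected heterozygote fractions of the mixture. Both function names are hypothetical helpers introduced for illustration.

```r
# Hypothetical helper: predicted pooled F from the variance of deme frequencies.
wahlund_F <- function(p1, p2, w) {
  pbar  <- w * p1 + (1 - w) * p2             # pooled allele frequency
  var_p <- w * (1 - w) * (p1 - p2)^2         # mixture-weighted variance of p
  var_p / (pbar * (1 - pbar))
}

# Hypothetical helper: the same quantity from expected heterozygosities.
wahlund_F_het <- function(p1, p2, w) {
  pbar <- w * p1 + (1 - w) * p2
  Hobs <- w * 2 * p1 * (1 - p1) + (1 - w) * 2 * p2 * (1 - p2)  # each deme at HWE
  Hexp <- 2 * pbar * (1 - pbar)                                # pooled HWE expectation
  1 - Hobs / Hexp
}

wahlund_F(0.7, 0.3, 0.5)       # 0.04 / 0.25 = 0.16
wahlund_F_het(0.7, 0.3, 0.5)   # same value, via 1 - 0.42 / 0.50
wahlund_F(0.5, 0.5, 0.5)       # identical demes: no deficit
```

The two routes agree because the heterozygotes missing from the pooled sample amount to exactly 2·var(p).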

F as a function of the difference between subpopulations

p₁: 0.70 | p₂: 0.30 | F (predicted): — | F (observed): —

Prediction

Q1. With p₁ = 0.7, p₂ = 0.3, and 50:50 mixing, the pooled F is approximately:
0 (each subpop is at HWE) | 0.04 | 0.16 (var(p)/(p̄(1−p̄)) = 0.04/0.25) | 1

Try at least 4 (p₁−p₂, mix) combos.

Controls

p₁: 0.70 | p₂: 0.30 | w (frac from pop 1): 0.50 | N pooled sample: 500 | seed: 42

R code — Wahlund effect

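set.seed(42)
p1 <- 0.7; p2 <- 0.3; w <- 0.5; N <- 500
N1 <- round(N*w); N2 <- N - N1
# Each subpopulation is sampled at its own HWE genotype proportions
o1 <- rmultinom(1,N1,c(p1^2,2*p1*(1-p1),(1-p1)^2))[,1]
o2 <- rmultinom(1,N2,c(p2^2,2*p2*(1-p2),(1-p2)^2))[,1]
# Pool the counts and compute F as if the sample came from one population
obs <- o1 + o2
phat <- (2*obs[1]+obs[2])/(2*N)
1 - (obs[2]/N) / (2*phat*(1-phat))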

C — F_IS, F_ST, F_IT — three measurements, three sources

F_IS measures within-deme inbreeding. F_ST measures between-deme differentiation. F_IT combines them: (1−F_IT) = (1−F_IS)(1−F_ST). This identity decomposes the total heterozygote deficit into its within-deme and between-deme components.

Scenario

Multiple demes, each with its own allele frequency and its own internal inbreeding coefficient. Compute F_IS (avg within-deme), F_ST (between-deme variance / total variance), and F_IT (overall heterozygote deficit). The identity: (1−F_IT) = (1−F_IS)(1−F_ST).
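One way to see why the identity holds is to write each F as a ratio of heterozygosities: H_I (observed within individuals), H_S (expected within demes under HWE), and H_T (expected if all demes were pooled). A minimal sketch, assuming equal-sized demes that share one inbreeding coefficient; the deme frequencies are made-up illustration values and f_stats is a hypothetical helper, not the lesson's own code.

```r
# Hypothetical helper: F-statistics from deme allele frequencies `ps` and a
# common within-deme inbreeding coefficient `f` (equal deme sizes assumed).
f_stats <- function(ps, f) {
  H_I  <- mean(2 * ps * (1 - ps) * (1 - f))  # observed heterozygosity, reduced by inbreeding
  H_S  <- mean(2 * ps * (1 - ps))            # HWE expectation within demes
  pbar <- mean(ps)
  H_T  <- 2 * pbar * (1 - pbar)              # HWE expectation for the pooled total
  c(FIS = 1 - H_I / H_S, FST = 1 - H_S / H_T, FIT = 1 - H_I / H_T)
}

ps <- c(0.2, 0.35, 0.5, 0.65, 0.8)           # illustrative deme frequencies
Fv <- f_stats(ps, f = 0.1)
Fv

# Both sides of the identity agree up to floating-point error
all.equal(unname(1 - Fv["FIT"]), unname((1 - Fv["FIS"]) * (1 - Fv["FST"])))
```

Written this way the identity is just H_I/H_T = (H_I/H_S)·(H_S/H_T), which is why the deficits combine multiplicatively rather than additively.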

F_IS, F_ST, F_IT bar chart

F_IS: — | F_ST: — | F_IT: — | (1−F_IS)(1−F_ST): —

Prediction

Q1. With F_IS = 0.1 (mild inbreeding within demes) and F_ST = 0.2 (moderate differentiation between demes), F_IT will be approximately:
0.3 (additive) | 0.28 (1 − 0.9·0.8) | 0.5 | 0 (the two cancel)

Try at least 3 (F_IS, F_ST) combos.

Controls

F_IS: 0.10 | number of demes: 5 | spread (var of p across demes): 0.050 | seed: 42

R code — F decomposition

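set.seed(42)
demes <- 5; F_IS <- 0.1
# Deme allele frequencies around 0.5 (variance 0.05), clamped to [0.05, 0.95]
ps <- pmin(pmax(rnorm(demes, 0.5, sqrt(0.05)), 0.05), 0.95)
pbar <- mean(ps)
# Population variance (divide by the number of demes), matching var(p)/(pbar*(1-pbar));
# R's var() divides by demes - 1 and would inflate F_ST when demes are few
FST <- mean((ps - pbar)^2)/(pbar*(1-pbar))
FIT <- 1 - (1-F_IS)*(1-FST)
c(FIS = F_IS, FST = FST, FIT = FIT)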

What F_ST is, in one picture

The bar chart above shows F_IS, F_ST, and F_IT as three numbers. They all come from one underlying quantity: the variance of allele frequencies across individuals.

Drag the slider below from pooled to split. The single bar of total variance peels apart into within-deme variance and between-deme variance. F_ST is just the between-piece, divided by the total. Every variance-decomposition idea in the rest of the course (heritability, kin selection, Price equation in L30) is a version of this picture.

Total allele-frequency variance, decomposing

total var: — | within: — | between: — | F_ST = between/total: —

pooled ↔ split: 0.00

When the slider is at 0, you see the variance the way you'd compute it pooling all individuals. As you drag toward 1, the same variance is re-attributed to deme membership: the dark portion is what would still be there if all demes had the same mean allele frequency (within-deme variance); the light portion is the part that exists only because demes differ from each other (between-deme variance).
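The same split can be written out numerically. The sketch below treats each gene copy as an indicator (1 for allele A, 0 for allele a) and applies the law of total variance across demes. The helper name decompose_var and its weights argument are assumptions made for illustration.

```r
# Hypothetical helper: split the variance of an allele indicator (1 = A, 0 = a)
# across gene copies into within-deme and between-deme parts.
# `ps` are deme allele frequencies; `wts` is each deme's share of gene copies.
decompose_var <- function(ps, wts = rep(1 / length(ps), length(ps))) {
  pbar    <- sum(wts * ps)
  within  <- sum(wts * ps * (1 - ps))        # average variance inside a deme
  between <- sum(wts * (ps - pbar)^2)        # variance of the deme means
  total   <- pbar * (1 - pbar)               # variance when pooling all copies
  c(total = total, within = within, between = between, FST = between / total)
}

d <- decompose_var(c(0.7, 0.3))              # two equal demes, as in part B
d
d[["within"]] + d[["between"]]               # adds back up to the total
```

For the two demes from part B this gives total 0.25, within 0.21, and between 0.04, so F_ST = 0.16, the same number as the pooled F predicted there.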

D — Florida Scrub Jay — per-locus F

FSJ individual genotypes from data/clean/fsj_individuals.csv. Per-locus F. Find the outliers — they tell you something is happening beyond drift.

Scenario

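For each locus in the FSJ dataset, compute F = 1 − H_obs/H_exp. Across many loci, F values cluster around a common baseline set by sampling noise and drift. Outliers (loci with unusually high F) are candidates for population structure or recent inbreeding, though a deficit confined to a few loci can also reflect null alleles or genotyping error.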
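To judge whether a locus with F > 0.2 is surprising, it helps to know how often sampling noise alone crosses that threshold. The sketch below simulates per-locus F under panmixia. The sample size, number of loci, and allele-frequency range are hypothetical placeholders, not properties of the FSJ data, and should be replaced with the dataset's actual values.

```r
set.seed(42)
n_loci <- 2000                     # hypothetical number of loci
N      <- 100                      # hypothetical individuals genotyped per locus
p_true <- runif(n_loci, 0.1, 0.9)  # hypothetical spread of allele frequencies

# Per-locus F when every locus is at HWE (no structure, no inbreeding)
null_F <- sapply(p_true, function(p_l) {
  obs  <- rmultinom(1, N, c(p_l^2, 2 * p_l * (1 - p_l), (1 - p_l)^2))[, 1]
  phat <- (2 * obs[1] + obs[2]) / (2 * N)
  1 - (obs[2] / N) / (2 * phat * (1 - phat))
})

mean(null_F > 0.2)                 # fraction of loci above the threshold by chance
quantile(null_F, 0.95)             # null 95th percentile, for comparison
```

If the observed fraction of loci above 0.2 is close to this null fraction, the apparent outliers are consistent with sampling noise at that sample size.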

Distribution of per-locus F values

loci: — | mean F: — | 95th percentile F: — | outlier loci (F > 0.2): —

Prediction

Q1. The fraction of FSJ loci with F > 0.2 will be:
Essentially 0 | A few percent (outliers exist) | Most loci (the population is very inbred)

Resample at least 2 times.

Controls

seed: 42

R code — per-locus F

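fsj <- read.csv("data/clean/fsj_individuals.csv")
Fs <- sapply(unique(fsj$locus), function(L) {
  rows <- fsj[fsj$locus == L, ]
  # Fix the genotype levels so an absent class counts as 0 instead of NA
  # (assumes genotypes are coded "AA", "Aa", "aa")
  obs <- table(factor(rows$genotype, levels = c("AA", "Aa", "aa")))
  N <- sum(obs); phat <- (2*obs[["AA"]]+obs[["Aa"]])/(2*N)
  1 - (obs[["Aa"]]/N) / (2*phat*(1-phat))
})
# Monomorphic loci give NaN (0/0), so drop them from the outlier fraction
hist(Fs); mean(Fs > 0.2, na.rm = TRUE)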
