2014-03-01

Population Genetics Chpt 3 Box

Chapter 3 Box Problems

Box A

Problem a

$$\frac{1}{N} = \frac{\frac{1}{10^1}+\frac{1}{10^2}+\frac{1}{10^3}+\frac{1}{10^4}}{4}$$

So:
$$N = \frac{4}{\frac{1}{10^1}+\frac{1}{10^2}+\frac{1}{10^3}+\frac{1}{10^4}}$$

N <- 4/(1e-1 + 1e-2 + 1e-3 + 1e-4)
N
## [1] 36.0036
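
As a quick cross-check, the same calculation can be wrapped in a small helper. This is only a sketch; the names `harmonic.mean.N` and `sizes` are illustrative and do not come from the text.

```r
# Effective size over several generations as the harmonic mean of the
# per-generation population sizes (helper name is hypothetical).
harmonic.mean.N <- function(sizes) {
  length(sizes) / sum(1 / sizes)
}

sizes <- c(10, 100, 1000, 10000)
N.harmonic <- harmonic.mean.N(sizes)
N.arithmetic <- mean(sizes)
N.harmonic    # matches the value of N computed above
N.arithmetic  # far larger, because the arithmetic mean ignores the bottleneck
```

The comparison with the arithmetic mean shows how strongly the smallest generation dominates the result.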

Problem b

Using $$\frac{4N_{m}N_{f}}{N_{m} + N_{f}}$$

N.m <- 4

#100 cows, 4 bulls

N.f <- 100

4*N.m*N.f/(N.m + N.f)
## [1] 15.38462

#200 cows, 4 bulls

N.f <- 200

4*N.m*N.f/(N.m + N.f)
## [1] 15.68627
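
The two herd sizes above suggest that adding cows barely changes the result. A minimal sketch, assuming a helper named `Ne.sex.ratio` (my own name, not from the text), makes the ceiling of 4 times the number of bulls visible:

```r
# Effective size with unequal numbers of breeding males and females.
# The function name and the extra herd sizes are illustrative assumptions.
Ne.sex.ratio <- function(N.m, N.f) {
  4 * N.m * N.f / (N.m + N.f)
}

cows <- c(100, 200, 1000, 1e6)
Ne <- Ne.sex.ratio(4, cows)
Ne  # approaches, but never reaches, 4 * N.m = 16
```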

Box B

Problem a

$F_{ST}$ calculation

First set up the data table:

Group1 <- c(0.15, 0.20, 0.30, 0.35)
Group2 <- c(0.10, 0.15, 0.35, 0.40)
Group3 <- c(0.05, 0.10, 0.40, 0.45)
Group4 <- c(0.01, 0.05, 0.45, 0.49)
Group5 <- c(1.0, 0, 0, 0)

data <- data.frame( Group1, Group2, Group3, Group4, Group5)


#let's reshape the data so we can get a view of what we are analyzing
data2 <- reshape(data, varying=list(names(data)), v.names="p",  direction="long", timevar="Group")

plot( data2$Group, data2$p)

#We can see that with increasing group number the allele frequencies become more divergent.  
#Now the calculations:


#p.bar is the average allele frequency p
p.bar <- apply(data, 2, mean)
p.bar
## Group1 Group2 Group3 Group4 Group5 
##   0.25   0.25   0.25   0.25   0.25
#calculate the within subpopulation heterozygosity HS
#use the formula  HS = 2p*(1-p)
HS.data <- 2*data*(1-data)
HS.data
##   Group1 Group2 Group3 Group4 Group5
## 1  0.255  0.180  0.095 0.0198      0
## 2  0.320  0.255  0.180 0.0950      0
## 3  0.420  0.455  0.480 0.4950      0
## 4  0.455  0.480  0.495 0.4998      0
#calculate the average within subpopulation heterozygosity HS.bar

HS.bar <- apply(HS.data, 2, mean)
HS.bar
## Group1 Group2 Group3 Group4 Group5 
## 0.3625 0.3425 0.3125 0.2774 0.0000

#As the populations drift towards the extremes (p=0 or p=1) heterozygosity decreases.
#calculate the total heterozygosity in the group
#HT= 2*p.bar*(1-p.bar)

HT <- 2*p.bar*(1-p.bar)
HT
## Group1 Group2 Group3 Group4 Group5 
##  0.375  0.375  0.375  0.375  0.375
#calculate F.ST

F.ST <- (HT - HS.bar)/HT
F.ST
##     Group1     Group2     Group3     Group4     Group5 
## 0.03333333 0.08666667 0.16666667 0.26026667 1.00000000

#As the populations drift towards the extremes (p=0 or p=1) the fixation index increases.

#The second method of calculating F.ST using variance

average.of.square <- apply(data^2, 2, mean)
average.of.square
##  Group1  Group2  Group3  Group4  Group5 
## 0.06875 0.07875 0.09375 0.11130 0.25000
square.of.average <- p.bar^2
square.of.average
## Group1 Group2 Group3 Group4 Group5 
## 0.0625 0.0625 0.0625 0.0625 0.0625
variance <- average.of.square - square.of.average
variance
##  Group1  Group2  Group3  Group4  Group5 
## 0.00625 0.01625 0.03125 0.04880 0.18750
F.ST <- variance/(p.bar*(1-p.bar))
F.ST
##     Group1     Group2     Group3     Group4     Group5 
## 0.03333333 0.08666667 0.16666667 0.26026667 1.00000000
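
The agreement between the two methods can be checked for any set of subpopulation frequencies. The following is a sketch under the same definitions used above; the function name `fst.from.freqs` is hypothetical.

```r
# F_ST for one group of subpopulation allele frequencies, computed both
# from heterozygosities and from the variance in allele frequency.
fst.from.freqs <- function(p) {
  p.bar <- mean(p)
  HT <- 2 * p.bar * (1 - p.bar)           # total heterozygosity
  HS.bar <- mean(2 * p * (1 - p))         # mean within-subpopulation heterozygosity
  variance <- mean(p^2) - p.bar^2         # population variance of p
  c(heterozygosity = (HT - HS.bar) / HT,
    variance = variance / (p.bar * (1 - p.bar)))
}

fst.group1 <- fst.from.freqs(c(0.15, 0.20, 0.30, 0.35))
fst.group5 <- fst.from.freqs(c(1, 0, 0, 0))
fst.group1
fst.group5
```

Both entries of each result should be identical, since the two formulas are algebraically equivalent.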

Box C

Problem a

Given:

$$2d^2 = F_{ST} = \frac{\sigma^2}{\overline{p}\overline{q}}$$

$$p_1=0.5 + x; q_1=0.5 - x$$
$$p_2=0.5 - x; q_2=0.5 + x$$

Think of x as the distance from p=q=0.5, i.e., the distance from maximum heterozygosity.

So
$$\overline{p} = \frac{0.5 + x + 0.5 -x}{2}=\frac{1}{2}$$

$$\overline{q} = \frac{0.5 + x + 0.5 -x}{2}=\frac{1}{2}$$

$$F_{ST} = \frac{\sigma^2}{\frac{1}{2}\frac{1}{2}} = 4\sigma^2$$

We also know that $$d^2 = 1- \sqrt{p_1p_2} - \sqrt{q_1q_2}$$

$$= 1- \sqrt{(0.5+x)(0.5-x)} - \sqrt{(0.5-x)(0.5+x)}$$

$$= 1 - \sqrt{0.25 +0.5x -0.5x -x^2} - \sqrt{0.25 -0.5x +0.5x- x^2 }$$

$$= 1 - \sqrt{0.25-x^2} - \sqrt{0.25 -x^2 }$$

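Substituting the approximation $$\sqrt{0.25-x^2} \approx 0.5-x^2$$ which holds for small x,

we get $$d^2 \approx 1- (0.5-x^2) - (0.5-x^2) = 2x^2$$

Multiply both sides by 2 to get $$2d^2 \approx 4x^2$$

The two populations have frequencies $0.5+x$ and $0.5-x$ around a mean of 0.5, so $\sigma^2 = x^2$ and

$$F_{ST} = 4\sigma^2 = 4x^2 \approx 2d^2$$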

x <- c( 0.1, 0.2, 0.3, 0.35, 0.4,  0.49)
d.approx <- 4*x^2
d.approx
## [1] 0.0400 0.1600 0.3600 0.4900 0.6400 0.9604

p.1 <- 0.5+x
q.1 <- 0.5-x
p.2 <- 0.5-x
q.2 <- 0.5+x

d.square <- 1- sqrt(p.1*p.2) - sqrt(q.1*q.2) 
d.exact <- 2*d.square
d.exact
## [1] 0.04040821 0.16696972 0.40000000 0.57171431 0.80000000 1.60200503

plot(x, d.exact, col="red", ylab="Genetic Distance")
points(x, d.approx, col="black")
text(0.15, 1.5, "exact (2d^2)", col="red")
text(0.15, 1.4, "approximate (4x^2)", col="black")

With small differences (x) in allele frequency, the approximate method is accurate.
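
To put a number on "accurate", the relative error of the approximation can be tabulated. This is a sketch using the same x values as above; the name `rel.error` is my own.

```r
# Relative error of the approximation 4x^2 against the exact 2d^2.
x <- c(0.1, 0.2, 0.3, 0.35, 0.4, 0.49)
d.approx <- 4 * x^2
d.exact <- 2 * (1 - 2 * sqrt((0.5 + x) * (0.5 - x)))
rel.error <- (d.exact - d.approx) / d.exact
round(100 * rel.error, 1)  # percent by which 4x^2 underestimates 2d^2
```

The error grows steadily with x, so the approximation is best reserved for populations whose allele frequencies are close.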

For the trigonometry proof:

Recall that sine is $$\frac{opposite}{hypotenuse}$$ and cosine is $$\frac{adjacent}{hypotenuse}$$

$$sin(\theta_1)=\frac{\sqrt{p_1}}{1}$$ and likewise $$sin(\theta_2) = \frac{\sqrt{p_2}}{1}$$

$$cos(\theta_1)=\frac{\sqrt{q_1}}{1}$$ and likewise $$cos(\theta_2) =\frac{ \sqrt{q_2}}{1}$$

$$cos(\theta_1 - \theta_2) = cos(\theta) = cos(\theta_1)cos(\theta_2) + sin(\theta_1)sin(\theta_2)$$

Substituting we get:

$$cos(\theta) = \sqrt{q_1q_2} + \sqrt{p_1p_2}$$

By definition:

$$d^2 = 1 - \sqrt{p_1p_2} - \sqrt{q_1q_2} = 1 - cos(\theta)$$

$$ChordLength = 2 \sqrt{\frac{1-cos(\theta)}{2}} = 2\sqrt{\frac{2}{4}}\sqrt{1-cos(\theta)} = \sqrt{2}\sqrt{1-cos(\theta)} = \sqrt{2}\sqrt{d^2} = d\sqrt{2}$$
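
The chord-length identity can be checked numerically for any pair of allele frequencies. A minimal sketch, with arbitrary illustrative values `p.a` and `p.b` that are not from the text:

```r
# Angles on the unit circle whose sines are sqrt(p) for each population.
p.a <- 0.3
p.b <- 0.7
theta <- asin(sqrt(p.a)) - asin(sqrt(p.b))

# Chord subtended by the angle theta on a circle of radius 1.
chord <- 2 * sin(abs(theta) / 2)

# Genetic distance d from the definition d^2 = 1 - sqrt(p1 p2) - sqrt(q1 q2).
d <- sqrt(1 - sqrt(p.a * p.b) - sqrt((1 - p.a) * (1 - p.b)))

all.equal(chord, d * sqrt(2))
```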

Problem b

The R function dist() calculates distance matrices using a variety of methods, none of which is the genetic distance we are interested in, defined by the equation:

$$d^2 = 1 - \sqrt{p_1p_2} - \sqrt{q_1q_2}$$

Therefore we need to define our own function “GeneticDistance”. I also represent the alleles as p.1 and p.2 only, to simplify the function.

p.1 <- c(0.15, 0.25, 0.3, 0.4)
p.2 <- c(0.15, 0.25, 0.3, 0.4)

#Calculate q from p, don't use:
#q <- c(0.85, 0.75, 0.7, 0.6)

GeneticDistance <- function( p.1, p.2 ){
		sqrt( 1- sqrt(p.1*p.2) - sqrt((1-p.1)*(1-p.2)))
		}	



d.matrix <-outer( p.1, p.2, FUN="GeneticDistance")
d.matrix
##            [,1]       [,2]       [,3]       [,4]
## [1,] 0.00000000 0.08896551 0.12847387 0.20225771
## [2,] 0.08896551 0.00000000 0.03962176 0.11380615
## [3,] 0.12847387 0.03962176 0.00000000 0.07426822
## [4,] 0.20225771 0.11380615 0.07426822 0.00000000

#Use the dist() function to compute Euclidean distances between the rows
#of d.matrix, giving a dist object suitable for clustering

d.eucl <- dist(d.matrix)
d.eucl
##            1          2          3
## 2 0.17761784                      
## 3 0.22765585 0.07914497           
## 4 0.29218432 0.19984790 0.14825288

d.cluster <- hclust(d.eucl, method="ward.D")
d.cluster

## 
## Call:
## hclust(d = d.eucl, method = "ward.D")
## 
## Cluster method   : ward.D 
## Distance         : euclidean 
## Number of objects: 4

#The cluster object can be passed directly to the plot() function
#to obtain a dendrogram

plot(d.cluster)
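
An alternative worth comparing is to cluster on the genetic distances themselves rather than on Euclidean distances between rows of the distance matrix. The sketch below assumes average linkage, which is my choice and not from the text; the names `GeneticDistanceSafe`, `p.freq`, `d.gen` and `d.cluster.gen` are also illustrative.

```r
# Same distance as GeneticDistance, with pmax() guarding against tiny
# negative values caused by floating-point rounding when p.1 equals p.2.
GeneticDistanceSafe <- function(p.1, p.2) {
  sqrt(pmax(0, 1 - sqrt(p.1 * p.2) - sqrt((1 - p.1) * (1 - p.2))))
}

p.freq <- c(0.15, 0.25, 0.3, 0.4)
d.gen.matrix <- outer(p.freq, p.freq, FUN = GeneticDistanceSafe)

# as.dist() treats the matrix entries as the distances, unchanged.
d.gen <- as.dist(d.gen.matrix)
d.cluster.gen <- hclust(d.gen, method = "average")
d.cluster.gen$merge  # first row lists the closest pair of populations
```

With these frequencies the first merge should join populations 2 and 3, the pair with the smallest genetic distance in the matrix above.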

Box D

Problem a

Start from the equilibrium condition and solve for $\hat{F}$:

$$\hat{F}=[ \frac{1}{2N}+(1-\frac{1}{2N})\hat{F} ](1-u)^2$$

$$\hat{F}=[\frac{1}{2N}+(\hat{F} -\frac{\hat{F}}{2N})](1-u)^2$$

$$\hat{F}=\frac{(1-u)^2}{2N}+ \hat{F}(1-u)^2 - \frac{\hat{F}}{2N}(1-u)^2$$

$$\hat{F} - \hat{F}(1-u)^2 + \frac{\hat{F}}{2N}(1-u)^2 =\frac{(1-u)^2}{2N}$$

$$\hat{F}[1-(1-u)^2 + \frac{(1-u)^2}{2N}]= \frac{(1-u)^2}{2N}$$

$$\hat{F} = \frac{(1-u)^2}{2N[1-(1-u)^2 + \frac{(1-u)^2}{2N}]}=\frac{(1-u)^2 }{2N-2N(1-u)^2+(1-u)^2}$$

Divide the numerator and the denominator by $(1-u)^2$:

$$\hat{F}=\frac{1}{\frac{2N}{(1-u)^2} - 2N + 1}$$

$$\hat{F}=\frac{1}{2N[\frac{1}{(1-u)^2} - 1] + 1}$$

For the second part, when u is small:

$$\frac{1}{(1-u)^2} \approx 1+2u$$

Substitute into $$\hat{F}=\frac{1}{2N[\frac{1}{(1-u)^2} - 1] + 1}$$ to get

$$\hat{F} \approx \frac{1}{2N[1+2u-1]+1} = \frac{1}{2N[2u]+1} = \frac{1}{4Nu+1}$$
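
The closed form and its approximation can be compared numerically. This is a sketch only; the function names and the test values `N.test` and `u.test` are assumptions chosen for illustration.

```r
# Exact equilibrium F-hat and its small-u approximation.
Fhat.exact <- function(N, u) 1 / (2 * N * (1 / (1 - u)^2 - 1) + 1)
Fhat.approx <- function(N, u) 1 / (4 * N * u + 1)

N.test <- 1000
u.test <- c(1e-6, 1e-4, 1e-2)
Fhat.compare <- cbind(u = u.test,
                      exact = Fhat.exact(N.test, u.test),
                      approx = Fhat.approx(N.test, u.test))
Fhat.compare

# The exact value should reproduce itself under the original recursion.
f <- Fhat.exact(N.test, 1e-4)
f.next <- (1 / (2 * N.test) + (1 - 1 / (2 * N.test)) * f) * (1 - 1e-4)^2
all.equal(f, f.next)
```

The two columns agree closely for small u and drift apart as u grows, which is the regime where $1/(1-u)^2 \approx 1+2u$ stops being a good approximation.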

Problem b

As in Problem a, but substitute m for u.

Problem c

$$\hat{F}=[\frac{1}{2N}+(1-\frac{1}{2N})\hat{F}][1-(m+u)]^2$$

Substitute $$x = \frac{1}{2N}$$ and $$y=[1-(m+u)]^2$$ to get

$$\hat{F}=[x+(1-x)\hat{F}]y = [xy +(1-x)\hat{F}y]$$

$$\hat{F}=xy + \hat{F}y - \hat{F}yx = y(x + \hat{F} - \hat{F}x)$$

Rearrange to get $$\hat{F} - \hat{F}y + \hat{F}xy = xy$$

$$\displaystyle\hat{F}(1-y+xy) = xy$$

$$\displaystyle\hat{F} = \frac{xy}{1-y+xy} = \frac{1}{(\frac{1}{xy} - \frac{1}{x} +1)} = \frac{1}{\frac{1}{x}(\frac{1}{y}-1)+1}$$

Substitute in $$x = \frac{1}{2N}$$ and $$y=[1-(m+u)]^2$$ to get

$$\displaystyle\hat{F} = \frac{1}{2N[\frac{1}{[1-(m+u)]^2}-1]+1}$$

Using $$\displaystyle\frac{1}{[1-(m+u)]^2}\approx 1+2(m+u)$$ assuming m + u is small gives:

$$\displaystyle\hat{F} \approx \frac{1}{2N[1+2(m+u)-1]+1} = \frac{1}{2N[2(m+u)]+1} = \frac{1}{4N(m+u)+1}$$
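
As a sanity check, the recursion can simply be iterated until it settles and the result compared with both formulas. The parameter values below are illustrative assumptions, and x and y follow the substitution used in the derivation.

```r
# Illustrative parameter values (assumptions, not part of the problem).
N.pop <- 50
m.rate <- 0.01
u.rate <- 1e-6

x <- 1 / (2 * N.pop)
y <- (1 - (m.rate + u.rate))^2

# Iterate F' = [x + (1 - x) F] y from F = 0 until it converges.
F.iter <- 0
for (generation in 1:5000) {
  F.iter <- (x + (1 - x) * F.iter) * y
}

F.exact <- 1 / (2 * N.pop * (1 / y - 1) + 1)
F.approx <- 1 / (4 * N.pop * (m.rate + u.rate) + 1)
c(iterated = F.iter, exact = F.exact, approximate = F.approx)
```

With m as large as 0.01 the approximation is close but not exact, which is why the derivation assumes that m + u is small.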

Box E

Problem a

For a selectively neutral allele its fixation probability is $$\frac{1}{2N_a}$$ and its loss probability is $$1-\frac{1}{2N_a}$$

On average, fixation requires $4N_e$ generations, while loss requires $2\frac{N_e}{N_a}ln(2N_a)$ generations.

For $$N_a=5000$$ and $$N_e=4000$$:

N.a <- 5000
N.e <- 4000

#probability of fixation

p.fix <- 1/(2*N.a)
p.fix
## [1] 1e-04

#probability of loss

p.loss <- 1 - p.fix
p.loss
## [1] 0.9999

#Generations required to fix

gen <- 4*N.e
gen  #generations
## [1] 16000

#Generations required to lose


gen.l <- 2*(N.e/N.a)*log(2*N.a)
gen.l  #generations 
## [1] 14.73654

Problem b

For a selection coefficient s of 0.01:

s <- 0.01
2*s*(N.e/N.a)
## [1] 0.016

Problem c

The formula for the persistence of a harmful allele in a population is:

$$\displaystyle2\frac{N_e}{N_a}ln(\frac{2N_a}{2N_es})+1-\gamma$$

Where $$\gamma$$ is Euler’s constant. To obtain Euler’s constant we will use the negative derivative of the gamma function evaluated at 1: -digamma(1), which equals ~0.5772157

s <- 0.01
2*(N.e/N.a)*log(2*N.a/(2*N.e*s))+1-(-digamma(1))
## [1] 8.148086

i.e., about 8.148 generations.
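
The value of -digamma(1) can be cross-checked against the limit definition of Euler's constant, the difference between the harmonic sum and the natural log. A short sketch, where the number of terms `n.terms` is an arbitrary choice:

```r
# Euler's constant as the limit of sum(1/k) - log(n) for large n.
n.terms <- 1e6
gamma.limit <- sum(1 / seq_len(n.terms)) - log(n.terms)
gamma.digamma <- -digamma(1)
c(limit = gamma.limit, digamma = gamma.digamma)
```

The two values differ by roughly 1/(2n), so they agree to about six decimal places here.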

Box F

Problem a

For the stepping stone model use $$\displaystyle\hat{F} \approx \frac{1}{4N\sqrt{2m\mu}+1}$$

For the island model use $$\displaystyle\hat{F} \approx \frac{1}{4N(m+\mu)+1}$$

u <- 1e-6
m <- 0.01
N <- 50

Fhat.stepping.stone <- 1/(4*N*sqrt(2*m*u) +1)
Fhat.stepping.stone
## [1] 0.9724937

Fhat.island <- 1/(4*N*(m+u)+1)
Fhat.island
## [1] 0.3333111

#Restricting migration to adjacent populations: the stepping-stone F-hat as a percentage of the island-model F-hat


(Fhat.stepping.stone / Fhat.island)*100
## [1] 291.7676
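
The figure above is a ratio expressed as a percentage. If a percent increase is wanted instead, it can be computed as follows; this is a sketch that repeats the parameter values from above under new, illustrative variable names.

```r
# Stepping-stone and island-model F-hat for N = 50, m = 0.01, u = 1e-6.
Fhat.ss <- 1 / (4 * 50 * sqrt(2 * 0.01 * 1e-6) + 1)
Fhat.is <- 1 / (4 * 50 * (0.01 + 1e-6) + 1)

ratio.pct <- 100 * Fhat.ss / Fhat.is               # stepping stone as % of island
change.pct <- 100 * (Fhat.ss - Fhat.is) / Fhat.is  # percent increase over island
c(ratio = ratio.pct, change = change.pct)
```

The percent increase is always the ratio minus 100, so the stepping-stone value is a little under three times the island-model value.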

Problem b

Using $$\displaystyle\hat{F} \approx \frac{1}{1 + 4\delta\sigma\sqrt{2\mu}}$$

with $$\delta = N$$ and $$\sigma=\sqrt{m}$$

u <- 1e-6
m <- 0.01
N <- 50
delta <- N
sigma <- sqrt(m)

Fhat <- 1/(1+4*sigma*delta*sqrt(2*u))
Fhat
## [1] 0.9724937

Problem c

Using $$\displaystyle\hat{F} \approx \frac{1}{1 - 8\pi\delta\sigma^2/\ln(2\mu)}$$

u <- 1e-6
delta <- 325
sigma <- 0.07

Fhat <- 1/(1-(8*pi*delta*sigma^2)/(log(2*u)))
Fhat
## [1] 0.2469104

Problem d

Comparing neighborhood sizes in b (uniform population distribution in one dimension) and c (uniform population distribution in two dimensions) above:

u <- 1e-6
m <- 0.01
N <- 50
delta <- N
sigma <- sqrt(m)

N.b <- 2*sqrt(pi)*delta*sigma
N.b
## [1] 17.72454

delta <- 325
sigma <- 0.07

N.c <- 4*pi*sigma^2*delta
N.c
## [1] 20.01195
#So the neighborhood size is roughly the same (about 18 vs. 20) for delta = 50 (one dimension) and delta = 325 (two dimensions)

Box G

Problem a

$$\displaystyle p' = \frac{p(pw_{11} + qw_{12})}{\bar{w}}$$

Subtract p from both sides:

$$\displaystyle p'-p = \frac{p(pw_{11} + qw_{12})}{\bar{w}}-p$$

Multiply p by $$\frac{\bar{w}}{\bar{w}}$$ and rearrange to get:

$$\displaystyle \Delta p = \frac{1}{\bar{w}}[p(pw_{11} + qw_{12}) - p\bar{w} ]$$

Given that: $$\bar{w} = p^2w_{11} + 2pqw_{12} + q^2w_{22}$$

$$\displaystyle \Delta p = \frac{1}{\bar{w}}[p(pw_{11} + qw_{12}) - p(p^2w_{11} + 2pqw_{12} + q^2w_{22}) ]$$

$$\displaystyle \Delta p = \frac{1}{\bar{w}}[p^2w_{11} + pqw_{12} - pp^2w_{11} - 2p^2qw_{12} - pq^2w_{22} ]$$

$$\displaystyle \Delta p = \frac{1}{\bar{w}}[p^2w_{11} - pp^2w_{11} + pqw_{12} - 2ppqw_{12} - pq^2w_{22} ]$$

$$\displaystyle \Delta p = \frac{1}{\bar{w}}[w_{11}(p^2 - pp^2) + w_{12}(pq -2ppq) - pq^2w_{22} ]$$

$$\displaystyle \Delta p = \frac{1}{\bar{w}}[w_{11}p^2(1 - p) + w_{12}pq(1 -2p) - pq^2w_{22} ]$$

Given that $$1-2p=(1-p)-p=q-p$$:

$$\displaystyle \Delta p = \frac{1}{\bar{w}}[w_{11}p^2q + w_{12}pq(q -p) - pq^2w_{22} ]$$

then factor out pq to get:

$$\displaystyle \Delta p = \frac{pq}{\bar{w}}[w_{11}p + w_{12}(q -p) - qw_{22} ]$$

$$\displaystyle \Delta p = \frac{pq}{\bar{w}}[w_{11}p + w_{12}q -w_{12}p - qw_{22} ]$$

$$\displaystyle \Delta p = \frac{pq}{\bar{w}}[ p(w_{11} - w_{12}) +q(w_{12} - w_{22}) ]$$
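
The final expression can be checked against the starting recursion on a grid of allele frequencies. This is a sketch; the fitness values and function names are arbitrary illustrations, not taken from the problem.

```r
# Change in p computed directly from p' - p.
delta.p.direct <- function(p, w11, w12, w22) {
  q <- 1 - p
  w.bar <- p^2 * w11 + 2 * p * q * w12 + q^2 * w22
  p * (p * w11 + q * w12) / w.bar - p
}

# Change in p from the factored form derived above.
delta.p.formula <- function(p, w11, w12, w22) {
  q <- 1 - p
  w.bar <- p^2 * w11 + 2 * p * q * w12 + q^2 * w22
  p * q * (p * (w11 - w12) + q * (w12 - w22)) / w.bar
}

p.grid <- seq(0.05, 0.95, by = 0.05)
dp.direct <- delta.p.direct(p.grid, w11 = 1, w12 = 0.9, w22 = 0.8)
dp.formula <- delta.p.formula(p.grid, w11 = 1, w12 = 0.9, w22 = 0.8)
all.equal(dp.direct, dp.formula)
```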

Problem b

Part 1

$$\displaystyle w_{11}=w_{12}=1 \;\;\; w_{22}=1-s$$

$$\displaystyle\frac{dp}{dt} = pq^2s$$ or rearrange to $$\displaystyle\frac{dp}{pq^2}=s\; dt$$

So from time 0 to time t:

$$\displaystyle\int_{p_0}^{p_t} \frac{1}{pq^2}dp = \int_{0}^{t} s\; dt$$

First the left hand integral. Given that:

$$\displaystyle\int \frac{1}{x(1-x)^2}\;dx = \frac{1}{1-x}-ln(\frac{1-x}{x})$$

Then

$$\displaystyle\int_{p_0}^{p_t} \frac{1}{pq^2}dp =[\frac{1}{1-p_t}-ln(\frac{1-p_t}{p_t})]-[\frac{1}{1-p_0}-ln(\frac{1-p_0}{p_0})]$$

$$\displaystyle =\frac{1}{1-p_t} - ln(\frac{1-p_t}{p_t})- \frac{1}{1-p_0} + ln(\frac{1-p_0}{p_0}) = \frac{1}{q_t} - ln(\frac{q_t}{p_t}) - \frac{1}{q_0} + ln(\frac{q_0}{p_0})$$

Because $$ln(\frac{a}{b}) = ln(a) - ln (b)$$ and $$ln(ab) = ln(a)+ln(b)$$ simplify to:

$$\displaystyle = ln(\frac{p_tq_0}{p_0q_t}) + \frac{1}{q_t} - \frac{1}{q_0}$$

For the right hand integral:

$$\displaystyle \int_{0}^{t} s\; dt = s(t - 0) = st$$

Setting the two solutions equal gives:

$$\displaystyle st = ln(\frac{p_tq_0}{p_0q_t}) + \frac{1}{q_t} - \frac{1}{q_0}$$

Part 2

$$\displaystyle \frac{dp}{dt} = \frac{pqs}{2}$$ rearrange to get:

$$\displaystyle 2\frac{1}{pq}\;dp = s\;dt$$

Right hand side as above. For the left side:

$$\displaystyle 2\int^{p_t}_{p_0}\frac{1}{p(1-p)} dp$$

Given that $$\displaystyle \int \frac{1}{x(1-x)}\;dx = -ln\frac{(1-x)}{x}$$

$$\displaystyle 2\int^{p_t}_{p_0}\frac{1}{p(1-p)}\;dp = 2[[-ln\frac{1-p_t}{p_t}]-[-ln\frac{1-p_0}{p_0}]]$$

$$\displaystyle = 2[ ln\frac{p_t}{1-p_t} + ln\frac{1-p_0}{p_0}]$$

$$\displaystyle = 2[ ln\frac{p_t}{q_t} + ln\frac{q_0}{p_0}]$$

$$\displaystyle = 2 ln\frac{p_tq_0}{q_tp_0}$$

Setting the two sides equal gives:

$$\displaystyle st = 2 ln\frac{p_tq_0}{q_tp_0}$$

Part 3

$$\displaystyle \frac{dp}{dt} = p^2qs$$ rearrange to get:

$$\displaystyle \frac{1}{p^2q}\;dp = s\;dt$$

Right hand side as above. We are also given that $$\displaystyle \int \frac{1}{x^2(1-x)}\;dx=-\frac{1}{x}-ln\frac{1-x}{x}$$ so:

$$\displaystyle \int^{p_t}_{p_0}\frac{1}{p^2(1-p)}\;dp = -\frac{1}{p_t} - ln\frac{1-p_t}{p_t}-[-\frac{1}{p_0} - ln\frac{1-p_0}{p_0}]$$

$$\displaystyle = -\frac{1}{p_t} - ln\frac{q_t}{p_t} + \frac{1}{p_0} + ln\frac{q_0}{p_0}$$

$$\displaystyle = -\frac{1}{p_t} + \frac{1}{p_0} + ln\frac{p_t}{q_t} + ln\frac{q_0}{p_0}$$

$$\displaystyle = -\frac{1}{p_t} + \frac{1}{p_0} + ln\frac{p_tq_0}{q_tp_0}$$

Setting the two sides equal gives:

$$\displaystyle st = -\frac{1}{p_t} + \frac{1}{p_0} + ln\frac{p_tq_0}{q_tp_0}$$

Part 4

Problem c

#Part 1

s <- 0.01
p.0 <- 0.01
q.0 <- 1-p.0
p.t <- 0.99
q.t <- 1-p.t

t <- (log( (p.t*q.0)/(p.0*q.t)) + 1/q.t - 1/q.0)/s
t
## [1] 10818.01

#Part 2 

t <- (2*log( (p.t*q.0)/(p.0*q.t)))/s
t
## [1] 1838.048

#Part 3

t <- (log( (p.t*q.0)/(p.0*q.t)) - 1/p.t + 1/p.0)/s
t
## [1] 10818.01

#Part 4 

t <- (log( (p.t*q.0)/(p.0*q.t)))/s
t
## [1] 919.024
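
The closed-form times for Parts 1 to 3 can be cross-checked by integrating the rearranged equations numerically with base R's integrate(). This is a sketch; the variable names and the tolerance settings are my own choices.

```r
# Numerical versions of t = (1/s) * integral from p.0 to p.t.
s.sel <- 0.01
p.start <- 0.01
p.end <- 0.99

num.time <- function(integrand) {
  integrate(integrand, lower = p.start, upper = p.end,
            subdivisions = 1000L, rel.tol = 1e-8)$value / s.sel
}

t.part1 <- num.time(function(p) 1 / (p * (1 - p)^2))  # dp/dt = p q^2 s
t.part2 <- num.time(function(p) 2 / (p * (1 - p)))    # dp/dt = p q s / 2
t.part3 <- num.time(function(p) 1 / (p^2 * (1 - p)))  # dp/dt = p^2 q s
c(part1 = t.part1, part2 = t.part2, part3 = t.part3)
```

Parts 1 and 3 give the same time here because the start and end frequencies are mirror images of each other around 0.5.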

Box H

Problem a

Given: $$p_t= p_{t-1}(1-\mu)+(1-p_{t-1})\nu$$

We want the expression for $p_t - \frac{\nu}{\mu+\nu}$, so subtract $\frac{\nu}{\mu+\nu}$ from both sides of the given equation.

$$p_t - \frac{\nu}{\mu+\nu} = p_{t-1}(1-\mu)+(1-p_{t-1})\nu - \frac{\nu}{\mu+\nu}$$

$$p_t - \frac{\nu}{\mu+\nu} = p_{t-1}- \mu p_{t-1} + \nu - \nu p_{t-1} - \frac{\nu}{\mu + \nu}$$

$$p_t - \frac{\nu}{\mu+\nu} = p_{t-1}(1 - \mu - \nu ) + \nu - \frac{\nu}{\mu + \nu}$$

$$p_t - \frac{\nu}{\mu+\nu} = p_{t-1}(1 - \mu - \nu ) + \frac{\nu(\mu + \nu)}{\mu + \nu} - \frac{\nu}{\mu + \nu}$$

$$p_t - \frac{\nu}{\mu+\nu} = p_{t-1}(1 - \mu - \nu ) + \frac{\nu\mu + \nu^2 - \nu}{\mu + \nu}$$

$$p_t - \frac{\nu}{\mu+\nu} = p_{t-1}(1 - \mu - \nu ) - \frac{\nu(1 - \mu - \nu)}{\mu + \nu}$$

$$p_t - \frac{\nu}{\mu+\nu} = [p_{t-1} - \frac{\nu}{\mu + \nu}](1 - \mu - \nu)$$

So $a=\frac{\nu}{\nu+\mu}$ and $b=(1-\mu-\nu)$

Given: $p_t - a = (p_0-a)b^t$ substitute to get:

$$p_t - \frac{\nu}{\mu+\nu} = [p_0 - \frac{\nu}{\mu + \nu}](1 - \mu - \nu)^t$$

Problem b

$$p_t = p_{t-1}(1-m)+\bar{p}m$$

We are interested in $p_t - \bar{p}$, so subtract $\bar{p}$ from both sides.

$$p_t - \bar{p} = p_{t-1}(1-m)+\bar{p}m - \bar{p}$$

$$p_t - \bar{p} = p_{t-1}(1-m)+\bar{p}(m - 1)$$

$$p_t - \bar{p} = p_{t-1}(1-m)-\bar{p}(1 - m)$$

$$p_t - \bar{p} = (p_{t-1} - \bar{p})(1-m)$$

So $a=\bar{p}$ and $b=(1-m)$

$$p_t - \bar{p} = (p_{0} - \bar{p})(1-m)^t$$

Divide both sides by $\bar{p}$:

$$\frac{p_t}{\bar{p}} - 1 = (\frac{p_0}{\bar{p}} -1)(1-m)^t$$

$$\frac{p_t}{\bar{p}} = (\frac{p_0}{\bar{p}} -1)(1-m)^t + 1$$

If $\frac{p_0}{\bar{p}}=2$, m=0.01, and t=69:

$$\frac{p_t}{\bar{p}} = (2 -1)(1-0.01)^{69} + 1 \approx 1.4998$$
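
Both closed forms can be checked by iterating the recursions directly. In the sketch below the mutation rates, starting frequency and number of generations for the first check are arbitrary illustrative values; the second check uses the numbers from the migration example.

```r
# Mutation: iterate p_t = p_{t-1}(1 - mu) + (1 - p_{t-1}) nu.
mu.rate <- 1e-4
nu.rate <- 2e-5
p.start <- 0.9
n.gen <- 1000

p.iter <- p.start
for (generation in seq_len(n.gen)) {
  p.iter <- p.iter * (1 - mu.rate) + (1 - p.iter) * nu.rate
}
p.eq <- nu.rate / (mu.rate + nu.rate)
p.closed <- p.eq + (p.start - p.eq) * (1 - mu.rate - nu.rate)^n.gen
all.equal(p.iter, p.closed)

# Migration: ratio p_t / p.bar after 69 generations with m = 0.01,
# starting from p_0 / p.bar = 2 (p.bar set to 1 for convenience).
ratio.iter <- 2
for (generation in 1:69) {
  ratio.iter <- ratio.iter * (1 - 0.01) + 1 * 0.01
}
ratio.iter
```

After 69 generations the excess over the migrant frequency has almost exactly halved, which is the point of the example.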

Problem c

Box I

The fitness of A is $w_{11}$ and the fitness of a is $w_{22}$, so define

$$w=\frac{w_{11}}{w_{22}}$$

Given: $p_t = \frac{p_{t-1}w_{11}}{\bar{w}}$ and $q_t = \frac{q_{t-1}w_{22}}{\bar{w}}$

$$\frac{p_t}{q_t} = \frac{\frac{p_{t-1}w_{11}}{\bar{w}}}{\frac{q_{t-1}w_{22}}{\bar{w}}} = \frac{p_{t-1}w_{11}}{\bar{w}}\frac{\bar{w}}{q_{t-1}w_{22}} = \frac{p_{t-1}}{q_{t-1}}\frac{w_{11}}{w_{22}} = \frac{p_{t-1}}{q_{t-1}}w$$

Given: $\displaystyle \frac{p_t}{q_t}=\frac{p_0}{q_0}w^t$ take the natural log of both sides to get:

$\displaystyle ln(\frac{p_t}{q_t})=ln(\frac{p_0}{q_0}) + t ln(w)$

This has the form of a line y = mx + b, where m is the slope and b is the y intercept.
For the plot of $\displaystyle ln(\frac{p_t}{q_t})$ vs t:
Slope = m = $\displaystyle \frac{y_1-y_0}{x_1-x_0} = ln(w)$
Y intercept = b = $\displaystyle y_0 -mx_0= ln(\frac{p_0}{q_0})$
Two timepoints are given; call them $t_1$ and $t_2$.
Use them to calculate the slope and y intercept, then use the above equations to calculate $p_0$.

t.1 <- 5
p.1 <- 0.561

#y.1 <- ln(p.1/q.1) = ln(p.1/(1-p.1))

y.1 <- log(p.1/(1-p.1))
y.1
## [1] 0.2452215

t.2 <- 24
p.2 <- 0.9467

y.2 <- log(p.2/(1-p.2))
y.2
## [1] 2.877046

#slope calculation using t (time) as x
m <- (y.2-y.1)/(t.2-t.1)
m
## [1] 0.1385171

# m = ln(w)  so w=e^m
w <- exp(m)
w
## [1] 1.148569

Use $$\displaystyle \frac{p_t}{q_t}=\frac{p_0}{q_0}w^t$$ to calculate $$p_0$$

#at t=5
#p.1/(1-p.1) = (p.0/q.0)*w^t.1  
# set pq <- p.0/q.0 so 
pq <- p.1/((1-p.1)*w^t.1)
pq
## [1] 0.6393112

#so pq or p.0/q.0 or p.0/(1-p.0) = 0.6393 solve for p.0
p.0 <- 0.6393/1.6393
p.0
## [1] 0.3899835

q.0 <- 1-p.0
q.0
## [1] 0.6100165

#calculate y intercept

b <- log(p.0/q.0)
b
## [1] -0.4473815

#The A and a strains were originally equal in fitness, i.e. w=1.
#Now w=1.149, an increase of 0.149 over 120 generations;
#per generation:
(w-1)/120  #fitness units per generation
## [1] 0.001238077

(w-1)/120*100  #percent per generation
## [1] 0.1238077
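
The relative fitness and starting frequency can also be obtained without typing rounded intermediate values by hand, using the logit helpers in base R. This is a sketch; the names `t.obs`, `p.obs`, `w.est` and `p0.est` are illustrative.

```r
# Two observed time points and allele frequencies from the problem.
t.obs <- c(5, 24)
p.obs <- c(0.561, 0.9467)

# qlogis(p) is log(p / (1 - p)), the y value plotted against t.
y.obs <- qlogis(p.obs)

slope <- diff(y.obs) / diff(t.obs)
intercept <- y.obs[1] - slope * t.obs[1]

w.est <- exp(slope)          # slope = ln(w)
p0.est <- plogis(intercept)  # intercept = ln(p0 / q0), so invert the logit
c(w = w.est, p0 = p0.est)
```

The estimate of p0 differs from the value above only in the later decimal places, because the ratio is not rounded to four digits before solving.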

Box J

For L(mutation):

L(mutation) = $$\displaystyle \mu$$ = 2e-6

$$\displaystyle \hat{q}=\sqrt{\frac{\mu}{s}}$$

u <- 2e-6
s <- 0.0052
q.hat <- sqrt(u/s)
q.hat
## [1] 0.01961161
L.mut <- u
L.mut
## [1] 2e-06

L(segregation):

L(segregation) = $$\displaystyle \frac{st}{s+t}$$

$$\displaystyle \hat{q}=\frac{s}{s+t}$$

s <- 1.04e-4
t <- 0.0052
q.hat <- s/(s+t)
q.hat
## [1] 0.01960784
L.seg <- (s*t)/(s+t)
L.seg
## [1] 0.0001019608
L.seg/L.mut
## [1] 50.98039

Box K

Problem a

At equilibrium $$w_1 = w_2$$

$$\theta -2p_1(1-p_1) = \theta -2p_2(1-p_2)$$

$$p_1(1-p_1) = p_2(1-p_2)$$

n <- c(3,30,300)

w.bar <- ((n-2)*(n-1))/n^2 
w.bar
## [1] 0.2222222 0.9022222 0.9900222

Problem b

If all frequencies are equal then $$p_i=p_j$$ and $$p=\frac{1}{n}$$

n <- c(3,30,300)

w.bar <- (3*n-2)/n^2 
w.bar
## [1] 0.777777778 0.097777778 0.009977778

Box L

Problem a

altitude <- c(259, 914,1402, 1890,2438, 2621, 3018)
freq <- c(0.25, 0.35, 0.37, 0.44, 0.45, 0.55, 0.50)
legit <- c(0.517, 0.294, 0.253, 0.115, 0.096, -0.096, 0)

model <-lm(legit ~ altitude )
summary(model)
## 
## Call:
## lm(formula = legit ~ altitude)
## 
## Residuals:
##         1         2         3         4         5         6         7 
##  0.047945 -0.046583  0.008133 -0.034151  0.054334 -0.101773  0.072095 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.199e-01  5.771e-02   9.008 0.000281 ***
## altitude    -1.961e-04  2.867e-05  -6.841 0.001019 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06958 on 5 degrees of freedom
## Multiple R-squared:  0.9035,	Adjusted R-squared:  0.8842 
## F-statistic:  46.8 on 1 and 5 DF,  p-value: 0.001019

plot(altitude, legit)
abline( model )

#using y=mx + b where m:slope, b:y-intercept
b <- model[[1]][1]
b
## (Intercept) 
##   0.5198551

m <- model[[1]][2]
m
##      altitude 
## -0.0001961398

a <- 1/abs(m) #the distance along a cline corresponding to an allele frequency change of 1 legit, measured in meters.
a
## altitude 
## 5098.404

#use the reported variance of 101,500m^2
v <- 101500
g <- 4*v/a^3
g  # per meter

##     altitude 
## 3.063538e-06
#the altitude traversed in meters

alt.trav <- 3018-259
alt.trav
## [1] 2759
#percent change across the distance

alt.trav*g
##    altitude 
## 0.008452301
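
The coefficients can also be pulled out by name with coef(), which avoids relying on the position of elements inside the model object. This is a sketch that refits the same regression under new, illustrative variable names.

```r
# Same data as above, refit under separate names.
cline.altitude <- c(259, 914, 1402, 1890, 2438, 2621, 3018)
cline.legit <- c(0.517, 0.294, 0.253, 0.115, 0.096, -0.096, 0)
cline.model <- lm(cline.legit ~ cline.altitude)

# Named access to the intercept and slope.
cline.b <- coef(cline.model)[["(Intercept)"]]
cline.m <- coef(cline.model)[["cline.altitude"]]

# Distance (in meters) corresponding to a change of 1 legit.
cline.a <- 1 / abs(cline.m)
c(intercept = cline.b, slope = cline.m, a = cline.a)
```

Using `[[` with the coefficient name returns a plain number, so later results print without the leftover "altitude" label.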

Problem b

legit <- c(0.542, 0.401)
distance <- c( 0, 2000)

model <-lm(legit ~ distance )
summary(model)
## 
## Call:
## lm(formula = legit ~ distance)
## 
## Residuals:
## ALL 2 residuals are 0: no residual degrees of freedom!
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  5.42e-01         NA      NA       NA
## distance    -7.05e-05         NA      NA       NA
## 
## Residual standard error: NaN on 0 degrees of freedom
## Multiple R-squared:      1,	Adjusted R-squared:    NaN 
## F-statistic:   NaN on 1 and 0 DF,  p-value: NA

plot(distance, legit)
abline( model )

#using y=mx + b where m:slope, b:y-intercept
b <- model[[1]][1]
b[[1]]
## [1] 0.542

m <- model[[1]][2]
m[[1]]
## [1] -7.05e-05

a <- 1/abs(m) #the distance along a cline corresponding to an allele frequency change of 1 legit, measured in kilometers.
a[[1]]
## [1] 14184.4
#use the reported variance of 10 km^2
v <- 10
g <- 4*v/a^3
g[[1]]  # per kilometer
## [1] 1.40161e-11
#percent change across the distance
2000*g[[1]]
## [1] 2.803221e-08