Genetic interactions and fitness landscapes
Alex Gavryushkin
Joint work with:
- Kristina Crona, American U., Washington, DC, USA
- Devin Greene, American U., Washington, DC, USA
- Niko Beerenwinkel, ETH Zurich, Basel, Switzerland
May 27, 2017
Throughout, we consider $n$ biallelic loci, for various values of $n$.
That is, the set of genotypes is $\mathcal G = \{0,1\}^{n}$.
A fitness landscape is a function $w:\mathcal G \to \mathbb R^+$.
For $g \in \mathcal G$, $w(g)$ is called the fitness of genotype $g$ and denoted $w_g$.
Wrightian fitness
is defined as the average number of offspring in a population with fixed genotype.
That is:
If $N(t)$ is the population size in generation $t$ and $g$ is the genotype of the population,
then Wrightian fitness $w_g$ of the population is defined by
$$N(t + 1) = w_gN(t)$$
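A minimal sketch of this recurrence, assuming a constant fitness $w_g$ and non-overlapping generations, so that $N(t) = w_g^t N(0)$. The function name and the numbers are illustrative only and do not come from any data set.

```python
def population_sizes(n0, w_g, generations):
    """Iterate N(t + 1) = w_g * N(t), starting from N(0) = n0."""
    sizes = [n0]
    for _ in range(generations):
        sizes.append(w_g * sizes[-1])
    return sizes

# Hypothetical example: 100 individuals, 1.5 offspring each on average.
print(population_sizes(100, 1.5, 3))  # [100, 150.0, 225.0, 337.5]
```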
Epistasis
is defined as the deviation from the additive expectation of allelic effects:
$$u_{11} = w_{00} + w_{11} - (w_{01} + w_{10})$$
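As a small worked example of this formula, the sketch below evaluates $u_{11}$ for two made-up landscapes. The fitness values are hypothetical and chosen only so that one landscape is additive and the other is not.

```python
def epistasis(w):
    """Two-locus epistasis u_11 = w_00 + w_11 - (w_01 + w_10)."""
    return w["00"] + w["11"] - (w["01"] + w["10"])

# Hypothetical fitness values, for illustration only.
additive = {"00": 1.0, "01": 1.5, "10": 2.0, "11": 2.5}
synergistic = {"00": 1.0, "01": 1.5, "10": 2.0, "11": 3.0}

print(epistasis(additive))     # 0.0: allelic effects add up, no epistasis
print(epistasis(synergistic))  # 0.5: double mutant is fitter than expected
```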
Understanding three-way interactions
Total three-way interaction?
$\small u_{111} = w_{000} + w_{011} + w_{101} + w_{110} - (w_{001} + w_{010} + w_{100} + w_{111})$
Marginal epistasis?
$\small u_{\color{blue}{0}11} = w_{\color{blue}{0}00} + w_{\color{blue}{1}00} + w_{\color{blue}{0}11} + w_{\color{blue}{1}11} - (w_{\color{blue}{0}01} + w_{\color{blue}{1}01}) - (w_{\color{blue}{0}10} + w_{\color{blue}{1}10})$
Conditional epistasis?
$\small e = w_{\color{blue}{0}00} - w_{\color{blue}{0}01} - w_{\color{blue}{0}10} + w_{\color{blue}{0}11}$
Total mess!
Algebraic Geometry sorts out the mess!
$e = \frac12(u_{011} + u_{111})$
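To see where this identity comes from, add the marginal and total interaction forms defined earlier. Every term whose first locus is $1$ appears once with each sign and cancels:

$$
\begin{align*}
u_{011} + u_{111} &= (w_{000} + w_{100} + w_{011} + w_{111} - w_{001} - w_{101} - w_{010} - w_{110})\\
&\quad + (w_{000} + w_{011} + w_{101} + w_{110} - w_{001} - w_{010} - w_{100} - w_{111})\\
&= 2\,(w_{000} - w_{001} - w_{010} + w_{011}) = 2e.
\end{align*}
$$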
In general, the four interaction coordinates
$$u_{011}, u_{101}, u_{110}, u_{111}$$
allow us to describe all possible kinds of interaction!
There are 20 types of three-way interaction and they are the circuits of the three-cube.
Yep, we've got the list!
$$
\scriptsize
\begin{align*}
a&= w_{000}-w_{010}-w_{100}+w_{110} & m&=w_{001}+w_{010}+w_{100}-w_{111}-2w_{000}\\
b&=w_{001}-w_{011}-w_{101}+w_{111} & n&=w_{011}+w_{101}+w_{110}-w_{000}-2w_{111}\\
c&=w_{000}-w_{001}-w_{100}+w_{101} & o&=w_{010}+w_{100}+w_{111}-w_{001}-2w_{110}\\
d&=w_{010}-w_{011}-w_{110}+w_{111} & p&=w_{000}+w_{011}+w_{101}-w_{110}-2w_{001}\\
e&=w_{000}-w_{001}-w_{010}+w_{011} & q&=w_{001}+w_{100}+ w_{111}-w_{010}-2w_{101}\\
f&=w_{100}-w_{101}-w_{110}+w_{111} & r&=w_{000}+w_{011}+ w_{110}-w_{101}-2w_{010}\\
g&=w_{000}-w_{011}-w_{100}+w_{111} & s&=w_{000}+w_{101}+ w_{110}-w_{011}-2w_{100}\\
h&=w_{001}-w_{010}-w_{101}+w_{110} & t&=w_{001}+w_{010}+w_{111}-w_{100}-2w_{011}\\
i&=w_{000}-w_{010}-w_{101}+w_{111}\\
j&=w_{001}-w_{011}-w_{100}+w_{110}\\
k&=w_{000}-w_{001}-w_{110}+w_{111}\\
l&=w_{010}-w_{011}-w_{100}+w_{101}\\
\end{align*}
$$
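The count of 20 can be checked by brute force. The sketch below assumes that a circuit is a minimal affinely dependent set of cube vertices, and it counts circuits by their supports only, ignoring coefficients. Since no three vertices of the cube are collinear, a circuit has either four coplanar vertices or five vertices of which no four are coplanar. The helper names are illustrative.

```python
from itertools import combinations, product

def det(m):
    """Integer determinant by Laplace expansion (fine for 4x4 matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * x * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j, x in enumerate(m[0]))

def coplanar(four_points):
    """Four points are affinely dependent iff this 4x4 determinant is zero."""
    return det([[1, *p] for p in four_points]) == 0

vertices = list(product((0, 1), repeat=3))  # genotypes 000, 001, ..., 111

# Circuits on four genotypes: coplanar quadruples (forms a to l).
four = [s for s in combinations(vertices, 4) if coplanar(s)]
# Circuits on five genotypes: no four of the five are coplanar (forms m to t).
five = [s for s in combinations(vertices, 5)
        if not any(coplanar(q) for q in combinations(s, 4))]

print(len(four), len(five), len(four) + len(five))  # 12 8 20
```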
This is known as the Beerenwinkel-Pachter-Sturmfels approach,
which provides a complete picture of interactions!
BUT
the approach is
- based on the availability of fitness measurements
- computationally feasible for up to four (?) loci
Problem: What if no (credible) fitness measurements are available?
As in this malaria drug resistance data set:
Ogbunugafor et al. Malar. J. 2016
Why is fitness hard to measure?
Results at a glance
- We provide a complete characterization of partial fitness orders that imply circuit interaction (think epistasis).
- Partial fitness orders arise in competition-like experiments and include:
  - Rank orders
  - Mutation fitness graphs
Inferring Genetic Interactions From Comparative Fitness Data, bioRxiv, 2017
Rank orders. The simplest case.
$\small u_{11} = w_{00} + w_{11} - (w_{01} + w_{10})$
Characterization of epistatic rank orders
Theorem 1. Consider a biallelic $n$-locus system.
The number of rank orders which imply $n$-way epistasis is:
\[
\frac{(2^n)! \times 2}{2^{n-1}+1}
\]
Corollary. The fraction of rank orders that imply $n$-way epistasis
among all rank orders is:
\[
\frac{2}{2^{n-1}+1}
\]
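For concreteness, the sketch below simply evaluates the two formulas for small $n$. It assumes that rank orders are strict (no ties), so that there are $(2^n)!$ of them in total. The function name is illustrative.

```python
from fractions import Fraction
from math import factorial

def epistatic_rank_orders(n):
    """Number of rank orders implying n-way epistasis, per Theorem 1."""
    total = factorial(2 ** n)  # all strict rank orders of 2^n genotypes
    return total * Fraction(2, 2 ** (n - 1) + 1)

for n in (2, 3, 4):
    count = epistatic_rank_orders(n)
    share = count / factorial(2 ** n)
    print(n, count, share)
# 2 16 2/3
# 3 16128 2/5
# 4 4649508864000 2/9
```

So for two loci, 16 of the 24 rank orders already determine the sign of $u_{11}$, and the share drops as $n$ grows.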
Mutation graph
Connection between rank orders and mutation graphs
Theorem 2. Let $f$ be a linear form with integer coefficients.
Assume that the sum of the coefficients of $f$ is zero.
Then a rank order implies that $f$ is not zero if and only if the rank order is mapped to a Dyck word by $\varphi^f$.
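A minimal sketch of how such a test could look. The map $\varphi^f$ is not spelled out here, so the code assumes it reads the rank order from fittest to least fit and replaces each genotype by the sign of its coefficient in $f$, repeated as many times as the absolute value of that coefficient. A word then counts as Dyck if its prefix sums never change sign. All function names are hypothetical.

```python
from itertools import permutations

def sign_word(rank_order, coeffs):
    """Assumed phi^f: genotypes from fittest to least fit, each replaced by
    the sign of its coefficient in f, repeated |coefficient| times."""
    word = []
    for g in rank_order:
        c = coeffs.get(g, 0)
        word += [1 if c > 0 else -1] * abs(c)
    return word

def implied_sign(rank_order, coeffs):
    """+1 or -1 if the rank order forces f > 0 or f < 0, else 0."""
    prefix, sums = 0, []
    for step in sign_word(rank_order, coeffs):
        prefix += step
        sums.append(prefix)
    if all(s >= 0 for s in sums):
        return 1
    if all(s <= 0 for s in sums):
        return -1
    return 0

# u_11 = w_00 + w_11 - (w_01 + w_10)
u11 = {"00": 1, "11": 1, "01": -1, "10": -1}
print(implied_sign(("11", "00", "01", "10"), u11))  # 1
print(implied_sign(("11", "01", "10", "00"), u11))  # 0
orders = list(permutations(u11))
print(sum(implied_sign(o, u11) != 0 for o in orders), len(orders))  # 16 24
```

Under this reading, the brute-force count for two loci agrees with Theorem 1.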
Theorem 3. A partial order implies $f$-interaction exactly if all its total extensions do.
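Theorem 3 suggests a direct, if naive, procedure: list all total extensions of the partial order and test each one. The sketch below does this for the two-locus form $u_{11}$, using a prefix-sum test that is valid for forms with coefficients $\pm 1$. It enumerates all permutations, so it is only practical for very few genotypes. The mutation graph and all names are hypothetical.

```python
from itertools import permutations

U11 = {"00": 1, "11": 1, "01": -1, "10": -1}  # coefficients of u_11

def implied_sign(rank_order, coeffs=U11):
    """Sign forced by a rank order (fittest first) on a form with +/-1
    coefficients: +1, -1, or 0 if the sign is not determined."""
    prefix, sums = 0, []
    for g in rank_order:
        prefix += coeffs[g]
        sums.append(prefix)
    return 1 if min(sums) >= 0 else -1 if max(sums) <= 0 else 0

def total_extensions(genotypes, fitter_than):
    """All rank orders compatible with the pairs (a, b) meaning w_a > w_b."""
    for order in permutations(genotypes):
        pos = {g: i for i, g in enumerate(order)}
        if all(pos[a] < pos[b] for a, b in fitter_than):
            yield order

def partial_order_sign(genotypes, fitter_than):
    """Apply Theorem 3: the common sign of all total extensions, else 0."""
    signs = {implied_sign(o) for o in total_extensions(genotypes, fitter_than)}
    return signs.pop() if len(signs) == 1 else 0

# Hypothetical mutation graph: both single mutants beat 00 and 11.
peaks = [("01", "00"), ("10", "00"), ("01", "11"), ("10", "11")]
print(partial_order_sign(list(U11), peaks))           # -1
print(partial_order_sign(list(U11), [("11", "00")]))  # 0
```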
Call a map $g: O \to E$ monotonic with respect to a partial order $\prec$ if $x \prec g(x)$ for all $x \in O$, where $O \sqcup E$ is a partition of the set ordered by $\prec$.
Conjecture 4. A partial order $\prec$ implies positive (negative) interaction iff there exists a bijection from $O$ to $E$ (from $E$ to $O$) that is monotonic with respect to $\prec$.
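The condition in the conjecture can be tested mechanically. The sketch below assumes, for $u_{11}$, that $O$ is the set of genotypes with an odd number of 1s and $E$ the set with an even number; this reading is an assumption and is not stated on the slide. The search tries every bijection, so it only suits small examples, and a bipartite matching algorithm would be the natural replacement for larger ones. All names and the example order are hypothetical.

```python
from itertools import permutations

def strictly_below(pairs):
    """Transitive closure of pairs (x, y) meaning x is less fit than y."""
    below = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(below):
            for c, d in list(below):
                if b == c and (a, d) not in below:
                    below.add((a, d))
                    changed = True
    return below

def monotonic_bijection(source, target, pairs):
    """Search for a bijection g: source -> target with x below g(x)."""
    below = strictly_below(pairs)
    for image in permutations(target):
        if all((x, y) in below for x, y in zip(source, image)):
            return dict(zip(source, image))
    return None

# Assumed reading for u_11: O = odd genotypes, E = even genotypes.
O, E = ["01", "10"], ["00", "11"]
# Hypothetical partial order: 00 and 11 are both less fit than 01 and 10.
pairs = [("00", "01"), ("00", "10"), ("11", "01"), ("11", "10")]
print(monotonic_bijection(E, O, pairs))  # {'00': '01', '11': '10'}
print(monotonic_bijection(O, E, pairs))  # None
```

Here a monotonic bijection exists only from $E$ to $O$, which the conjecture would read as negative interaction, in line with $u_{11} < 0$ for any fitness values compatible with this order.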
Results in more detail
Efficient methods for:
- Circuit interaction inference (including epistasis and three-way interaction) for total orders
- Complete analysis of partial orders (including mutation graphs) with "distance to interaction" inference
- Suggestions for possible completions in case of missing data and/or high uncertainty
Software (pre-release stage):
https://github.com/gavruskin/fitlands
Want to learn more?
We've got you covered!
Funding
Thanks for your attention, and stay tuned!