Genetic interactions and fitness landscapes
Alex Gavryushkin
Joint work with:
- Kristina Crona, American U., Washington, DC, USA
- Devin Greene, American U., Washington, DC, USA
- Niko Beerenwinkel, ETH Zurich, Basel, Switzerland
May 27, 2017
Throughout, we consider $n$ biallelic loci, for different $n$.
That is, the set of genotypes is $\mathcal G = \{0,1\}^{n}$.
A fitness landscape is a function $w:\mathcal G \to \mathbb R^+$.
For $g \in \mathcal G$, $w(g)$ is called the fitness of genotype $g$ and denoted $w_g$.
Wrightian fitness
is defined as the average number of offspring per individual in a population of a fixed genotype.
That is:
If $N(t)$ is the population size of generation number $t$ and $g$ is the genotype of the population,
then Wrightian fitness $w_g$ of the population is defined by
$$N(t + 1) = w_gN(t)$$
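If fitness stays constant across generations, iterating $N(t+1) = w_g N(t)$ gives $N(T) = w_g^T N(0)$, so $w_g$ can be estimated from a series of population sizes. The following is a minimal sketch that illustrates the definition and is not a procedure from the talk. The function name `wrightian_fitness` and the counts are hypothetical.

```python
def wrightian_fitness(sizes):
    """Estimate Wrightian fitness w_g from population sizes N(0), ..., N(T).

    Assumes a single genotype whose fitness is constant over time, so that
    N(t + 1) = w_g * N(t) and therefore N(T) = w_g**T * N(0). The estimate
    (N(T) / N(0))**(1 / T) is the geometric mean of the per-generation ratios.
    """
    if len(sizes) < 2:
        raise ValueError("need population sizes for at least two generations")
    if any(size <= 0 for size in sizes):
        raise ValueError("population sizes must be positive")
    generations = len(sizes) - 1
    return (sizes[-1] / sizes[0]) ** (1 / generations)


# Hypothetical counts: the population grows by 50% every generation.
print(wrightian_fitness([100, 150, 225]))  # about 1.5
```

This sketch uses only the first and last counts. With noisy counts, fitting $\log N(t)$ against $t$ by least squares would be a more robust alternative.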
Epistasis
is defined as the deviation from the additive expectation of allelic effects:
$$u_{11} = w_{00} + w_{11} - (w_{01} + w_{10})$$
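The following minimal Python sketch implements this definition. It assumes the landscape is stored as a dictionary from genotype strings to fitness values. The function name and the numbers are hypothetical and serve only as an illustration.

```python
def epistasis_2locus(w):
    """u_11 = w_00 + w_11 - (w_01 + w_10) for a two-locus landscape.

    `w` maps the genotype strings "00", "01", "10", "11" to fitness values.
    u_11 > 0 means the double mutant is fitter than the additive
    expectation w_01 + w_10 - w_00; u_11 < 0 means it is less fit.
    """
    return w["00"] + w["11"] - (w["01"] + w["10"])


# Hypothetical fitness values, for illustration only.
w = {"00": 1.0, "01": 1.2, "10": 1.1, "11": 1.5}
print(epistasis_2locus(w))  # approximately 0.2, i.e. positive epistasis
```

An additive landscape, such as $w_{00}=1$, $w_{01}=2$, $w_{10}=3$, $w_{11}=4$, gives $u_{11}=0$.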
Understanding three-way interactions
Total three-way interaction?
$\small u_{111} = w_{000} + w_{011} + w_{101} + w_{110} - (w_{001} + w_{010} + w_{100} + w_{111})$
Marginal epistasis?
$\small u_{\color{blue}{0}11} = w_{\color{blue}{0}00} + w_{\color{blue}{1}00} + w_{\color{blue}{0}11} + w_{\color{blue}{1}11} − (w_{\color{blue}{0}01} + w_{\color{blue}{1}01}) − (w_{\color{blue}{0}10} + w_{\color{blue}{1}10})$
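Both coordinates above follow a single sign rule, which we read off from the formulas: $u_v = \sum_g (-1)^{v \cdot g} w_g$, where $v \cdot g$ counts the loci that carry a 1 in both $v$ and $g$. Up to normalization, this is a Walsh-Hadamard transform of the landscape. The following is a sketch under that reading. The helper names and the landscape are hypothetical.

```python
from itertools import product


def genotypes(n):
    """All 2**n genotypes as bit strings, in lexicographic order."""
    return ["".join(bits) for bits in product("01", repeat=n)]


def interaction_coordinate(w, v):
    """u_v = sum over g of (-1)**(v . g) * w_g, where v . g counts shared 1s."""
    total = 0
    for g in genotypes(len(v)):
        shared = sum(a == b == "1" for a, b in zip(v, g))
        total += w[g] if shared % 2 == 0 else -w[g]
    return total


# Hypothetical landscape for illustration: w_g = (1 + binary value of g)**3.
w = {g: (int(g, 2) + 1) ** 3 for g in genotypes(3)}
print(interaction_coordinate(w, "111"), interaction_coordinate(w, "011"))  # -48 108
```

With `v = "11"`, the same function reproduces the two-locus $u_{11}$.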
Conditional epistasis?
$\small e = w_{\color{blue}{0}00} − w_{\color{blue}{0}01} − w_{\color{blue}{0}10} + w_{\color{blue}{0}11}$
Total mess!
Algebraic Geometry sorts out the mess!
$e = \frac12(u_{011} + u_{111})$
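Expanding both coordinates term by term shows where the factor $\frac12$ comes from. Each genotype with a 1 at the first locus appears once with each sign and cancels:
$$
\small
\begin{align*}
u_{011} + u_{111} &= \big(w_{000} + w_{100} + w_{011} + w_{111} - w_{001} - w_{101} - w_{010} - w_{110}\big)\\
&\quad + \big(w_{000} + w_{011} + w_{101} + w_{110} - w_{001} - w_{010} - w_{100} - w_{111}\big)\\
&= 2\,(w_{000} - w_{001} - w_{010} + w_{011}) = 2e.
\end{align*}
$$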
In general, the four interaction coordinates
$$u_{011}, u_{101}, u_{110}, u_{111}$$
allow one to describe all possible kinds of interaction among three loci!
There are 20 types of three-way interaction, and they are the circuits of the three-cube.
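The count of 20 can be checked with a brute-force search. This is our own check and not the authors' software. A circuit is a minimal affinely dependent set of cube vertices. Points $g$ are affinely dependent exactly when the lifted vectors $(1, g)$ are linearly dependent. The function names are hypothetical. Exact rational arithmetic avoids rounding issues in the rank computation.

```python
from fractions import Fraction
from itertools import combinations, product


def rank(rows):
    """Rank of a list of integer vectors, by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def is_dependent(subset):
    # Affine dependence of points g == linear dependence of vectors (1, g).
    return rank([[1] + [int(b) for b in g] for g in subset]) < len(subset)


def circuits(n):
    """Minimal affinely dependent sets of vertices of the n-cube."""
    verts = ["".join(bits) for bits in product("01", repeat=n)]
    found = []
    # In R^n a circuit has at most n + 2 points.
    for size in range(2, n + 3):
        for subset in combinations(verts, size):
            # Minimal: dependent, but dropping any one point leaves it independent.
            if is_dependent(subset) and not any(
                is_dependent(subset[:i] + subset[i + 1:]) for i in range(size)
            ):
                found.append(subset)
    return found


print(len(circuits(3)))  # 20
```

For the square ($n = 2$), the same search returns a single circuit consisting of all four genotypes, which corresponds to $u_{11}$. The search runs over all subsets of vertices, so it is only meant for very small $n$.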
Yep, we've got the list!
$$
\scriptsize
\begin{align*}
a&= w_{000}-w_{010}-w_{100}+w_{110} & m&=w_{001}+w_{010}+w_{100}-w_{111}-2w_{000}\\
b&=w_{001}-w_{011}-w_{101}+w_{111} & n&=w_{011}+w_{101}+w_{110}-w_{000}-2w_{111}\\
c&=w_{000}-w_{001}-w_{100}+w_{101} & o&=w_{010}+w_{100}+w_{111}-w_{001}-2w_{110}\\
d&=w_{010}-w_{011}-w_{110}+w_{111} & p&=w_{000}+w_{011}+w_{101}-w_{110}-2w_{001}\\
e&=w_{000}-w_{001}-w_{010}+w_{011} & q&=w_{001}+w_{100}+ w_{111}-w_{010}-2w_{101}\\
f&=w_{100}-w_{101}-w_{110}+w_{111} & r&=w_{000}+w_{011}+ w_{110}-w_{101}-2w_{010}\\
g&=w_{000}-w_{011}-w_{100}+w_{111} & s&=w_{000}+w_{101}+ w_{110}-w_{011}-2w_{100}\\
h&=w_{001}-w_{010}-w_{101}+w_{110} & t&=w_{001}+w_{010}+w_{111}-w_{100}-2w_{011}\\
i&=w_{000}-w_{010}-w_{101}+w_{111}\\
j&=w_{001}-w_{011}-w_{100}+w_{110}\\
k&=w_{000}-w_{001}-w_{110}+w_{111}\\
l&=w_{010}-w_{011}-w_{100}+w_{101}\\
\end{align*}
$$
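The list can be encoded as integer coefficient vectors and evaluated on any three-locus landscape. The signs of the 20 circuit forms are the information the list is meant to capture. This is a sketch with a made-up landscape, not measured data: the landscape is additive, with a bonus for the triple mutant.

```python
# The 20 circuits a-t listed above, as {genotype: integer coefficient}.
CIRCUITS = {
    "a": {"000": 1, "010": -1, "100": -1, "110": 1},
    "b": {"001": 1, "011": -1, "101": -1, "111": 1},
    "c": {"000": 1, "001": -1, "100": -1, "101": 1},
    "d": {"010": 1, "011": -1, "110": -1, "111": 1},
    "e": {"000": 1, "001": -1, "010": -1, "011": 1},
    "f": {"100": 1, "101": -1, "110": -1, "111": 1},
    "g": {"000": 1, "011": -1, "100": -1, "111": 1},
    "h": {"001": 1, "010": -1, "101": -1, "110": 1},
    "i": {"000": 1, "010": -1, "101": -1, "111": 1},
    "j": {"001": 1, "011": -1, "100": -1, "110": 1},
    "k": {"000": 1, "001": -1, "110": -1, "111": 1},
    "l": {"010": 1, "011": -1, "100": -1, "101": 1},
    "m": {"001": 1, "010": 1, "100": 1, "111": -1, "000": -2},
    "n": {"011": 1, "101": 1, "110": 1, "000": -1, "111": -2},
    "o": {"010": 1, "100": 1, "111": 1, "001": -1, "110": -2},
    "p": {"000": 1, "011": 1, "101": 1, "110": -1, "001": -2},
    "q": {"001": 1, "100": 1, "111": 1, "010": -1, "101": -2},
    "r": {"000": 1, "011": 1, "110": 1, "101": -1, "010": -2},
    "s": {"000": 1, "101": 1, "110": 1, "011": -1, "100": -2},
    "t": {"001": 1, "010": 1, "111": 1, "100": -1, "011": -2},
}


def circuit_values(w):
    """Evaluate every circuit form on a three-locus landscape `w`."""
    return {name: sum(c * w[g] for g, c in coeffs.items())
            for name, coeffs in CIRCUITS.items()}


def sign(x, tol=1e-12):
    """+1, -1, or 0, treating tiny floating-point residues as zero."""
    if abs(x) <= tol:
        return 0
    return 1 if x > 0 else -1


# Made-up landscape: additive (each mutation adds 0.1), plus a bonus of
# 0.3 for the triple mutant 111. Not data from the talk.
w = {g: 1.0 + 0.1 * g.count("1")
     for g in ("000", "001", "010", "011", "100", "101", "110", "111")}
w["111"] += 0.3
print({name: sign(v) for name, v in circuit_values(w).items()})
```

In this example, every circuit that does not involve $w_{111}$ evaluates to zero, because the rest of the landscape is additive. The circuits through $w_{111}$ pick up the bonus with the sign of its coefficient.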
This is known as the Beerenwinkel-Pachter-Sturmfels approach,
which provides a complete picture of interactions!
BUT
the approach is
- based on the availability of fitness measurements
- computationally feasible only for up to four (?) loci
Problem: What if no (credible) fitness measurements are available?
Like in this malaria drug resistance data set:
Ogbunugafor et al. Malar. J. 2016
Why is fitness hard to measure?
Results at a glance
- We provide a complete characterization of partial fitness orders that imply circuit interaction (think epistasis).
- Partial fitness orders arise in competition-like experiments and include:
  - Rank orders
  - Mutation fitness graphs
Inferring Genetic Interactions From Comparative Fitness Data, bioRxiv, 2017
Rank orders. The simplest case.
$\small u_{11} = w_{00} + w_{11} - (w_{01} + w_{10})$
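Here is a worked example, with fitness values chosen purely for illustration. For the rank order $w_{11} > w_{01} > w_{00} > w_{10}$, pair each of $w_{01}, w_{10}$ with a fitter genotype from $w_{00}, w_{11}$:
$$
u_{11} = (w_{11} - w_{01}) + (w_{00} - w_{10}) > 0,
$$
so every landscape with this rank order has positive epistasis. For $w_{01} > w_{11} > w_{00} > w_{10}$, no such pairing exists, and the rank order alone does not fix the sign:
$$
\begin{align*}
(w_{01}, w_{11}, w_{00}, w_{10}) = (10, 3, 2, 1) &\implies u_{11} = 2 + 3 - 10 - 1 = -6,\\
(w_{01}, w_{11}, w_{00}, w_{10}) = (4, 3.9, 3.8, 1) &\implies u_{11} = 3.8 + 3.9 - 4 - 1 = 2.7.
\end{align*}
$$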
Characterization of epistatic rank orders
Theorem 1. Consider a biallelic $n$-locus system.
The number of rank orders which imply $n$-way epistasis is:
\[
\frac{(2^n)! \times 2}{2^{n-1}+1}
\]
Corollary. The fraction of rank orders that imply $n$-way epistasis
among all rank orders is:
\[
\frac{2}{2^{n-1}+1}
\]
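The following brute-force check of Theorem 1 for $n = 2, 3$ was written by us as an illustration. We write the $n$-way interaction as in the formulas above, with coefficient $+1$ for genotypes with an even number of 1s and $-1$ otherwise. A rank order, listed fittest first, forces $u > 0$ exactly when every prefix contains at least as many even as odd genotypes, because each odd genotype can then be paired with a distinct fitter even one. The case $u < 0$ is symmetric. We assume this prefix criterion is what characterizes the epistatic rank orders; it is the $\pm 1$ case of a Dyck-word condition. All function names are hypothetical.

```python
from itertools import permutations, product
from math import factorial


def implied_sign(order, coeff):
    """Sign of sum_g coeff[g] * w_g forced by a rank order (fittest first).

    Returns +1 if every prefix sum of the coefficients is >= 0, -1 if every
    prefix sum is <= 0, and 0 if the rank order leaves the sign open.
    """
    prefix, nonneg, nonpos = 0, True, True
    for g in order:
        prefix += coeff[g]
        nonneg = nonneg and prefix >= 0
        nonpos = nonpos and prefix <= 0
    return 1 if nonneg else (-1 if nonpos else 0)


def count_epistatic_rank_orders(n):
    genos = ["".join(bits) for bits in product("01", repeat=n)]
    # n-way interaction: +1 for an even number of 1s, -1 for an odd number.
    coeff = {g: 1 if g.count("1") % 2 == 0 else -1 for g in genos}
    return sum(1 for order in permutations(genos) if implied_sign(order, coeff) != 0)


def theorem1_count(n):
    return factorial(2 ** n) * 2 // (2 ** (n - 1) + 1)


for n in (2, 3):
    print(n, count_epistatic_rank_orders(n), theorem1_count(n))
# prints "2 16 16", then "3 16128 16128"
```

Enumerating all $(2^n)!$ orders is impractical beyond $n = 3$, since $16! \approx 2 \times 10^{13}$. This is why the closed formula is useful.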
Mutation graph
Connection between rank orders and mutation graphs
Theorem 2. Let $f$ be a linear form with integer coefficients.
Assume that the sum of the coefficients of $f$ is zero.
Then a rank order implies that $f$ is not zero if and only if the rank order is mapped to a Dyck word by $\varphi^f$.
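The slides do not define $\varphi^f$. Summation by parts gives one way to see why a Dyck-word condition appears. This is our sketch, and it assumes that $\varphi^f$ records the coefficients of $f$ in rank order. List the genotypes as $w_{g_1} > \dots > w_{g_N}$, write $c_g$ for the coefficients of $f$, and let $S_k = c_{g_1} + \dots + c_{g_k}$, so that $S_N = 0$:
$$
f = \sum_{k=1}^{N} c_{g_k} w_{g_k} = \sum_{k=1}^{N-1} S_k \,(w_{g_k} - w_{g_{k+1}}).
$$
The gaps $w_{g_k} - w_{g_{k+1}}$ are positive but otherwise arbitrary. So, for a nonzero form $f$, the inequality $f > 0$ is forced exactly when all $S_k \ge 0$, and $f < 0$ is forced exactly when all $S_k \le 0$. If each $c_g$ is read as $|c_g|$ up or down steps, this condition says the resulting path never crosses zero, as in a Dyck path.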
Theorem 3. A partial order implies $f$-interaction if and only if all its total extensions do.
Call a map $g: O \to E$ monotonic with respect to a partial order $\prec$ if $x \prec g(x)$ for all $x \in O$, where $O \sqcup E$ is a partition of the ground set of $\prec$.
Conjecture 4. A partial order $\prec$ implies positive (negative) interaction if and only if there exists a bijection from $O$ to $E$ (from $E$ to $O$) that is monotonic with respect to $\prec$.
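The following small sketch for two loci puts Theorem 3 and Conjecture 4 side by side. It is our own brute force, not the authors' algorithm. We assume that $x \prec y$ means genotype $x$ is less fit than $y$. We take $u_{11}$ with $E = \{00, 11\}$ and $O = \{01, 10\}$, and we encode a partial order as a set of pairs. The example orders and all function names are hypothetical. Checking a couple of examples illustrates the conjecture but does not prove it.

```python
from itertools import permutations, product


def genotypes(n):
    return ["".join(bits) for bits in product("01", repeat=n)]


def transitive_closure(pairs):
    """Smallest transitive relation containing `pairs` ((x, y) means x < y)."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def linear_extensions(elements, less):
    """All total orders, fittest first, compatible with `less` (brute force)."""
    result = []
    for order in permutations(elements):
        pos = {g: i for i, g in enumerate(order)}
        if all(pos[y] < pos[x] for x, y in less):
            result.append(order)
    return result


def forces_positive(order, coeff):
    """True if the rank order forces sum_g coeff[g] * w_g > 0 (prefix sums >= 0)."""
    prefix = 0
    for g in order:
        prefix += coeff[g]
        if prefix < 0:
            return False
    return True


def monotone_bijection(odd, even, less):
    """A bijection O -> E with x < g(x) for every x, or None (brute force)."""
    for image in permutations(even):
        if all((x, y) in less for x, y in zip(odd, image)):
            return dict(zip(odd, image))
    return None


coeff = {g: 1 if g.count("1") % 2 == 0 else -1 for g in genotypes(2)}
odd = [g for g in genotypes(2) if coeff[g] < 0]
even = [g for g in genotypes(2) if coeff[g] > 0]

# Hypothetical partial order: w_01 < w_11 and w_10 < w_00, nothing else.
less = transitive_closure({("01", "11"), ("10", "00")})
extensions = linear_extensions(genotypes(2), less)
print(all(forces_positive(o, coeff) for o in extensions))  # True (Theorem 3 side)
print(monotone_bijection(odd, even, less))  # {'01': '11', '10': '00'} (Conjecture 4 side)
```

If only $w_{01} < w_{11}$ is known, both checks fail. Some total extension ranks $10$ first, and no even genotype lies above $10$ in the partial order.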
Results in more detail
Efficient methods for:
- Circuit interaction inference (including epistasis and three-way interaction) for total orders
- Complete analysis of partial orders (including mutation graphs) with "distance to interaction" inference
- Suggestions for possible completions in case of missing data and/or high uncertainty
Software (pre-release stage):
https://github.com/gavruskin/fitlands
Want to learn more?
We've got you covered!
Funding
Thanks for your attention!
and stay tuned