Fitness landscapes and incomplete data
Alex Gavryushkin
12 February 2018
Throughout, we consider $n$ biallelic loci, for different $n$.
That is, the set of genotypes is $\mathcal G = \{0,1\}^{n}$.
A fitness landscape is a function $w:\mathcal G \to \mathbb R^+$.
For $g \in \mathcal G$, $w(g)$ is called the fitness of genotype $g$ and denoted $w_g$.
Epistasis
is defined as the deviation from the additive expectation of allelic effects:
$$u_{11} = w_{00} + w_{11} - (w_{01} + w_{10})$$
Problem: What if no (credible) fitness measurements are available?
Image: Wikipedia
Mutation fitness graph
Ogbunugafor et al. Malar. J. 2016
Rank orders. The simplest case.
$\small u_{11} = w_{00} + w_{11} - (w_{01} + w_{10})$
Characterization of epistatic rank orders
Theorem 1. Consider a biallelic $n$-locus system.
The number of rank orders which imply $n$-way epistasis is:
\[
\frac{(2^n)! \times 2}{2^{n-1}+1}
\]
Corollary. The fraction of rank orders that imply $n$-way epistasis
among all rank orders is:
\[
\frac{2}{2^{n-1}+1}
\]
Connection between rank orders and mutation graphs
Observation: A partial fitness order implies epistasis if and only if all its total extensions do.
Let $f$ be a linear form with integer coefficients.
Assume that the sum of the coefficients of $f$ is zero.
A partial fitness order of genotypes $\prec$ implies positive $f$-interaction if $f(w) > 0$ whenever $w$ satisfy $\prec$.
Theorem: A partial order $P = (G, \prec)$ implies positive $f$-interaction if and only if there exists a partition of the set of all genotypes $G$ into pairs $(p_i, n_j)$ such that $p_i \succ n_j$ for all $i, j$, where $p_i$ have positive coefficients in $f$ and $n_j$ negative.
Proof: Exercise.
Corollary: It is polynomial in the number of genotypes ($n^{5/2}$) to check whether a partial order implies $f$-interaction.
Applications
-
HIV-1
-
Antibiotic resistance
-
Gut microbiome
-
Synthetic lethality
-
Knockdown cell lines
Methodologically, this allows us to advise further measurements (experiments) for incomplete data sets, thus reducing the number of potential experiments significantly.
Example: antibiotic resistance
Mira et al. PLOS ONE, 2015
Example: antibiotic resistance
Example: antibiotic resistance
Example: antibiotic resistance
Mira et al. PLOS ONE, 2015
$u_{111} = w_{000} + w_{011} + w_{101} + w_{110} - (w_{001} + w_{010} + w_{100} + w_{111})$
Understanding three-way interactions
Total three-way interaction?
$\small u_{111} = w_{000} + w_{011} + w_{101} + w_{110} - (w_{001} + w_{010} + w_{100} + w_{111})$
Marginal epistasis?
$\small u_{\color{blue}{0}11} = w_{\color{blue}{0}00} + w_{\color{blue}{1}00} + w_{\color{blue}{0}11} + w_{\color{blue}{1}11} − (w_{\color{blue}{0}01} + w_{\color{blue}{1}01}) − (w_{\color{blue}{0}10} + w_{\color{blue}{1}10})$
Conditional epistasis?
$\small e = w_{\color{blue}{0}00} − w_{\color{blue}{0}01} − w_{\color{blue}{0}10} + w_{\color{blue}{0}11}$
Total mess!
Algebraic Geometry sorts out the mess!
$e = \frac12(u_{011} + u_{111})$
In general, the four interaction coordinates
$$u_{011}, u_{101}, u_{110}, u_{111}$$
allow to describe all possible kinds of interaction!
There are 20 types of three-way interaction and they are the circuits of the three-cube.
Yep, we've got the list!
$$
\scriptsize
\begin{align*}
a&= w_{000}-w_{010}-w_{100}+w_{110} & m&=w_{001}+w_{010}+w_{100}-w_{111}-2w_{000}\\
b&=w_{001}-w_{011}-w_{101}+w_{111} & n&=w_{011}+w_{101}+w_{110}-w_{000}-2w_{111}\\
c&=w_{000}-w_{001}-w_{100}+w_{101} & o&=w_{010}+w_{100}+w_{111}-w_{001}-2w_{110}\\
d&=w_{010}-w_{011}-w_{110}+w_{111} & p&=w_{000}+w_{011}+w_{101}-w_{110}-2w_{001}\\
e&=w_{000}-w_{001}-w_{010}+w_{011} & q&=w_{001}+w_{100}+ w_{111}-w_{010}-2w_{101}\\
f&=w_{100}-w_{101}-w_{110}+w_{111} & r&=w_{000}+w_{011}+ w_{110}-w_{101}-2w_{010}\\
g&=w_{000}-w_{011}-w_{100}+w_{111} & s&=w_{000}+w_{101}+ w_{110}-w_{011}-2w_{100}\\
h&=w_{001}-w_{010}-w_{101}+w_{110} & t&=w_{001}+w_{010}+w_{111}-w_{100}-2w_{011}\\
i&=w_{000}-w_{010}-w_{101}+w_{111}\\
j&=w_{001}-w_{011}-w_{100}+w_{110}\\
k&=w_{000}-w_{001}-w_{110}+w_{111}\\
l&=w_{010}-w_{011}-w_{100}+w_{101}\\
\end{align*}
$$
$$
\scriptsize
\begin{align*}
a&= w_{000}-w_{010}-w_{100}+w_{110} & m&=w_{001}+w_{010}+w_{100}-w_{111}-2w_{000}\\
b&=w_{001}-w_{011}-w_{101}+w_{111} & n&=w_{011}+w_{101}+w_{110}-w_{000}-2w_{111}\\
c&=w_{000}-w_{001}-w_{100}+w_{101} & o&=w_{010}+w_{100}+w_{111}-w_{001}-2w_{110}\\
d&=w_{010}-w_{011}-w_{110}+w_{111} & p&=w_{000}+w_{011}+w_{101}-w_{110}-2w_{001}\\
e&=w_{000}-w_{001}-w_{010}+w_{011} & q&=w_{001}+w_{100}+ w_{111}-w_{010}-2w_{101}\\
f&=w_{100}-w_{101}-w_{110}+w_{111} & r&=w_{000}+w_{011}+ w_{110}-w_{101}-2w_{010}\\
\color{blue}{g}&\hskip{2pt}\color{blue}{=w_{000}-w_{011}-w_{100}+w_{111}} & s&=w_{000}+w_{101}+ w_{110}-w_{011}-2w_{100}\\
h&=w_{001}-w_{010}-w_{101}+w_{110} & t&=w_{001}+w_{010}+w_{111}-w_{100}-2w_{011}\\
i&=w_{000}-w_{010}-w_{101}+w_{111}\\
j&=w_{001}-w_{011}-w_{100}+w_{110}\\
k&=w_{000}-w_{001}-w_{110}+w_{111}\\
l&=w_{010}-w_{011}-w_{100}+w_{101}\\
\end{align*}
$$
Acknowledgements
- You
- Niko Beerenwinkel, ETH Zürich
- Bernd Sturmfels, Max Planck Institute Leipzig
- Kristina Crona, American University
- Devin Greene, American University
- Lisa Lamberti, ETH Zürich
- Caitlin Lienkaemper, Penn State
and stay tuned