Genetic interactions and fitness landscapes
Alex Gavryushkin

           Joint work with:
  • Kristina Crona, American U., Washington, DC, USA
  • Devin Greene, American U., Washington, DC, USA
  • Niko Beerenwinkel, ETH Zurich, Basel, Switzerland
           May 27, 2017
Throughout, we consider $n$ biallelic loci, for various values of $n$.

That is, the set of genotypes is $\mathcal G = \{0,1\}^{n}$.

A fitness landscape is a function $w:\mathcal G \to \mathbb R^+$.

For $g \in \mathcal G$,  $w(g)$ is called the fitness of genotype $g$ and denoted $w_g$.

Wrightian fitness

is defined as the average number of offspring in a population with fixed genotype.

That is:

If $N(t)$ is the population size at generation $t$ and every individual in the population has genotype $g$, then the Wrightian fitness $w_g$ of the population is defined by $$N(t + 1) = w_g N(t)$$
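
A minimal sketch of this recurrence, assuming a constant fitness value. The function name `project_population` and the numbers are illustrative and do not come from the talk.

```python
def project_population(n0, w_g, generations):
    """Iterate N(t + 1) = w_g * N(t) for a population fixed at genotype g."""
    sizes = [n0]
    for _ in range(generations):
        sizes.append(w_g * sizes[-1])
    return sizes


# Hypothetical values: 100 founders, 1.5 offspring per individual on average.
print(project_population(100, 1.5, 3))
```

With $w_g > 1$ the population grows geometrically, and with $w_g < 1$ it shrinks.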

Epistasis

is defined as the deviation from the additive expectation of allelic effects: $$u_{11} = w_{00} + w_{11} - (w_{01} + w_{10})$$
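
A small sketch of the two-locus formula. The fitness values are made up for illustration; the first landscape is additive by construction, the second is not.

```python
def epistasis(w):
    """u_11 = w_00 + w_11 - (w_01 + w_10) for a two-locus landscape."""
    return w["00"] + w["11"] - (w["01"] + w["10"])


# Hypothetical additive landscape: each mutation adds a fixed amount.
additive = {"00": 1.0, "01": 1.5, "10": 2.0, "11": 2.5}
# Hypothetical landscape where the double mutant is fitter than expected.
synergistic = {"00": 1.0, "01": 1.5, "10": 2.0, "11": 3.0}

print(epistasis(additive))     # zero: no deviation from additivity
print(epistasis(synergistic))  # positive: the double mutant exceeds the additive expectation
```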

Understanding three-way interactions

Total three-way interaction?

$\small u_{111} = w_{000} + w_{011} + w_{101} + w_{110} - (w_{001} + w_{010} + w_{100} + w_{111})$

Marginal epistasis?

$\small u_{\color{blue}{0}11} = w_{\color{blue}{0}00} + w_{\color{blue}{1}00} + w_{\color{blue}{0}11} + w_{\color{blue}{1}11} - (w_{\color{blue}{0}01} + w_{\color{blue}{1}01}) - (w_{\color{blue}{0}10} + w_{\color{blue}{1}10})$

Conditional epistasis?

$\small e = w_{\color{blue}{0}00} - w_{\color{blue}{0}01} - w_{\color{blue}{0}10} + w_{\color{blue}{0}11}$

Total mess!

Algebraic Geometry sorts out the mess!

$e = \frac12(u_{011} + u_{111})$
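
The identity can be checked numerically. The sketch below implements the three quantities defined above (total three-way interaction, marginal epistasis, conditional epistasis) and tests the relation on random integer fitness values. The helper names and the random landscape are assumptions made for illustration.

```python
import random
from itertools import product

GENOTYPES = ["".join(bits) for bits in product("01", repeat=3)]


def parity_sign(bits):
    """+1 if the bit string has an even number of 1s, otherwise -1."""
    return 1 if bits.count("1") % 2 == 0 else -1


def u_111(w):
    """Total three-way interaction."""
    return sum(parity_sign(g) * w[g] for g in GENOTYPES)


def u_011(w):
    """Marginal epistasis of loci 2 and 3, summed over both alleles at locus 1."""
    return sum(parity_sign(g[1:]) * w[g] for g in GENOTYPES)


def conditional_e(w):
    """Epistasis of loci 2 and 3 with locus 1 fixed at allele 0."""
    return w["000"] - w["001"] - w["010"] + w["011"]


rng = random.Random(0)
w = {g: rng.randint(1, 100) for g in GENOTYPES}  # arbitrary integer fitness values
# Integers keep the check exact: 2e = u_011 + u_111.
print(2 * conditional_e(w) == u_011(w) + u_111(w))
```

The terms for genotypes with allele 1 at the first locus cancel in the sum $u_{011} + u_{111}$, which is why only the $0{*}{*}$ face survives.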

In general, the four interaction coordinates $$u_{011}, u_{101}, u_{110}, u_{111}$$ allow us to describe all possible kinds of interaction!

There are 20 types of three-way interaction and they are the circuits of the three-cube.

Yep, we've got the list!

$$ \scriptsize \begin{align*} a&= w_{000}-w_{010}-w_{100}+w_{110} & m&=w_{001}+w_{010}+w_{100}-w_{111}-2w_{000}\\ b&=w_{001}-w_{011}-w_{101}+w_{111} & n&=w_{011}+w_{101}+w_{110}-w_{000}-2w_{111}\\ c&=w_{000}-w_{001}-w_{100}+w_{101} & o&=w_{010}+w_{100}+w_{111}-w_{001}-2w_{110}\\ d&=w_{010}-w_{011}-w_{110}+w_{111} & p&=w_{000}+w_{011}+w_{101}-w_{110}-2w_{001}\\ e&=w_{000}-w_{001}-w_{010}+w_{011} & q&=w_{001}+w_{100}+ w_{111}-w_{010}-2w_{101}\\ f&=w_{100}-w_{101}-w_{110}+w_{111} & r&=w_{000}+w_{011}+ w_{110}-w_{101}-2w_{010}\\ g&=w_{000}-w_{011}-w_{100}+w_{111} & s&=w_{000}+w_{101}+ w_{110}-w_{011}-2w_{100}\\ h&=w_{001}-w_{010}-w_{101}+w_{110} & t&=w_{001}+w_{010}+w_{111}-w_{100}-2w_{011}\\ i&=w_{000}-w_{010}-w_{101}+w_{111}\\ j&=w_{001}-w_{011}-w_{100}+w_{110}\\ k&=w_{000}-w_{001}-w_{110}+w_{111}\\ l&=w_{010}-w_{011}-w_{100}+w_{101}\\ \end{align*} $$
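
The count of 20 can be reproduced by brute force. The sketch below treats a circuit as a minimal affinely dependent set of cube vertices and enumerates them with exact rational arithmetic. It is an illustrative enumeration under that definition, not the method used in the talk or in the fitlands software, and all function names are assumptions.

```python
from fractions import Fraction
from itertools import combinations, product


def rank(rows):
    """Rank of a small matrix via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                factor = m[i][col] / m[rk][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk


def affinely_dependent(points):
    """Points are affinely dependent iff the rows (1, p) are linearly dependent."""
    return rank([(1,) + p for p in points]) < len(points)


def cube_circuits(n=3):
    """Minimal affinely dependent subsets of the vertices of the n-cube."""
    vertices = list(product((0, 1), repeat=n))
    circuits = []
    # Visiting subsets by increasing size makes the minimality check simple:
    # a dependent set is a circuit iff it contains no previously found circuit.
    for size in range(2, n + 3):
        for subset in combinations(vertices, size):
            if affinely_dependent(subset) and not any(
                set(c) <= set(subset) for c in circuits
            ):
                circuits.append(subset)
    return circuits


circuits = cube_circuits(3)
print(len(circuits), sorted(len(c) for c in circuits))
```

The twelve four-element circuits correspond to $a$ through $l$ (faces and diagonal rectangles of the cube), and the eight five-element circuits correspond to $m$ through $t$.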

This is known as the Beerenwinkel-Pachter-Sturmfels approach,

which provides a complete picture of interactions!

BUT

the approach is

  • based on the availability of fitness measurements
  • computationally feasible for up to four (?) loci

Problem: What if no (credible) fitness measurements are available?

Like in this malaria drug resistance data set:
Ogbunugafor et al. Malar. J. 2016

Why is fitness hard to measure?

Wikipedia

Results at a glance

  • We provide a complete characterization of partial fitness orders that imply circuit interaction (think epistasis).
  • Partial fitness orders arise in competition-like experiments and include:
    • Rank orders
    • Mutation fitness graphs


Inferring Genetic Interactions From Comparative Fitness Data, bioRxiv, 2017

Rank orders. The simplest case.

$\small u_{11} = w_{00} + w_{11} - (w_{01} + w_{10})$

Characterization of epistatic rank orders

Theorem 1. Consider a biallelic $n$-locus system. The number of rank orders which imply $n$-way epistasis is: \[ \frac{(2^n)! \times 2}{2^{n-1}+1} \]

Corollary. The fraction of rank orders that imply $n$-way epistasis among all rank orders is: \[ \frac{2}{2^{n-1}+1} \]
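
For small $n$ the count in Theorem 1 can be checked by enumerating all rank orders. The sketch below assumes the following criterion: list genotypes from fittest to least fit, give each a sign $+1$ (even number of 1s) or $-1$ (odd number of 1s), and say the order implies epistasis when the running sum of signs never changes sign. The function names are illustrative.

```python
from itertools import permutations, product
from math import factorial


def implied_sign(rank_order):
    """Sign of n-way epistasis forced by a rank order (fittest first), or 0 if none."""
    total, lowest, highest = 0, 0, 0
    for genotype in rank_order:
        total += 1 if sum(genotype) % 2 == 0 else -1
        lowest, highest = min(lowest, total), max(highest, total)
    if lowest == 0:
        return 1   # running sum never negative: positive epistasis
    if highest == 0:
        return -1  # running sum never positive: negative epistasis
    return 0       # the order alone does not decide the sign


def count_epistatic_orders(n):
    """Brute-force count over all (2^n)! rank orders."""
    genotypes = list(product((0, 1), repeat=n))
    return sum(implied_sign(order) != 0 for order in permutations(genotypes))


def theorem_1_count(n):
    """Closed form from Theorem 1."""
    return factorial(2 ** n) * 2 // (2 ** (n - 1) + 1)


for n in (1, 2, 3):
    print(n, count_epistatic_orders(n), theorem_1_count(n))
```

For $n = 2$ this gives 16 of the 24 rank orders, matching the fraction $2/3$ in the corollary. Enumeration is only practical up to $n = 3$, since $n = 4$ already has $16!$ rank orders.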

Mutation graph

Connection between rank orders and mutation graphs

Theorem 2. Let $f$ be a linear form with integer coefficients. Assume that the sum of the coefficients of $f$ is zero. Then a rank order implies that $f$ is not zero if and only if the rank order is mapped to a Dyck word by $\varphi^f$.
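
The slides do not spell out the map $\varphi^f$. The sketch below assumes it reads off the coefficients of $f$ along the rank order (fittest first) and that the Dyck word condition amounts to the prefix sums of those coefficients never changing sign. The function name and the example orders are illustrative.

```python
def implied_sign_of_form(rank_order, coefficients):
    """Sign of f forced by a rank order, or 0 if the order does not decide it.

    rank_order: genotypes listed from fittest to least fit.
    coefficients: dict genotype -> integer coefficient of f (summing to zero);
                  genotypes missing from the dict have coefficient 0.
    """
    prefix, sums = 0, []
    for genotype in rank_order:
        prefix += coefficients.get(genotype, 0)
        sums.append(prefix)
    if any(s > 0 for s in sums) and all(s >= 0 for s in sums):
        return 1
    if any(s < 0 for s in sums) and all(s <= 0 for s in sums):
        return -1
    return 0


# Circuit m from the list of three-way interactions.
m = {"001": 1, "010": 1, "100": 1, "111": -1, "000": -2}

# Hypothetical rank orders, fittest first.
order_a = ["001", "010", "100", "111", "000", "011", "101", "110"]
order_b = ["000", "111", "001", "010", "100", "011", "101", "110"]
order_c = ["000", "001", "010", "100", "111", "011", "101", "110"]

print(implied_sign_of_form(order_a, m))  # 1: m > 0 is forced
print(implied_sign_of_form(order_b, m))  # -1: m < 0 is forced
print(implied_sign_of_form(order_c, m))  # 0: the sign is not determined
```

The prefix-sum test follows from summation by parts: $f$ equals a sum of prefix sums weighted by the positive gaps between consecutive fitness values.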

Theorem 3. A partial order implies $f$-interaction exactly if all its total extensions do.
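
Theorem 3 suggests a direct, if exponential, check: enumerate the total extensions of the partial order and test each one. The sketch below does this by brute force and, as a conservative reading, requires all extensions to force the same sign. The prefix-sum test for a single rank order and all names are assumptions made for illustration.

```python
from itertools import permutations


def implied_sign(order, coefficients):
    """Sign of f forced by a total order (fittest first), or 0 if undecided."""
    prefix, sums = 0, []
    for genotype in order:
        prefix += coefficients.get(genotype, 0)
        sums.append(prefix)
    if any(s > 0 for s in sums) and all(s >= 0 for s in sums):
        return 1
    if any(s < 0 for s in sums) and all(s <= 0 for s in sums):
        return -1
    return 0


def linear_extensions(elements, greater_than):
    """Yield total orders compatible with pairs (a, b) meaning w_a > w_b."""
    for order in permutations(elements):
        position = {g: i for i, g in enumerate(order)}
        if all(position[a] < position[b] for a, b in greater_than):
            yield order


def partial_order_sign(elements, greater_than, coefficients):
    """Common sign forced by every total extension, or 0 if there is none."""
    signs = {
        implied_sign(order, coefficients)
        for order in linear_extensions(elements, greater_than)
    }
    return signs.pop() if len(signs) == 1 else 0


genotypes = ["00", "01", "10", "11"]
u11 = {"00": 1, "11": 1, "01": -1, "10": -1}

# Two comparisons are enough to force positive epistasis ...
print(partial_order_sign(genotypes, [("00", "01"), ("11", "10")], u11))
# ... but a single comparison is not.
print(partial_order_sign(genotypes, [("00", "01")], u11))
```

Enumerating permutations is only feasible for a handful of genotypes; it is meant to illustrate the statement, not to be efficient.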

Call a map $g: O \to E$ monotonic with respect to a partial order $\prec$ if $x \prec g(x)$ for all $x \in O$, where $O \sqcup E$ is a partition of the set ordered by $\prec$.

Conjecture 4. A partial order $\prec$ implies positive (negative) interaction iff there exists a bijection from $O$ to $E$ (from $E$ to $O$) that is monotonic with respect to $\prec$.
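
The condition in the conjecture can be tested by brute force on small examples. The sketch below assumes that $O$ and $E$ are the genotypes with an odd and an even number of 1s respectively (the slides do not define them), and that the partial order is given as pairs $(x, y)$ meaning $x \prec y$. It only checks the combinatorial condition and says nothing about whether the conjecture holds.

```python
from itertools import permutations


def transitive_closure(pairs):
    """Smallest transitive relation containing the given pairs (x, y), x below y."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def has_monotone_bijection(source, target, less_than):
    """True if some bijection g: source -> target has x below g(x) for every x."""
    below = transitive_closure(less_than)
    return any(
        all((x, y) in below for x, y in zip(source, image))
        for image in permutations(target)
    )


odd = ["01", "10"]   # assumed to play the role of O
even = ["00", "11"]  # assumed to play the role of E

# 01 below 00 and 10 below 11: each odd genotype has its own fitter even partner.
print(has_monotone_bijection(odd, even, [("01", "00"), ("10", "11")]))
# Both odd genotypes sit below 00 only, so no bijection exists.
print(has_monotone_bijection(odd, even, [("01", "00"), ("10", "00")]))
```

For larger systems a bipartite matching algorithm would replace the search over permutations.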

Applications

  • HIV-1

  • Antibiotic resistance

  • Gut microbiome (with Will Ludington, UC Berkeley)

  • Synthetic lethality

  • Knockdown cell lines

Methodologically, this allows us to recommend further measurements (experiments) for incomplete data sets, thus significantly reducing the number of experiments that need to be run.

Example: antibiotic resistance

Mira et al. PLOS ONE, 2015


Results in more detail

Efficient methods for:
  • Circuit interaction inference (including epistasis and three-way interaction) for total orders
  • Complete analysis of partial orders (including mutation graphs) with "distance to interaction" inference
  • Suggestions for possible completions in case of missing data and/or high uncertainty


Software (pre-release stage):
https://github.com/gavruskin/fitlands

Want to learn more?

We've got you covered!

Funding

Thanks for your attention!


and stay tuned