Genetic interactions and fitness landscapes

Alex Gavryushkin

Kristina Crona, American U., Washington, DC, USA
Devin Greene, American U., Washington, DC, USA
Niko Beerenwinkel, ETH Zurich, Basel, Switzerland
May 27, 2017

Throughout, we consider $n$ biallelic loci, for various values of $n$.

That is, the set of genotypes is $\mathcal G = \{0,1\}^{n}$.

A fitness landscape is a function $w:\mathcal G \to \mathbb R^+$.

For $g \in \mathcal G$, $w(g)$ is called the fitness of genotype $g$ and denoted $w_g$.

Wrightian fitness

is defined as the average number of offspring per individual in a population with a fixed genotype.

That is:

If $N(t)$ is the population size in generation $t$ and $g$ is the genotype of the population, then the Wrightian fitness $w_g$ of the population is defined by $$N(t + 1) = w_gN(t)$$
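As a minimal sketch of this recurrence, the following Python snippet iterates $N(t+1) = w_g N(t)$. The function name `population_sizes` and the numbers are illustrative assumptions, not data from the talk.

```python
def population_sizes(w_g, n0, generations):
    """Iterate N(t + 1) = w_g * N(t) and return [N(0), ..., N(generations)]."""
    sizes = [n0]
    for _ in range(generations):
        sizes.append(w_g * sizes[-1])
    return sizes


# Hypothetical example: a Wrightian fitness of 2 doubles the population each generation.
print(population_sizes(2, 100, 3))  # [100, 200, 400, 800]
```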

Epistasis

is defined as the deviation from the additive expectation of allelic effects: $$u_{11} = w_{00} + w_{11} - (w_{01} + w_{10})$$
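The two-locus formula can be evaluated directly. The sketch below assumes fitness values are stored in a dictionary keyed by genotype strings; the two example landscapes are made up for illustration.

```python
def epistasis(w):
    """Two-locus epistasis u_11 = w_00 + w_11 - (w_01 + w_10)."""
    return w["00"] + w["11"] - (w["01"] + w["10"])


# Hypothetical landscapes: the first is additive, the second is not.
additive = {"00": 1.0, "01": 1.5, "10": 2.0, "11": 2.5}
synergy = {"00": 1.0, "01": 1.5, "10": 2.0, "11": 3.0}

print(epistasis(additive))  # 0.0
print(epistasis(synergy))   # 0.5
```

A value of zero means the double mutant is exactly as fit as the additive expectation; a positive value means it is fitter.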

Understanding three-way interactions

Total three-way interaction?

$\small u_{111} = w_{000} + w_{011} + w_{101} + w_{110} - (w_{001} + w_{010} + w_{100} + w_{111})$

Marginal epistasis?

$\small u_{\color{blue}{0}11} = w_{\color{blue}{0}00} + w_{\color{blue}{1}00} + w_{\color{blue}{0}11} + w_{\color{blue}{1}11} - (w_{\color{blue}{0}01} + w_{\color{blue}{1}01}) - (w_{\color{blue}{0}10} + w_{\color{blue}{1}10})$

Conditional epistasis?

$\small e = w_{\color{blue}{0}00} - w_{\color{blue}{0}01} - w_{\color{blue}{0}10} + w_{\color{blue}{0}11}$

Total mess!

Algebraic Geometry sorts out the mess!

$e = \frac12(u_{011} + u_{111})$
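The identity can be checked numerically. The sketch below assumes the general form $u_s = \sum_g (-1)^{s \cdot g} w_g$, which reproduces the formulas for $u_{011}$ and $u_{111}$ given on the previous slide; the helper names and the random fitness values are illustrative only.

```python
from itertools import product
import random

GENOTYPES = ["".join(bits) for bits in product("01", repeat=3)]


def interaction_coordinate(s, w):
    """u_s = sum over genotypes g of (-1)^(s . g) * w_g (assumed general form)."""
    total = 0
    for g in GENOTYPES:
        overlap = sum(int(a) & int(b) for a, b in zip(s, g))
        total += (-1) ** overlap * w[g]
    return total


def conditional_epistasis(w):
    """e = w_000 - w_001 - w_010 + w_011 (first locus fixed at 0)."""
    return w["000"] - w["001"] - w["010"] + w["011"]


# Integer fitness values keep the comparison exact.
random.seed(0)
w = {g: random.randint(1, 100) for g in GENOTYPES}
lhs = 2 * conditional_epistasis(w)
rhs = interaction_coordinate("011", w) + interaction_coordinate("111", w)
print(lhs == rhs)  # True
```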

In general, the four interaction coordinates $$u_{011}, u_{101}, u_{110}, u_{111}$$ allow us to describe all possible kinds of interaction!

There are 20 types of three-way interaction and they are the circuits of the three-cube.

Yep, we've got the list!

$$ \scriptsize \begin{align*} a&= w_{000}-w_{010}-w_{100}+w_{110} & m&=w_{001}+w_{010}+w_{100}-w_{111}-2w_{000}\\ b&=w_{001}-w_{011}-w_{101}+w_{111} & n&=w_{011}+w_{101}+w_{110}-w_{000}-2w_{111}\\ c&=w_{000}-w_{001}-w_{100}+w_{101} & o&=w_{010}+w_{100}+w_{111}-w_{001}-2w_{110}\\ d&=w_{010}-w_{011}-w_{110}+w_{111} & p&=w_{000}+w_{011}+w_{101}-w_{110}-2w_{001}\\ e&=w_{000}-w_{001}-w_{010}+w_{011} & q&=w_{001}+w_{100}+ w_{111}-w_{010}-2w_{101}\\ f&=w_{100}-w_{101}-w_{110}+w_{111} & r&=w_{000}+w_{011}+ w_{110}-w_{101}-2w_{010}\\ g&=w_{000}-w_{011}-w_{100}+w_{111} & s&=w_{000}+w_{101}+ w_{110}-w_{011}-2w_{100}\\ h&=w_{001}-w_{010}-w_{101}+w_{110} & t&=w_{001}+w_{010}+w_{111}-w_{100}-2w_{011}\\ i&=w_{000}-w_{010}-w_{101}+w_{111}\\ j&=w_{001}-w_{011}-w_{100}+w_{110}\\ k&=w_{000}-w_{001}-w_{110}+w_{111}\\ l&=w_{010}-w_{011}-w_{100}+w_{101}\\ \end{align*} $$
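One way to check the count of 20 is to enumerate circuits directly. The sketch below takes a circuit to be a minimal affinely dependent set of cube vertices, which is the usual matroid-theoretic meaning and is assumed here because the slides do not spell it out. All function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product


def rank(rows):
    """Rank of a list of integer vectors, by exact Gaussian elimination."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                factor = m[i][col] / m[rk][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk


def affinely_independent(points):
    """Points are affinely independent iff the lifted vectors (1, p) are linearly independent."""
    return rank([(1,) + tuple(p) for p in points]) == len(points)


def cube_circuits(n=3):
    """All minimal affinely dependent subsets of the vertices of the n-cube."""
    vertices = list(product((0, 1), repeat=n))
    circuits = []
    for size in range(3, n + 3):  # n + 2 points in dimension n are always dependent
        for subset in combinations(vertices, size):
            if affinely_independent(subset):
                continue
            if all(affinely_independent(s) for s in combinations(subset, size - 1)):
                circuits.append(subset)
    return circuits


circuits = cube_circuits(3)
sizes = Counter(len(c) for c in circuits)
print(len(circuits), sorted(sizes.items()))  # 20 [(4, 12), (5, 8)]
```

The 12 four-element circuits correspond to the forms $a$ through $l$ and the 8 five-element circuits to $m$ through $t$.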


This is known as the Beerenwinkel-Pachter-Sturmfels approach,

which provides a complete picture of interactions!

BUT

the approach is

- based on the availability of fitness measurements
- computationally feasible only for up to four (?) loci

Problem: What if no (credible) fitness measurements are available?

As in this malaria drug resistance data set:
Ogbunugafor et al. Malar. J., 2016

Why is fitness hard to measure?

(Image source: Wikipedia)

Results at a glance

We provide a complete characterization of partial fitness orders that imply circuit interaction (think epistasis).
Partial fitness orders arise in competition-like experiments and include:
- Rank orders
- Mutation fitness graphs

Inferring Genetic Interactions From Comparative Fitness Data, bioRxiv, 2017

Rank orders. The simplest case.

$\small u_{11} = w_{00} + w_{11} - (w_{01} + w_{10})$

Characterization of epistatic rank orders

Theorem 1. Consider a biallelic $n$-locus system. The number of rank orders which imply $n$-way epistasis is: \[ \frac{(2^n)! \times 2}{2^{n-1}+1} \]

Corollary. The fraction of rank orders that imply $n$-way epistasis among all rank orders is: \[ \frac{2}{2^{n-1}+1} \]
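For concreteness, the two formulas can be evaluated for small $n$. This is only an evaluation of the expressions in Theorem 1 and the Corollary; the function names are illustrative.

```python
from fractions import Fraction
from math import factorial


def epistatic_rank_orders(n):
    """Number of rank orders implying n-way epistasis: (2^n)! * 2 / (2^(n-1) + 1)."""
    return 2 * factorial(2 ** n) // (2 ** (n - 1) + 1)


def epistatic_fraction(n):
    """Fraction of all (2^n)! rank orders that imply n-way epistasis."""
    return Fraction(2, 2 ** (n - 1) + 1)


for n in (2, 3):
    print(n, epistatic_rank_orders(n), factorial(2 ** n), epistatic_fraction(n))
# 2 16 24 2/3
# 3 16128 40320 2/5
```

So two thirds of the 24 two-locus rank orders already imply epistasis, and the fraction shrinks as $n$ grows.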

Mutation graph

Connection between rank orders and mutation graphs

Theorem 2. Let $f$ be a linear form with integer coefficients. Assume that the sum of the coefficients of $f$ is zero. Then a rank order implies that $f$ is not zero if and only if the rank order is mapped to a Dyck word by $\varphi^f$.
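The slides do not define $\varphi^f$, so the sketch below does not implement it literally. Instead it uses a prefix-sum test that we assume is equivalent: list the genotypes from fittest to least fit and add up their coefficients in $f$; the ranking forces $f > 0$ when no partial sum is negative (and some are positive), and $f < 0$ in the mirrored case. Reading a positive coefficient as an opening bracket and a negative one as a closing bracket gives the Dyck-word picture. The brute-force count is then compared with Theorem 1 for $n = 2, 3$; all names are hypothetical.

```python
from itertools import permutations, product


def forced_sign(ranking, coeff):
    """Sign of f forced by a ranking (fittest first), or 0 if not determined.

    Assumes the coefficients of f sum to zero, as in Theorem 2.
    """
    total = 0
    seen_positive = seen_negative = False
    for genotype in ranking:
        total += coeff[genotype]
        if total > 0:
            seen_positive = True
        elif total < 0:
            seen_negative = True
    if seen_positive and not seen_negative:
        return 1
    if seen_negative and not seen_positive:
        return -1
    return 0


def count_epistatic_rank_orders(n):
    """Brute-force count of rank orders that force the sign of n-way epistasis."""
    genotypes = ["".join(bits) for bits in product("01", repeat=n)]
    # Coefficient of w_g in u_{1...1}: +1 for an even number of 1s, -1 for odd.
    coeff = {g: (-1) ** g.count("1") for g in genotypes}
    return sum(forced_sign(r, coeff) != 0 for r in permutations(genotypes))


print(count_epistatic_rank_orders(2))  # 16
print(count_epistatic_rank_orders(3))  # 16128
```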

Theorem 3. A partial order implies $f$-interaction exactly if all its total extensions do.
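Theorem 3 suggests a direct, if naive, procedure for small systems: enumerate every total extension of the partial order and test each one. The sketch below does this by filtering permutations, which is only practical for a handful of genotypes. The per-ranking test is the prefix-sum criterion, assumed here to be equivalent to the Dyck-word condition of Theorem 2; relations are given as hypothetical `(fitter, less_fit)` pairs.

```python
from itertools import permutations


def forced_sign(ranking, coeff):
    """Sign of f forced by a ranking (fittest first), or 0 if not determined."""
    total = 0
    seen_positive = seen_negative = False
    for genotype in ranking:
        total += coeff[genotype]
        seen_positive = seen_positive or total > 0
        seen_negative = seen_negative or total < 0
    if seen_positive and not seen_negative:
        return 1
    if seen_negative and not seen_positive:
        return -1
    return 0


def linear_extensions(genotypes, relations):
    """Yield all rankings (fittest first) consistent with the (fitter, less_fit) pairs."""
    for perm in permutations(genotypes):
        position = {g: i for i, g in enumerate(perm)}
        if all(position[hi] < position[lo] for hi, lo in relations):
            yield perm


def partial_order_sign(genotypes, relations, coeff):
    """+1 or -1 if every total extension forces that sign of f, otherwise 0."""
    signs = {forced_sign(ext, coeff) for ext in linear_extensions(genotypes, relations)}
    return signs.pop() if len(signs) == 1 else 0


genotypes = ["00", "01", "10", "11"]
coeff = {"00": 1, "11": 1, "01": -1, "10": -1}  # u_11

# Both single mutants are less fit than both 00 and 11.
valley = [("00", "01"), ("00", "10"), ("11", "01"), ("11", "10")]
# Fitness increases with every mutation: 00 < 01, 10 < 11.
uphill = [("11", "01"), ("11", "10"), ("01", "00"), ("10", "00")]

print(partial_order_sign(genotypes, valley, coeff))  # 1
print(partial_order_sign(genotypes, uphill, coeff))  # 0
```

The first mutation graph implies positive epistasis; the second is compatible with both signs, so fitness comparisons alone do not settle it.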

Call a map $g: O \to E$ monotonic with respect to a partial order $\prec$ if $x \prec g(x)$ for all $x \in O$, where $O \sqcup E$ is a partition of the set ordered by $\prec$.

Conjecture 4. A partial order $\prec$ implies positive (negative) interaction if and only if there exists a bijection from $O$ to $E$ (from $E$ to $O$) that is monotonic with respect to $\prec$.
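The condition in Conjecture 4 can be tested by brute force on small examples. The sketch below assumes that $x \prec y$ means "$x$ is less fit than $y$", and that $O$ and $E$ hold the genotypes with negative and positive coefficients respectively; neither reading is stated on the slides. It illustrates the condition only and says nothing about whether the conjecture holds.

```python
from itertools import permutations


def transitive_closure(relations):
    """Smallest transitive relation containing the given (lower, higher) pairs."""
    closure = set(relations)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def has_monotonic_bijection(source, target, less_than):
    """True if some bijection g: source -> target satisfies x < g(x) for every x."""
    below = transitive_closure(less_than)
    source, target = list(source), list(target)
    if len(source) != len(target):
        return False
    return any(
        all((x, y) in below for x, y in zip(source, image))
        for image in permutations(target)
    )


# Hypothetical two-locus example: w_01 < w_00 and w_10 < w_11.
less_than = [("01", "00"), ("10", "11")]
O, E = ["01", "10"], ["00", "11"]

print(has_monotonic_bijection(O, E, less_than))  # True
print(has_monotonic_bijection(E, O, less_than))  # False
```

In this example the matching $01 \mapsto 00$, $10 \mapsto 11$ gives $u_{11} = (w_{00} - w_{01}) + (w_{11} - w_{10}) > 0$, in line with the positive case of the conjecture.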

Applications

- HIV-1
- Antibiotic resistance
- Gut microbiome (with Will Ludington, UC Berkeley)
- Synthetic lethality
- Knockdown cell lines

Methodologically, this allows us to recommend which further measurements (experiments) to make for incomplete data sets, thus significantly reducing the number of experiments needed.

Example: antibiotic resistance

Mira et al. PLOS ONE, 2015


Results in more detail

Efficient methods for:

- Circuit interaction inference (including epistasis and three-way interaction) for total orders
- Complete analysis of partial orders (including mutation graphs) with "distance to interaction" inference
- Suggestions for possible completions in case of missing data and/or high uncertainty

Software (pre-release stage):
https://github.com/gavruskin/fitlands

Want to learn more?

We've got you covered!

All my talks (including this one) are at
http://alex.gavruskin.com/talks
Preprints are at
http://alex.gavruskin.com/publications
Software (and manuscripts in progress) here:
https://github.com/gavruskin

Thanks for your attention!

and stay tuned
