Gene interactions:
a geometric approach

Alex Gavryushkin

Joint work with:
  • Kristina Crona, American U., Washington, DC, USA
  • Bernd Sturmfels, U. of California, Berkeley, USA
  • Ewa Szczurek, U. of Warsaw, Poland
  • Niko Beerenwinkel, ETH Zurich, Basel, Switzerland


December 1, 2016

Throughout, we consider $n$ biallelic loci, for various values of $n$.

Notation

For a genotype $g$, $w_g$ denotes the fitness of $g$;
for example, $w_{11}$ is the fitness of the double mutant.

Epistasis   $u_{11}$

Additive allelic effect   $=$   no epistasis:

$$ w_{11} + w_{00} = w_{01} + w_{10} $$

Deviation from the additive expectation of allelic effects:

$u_{11} = w_{00} + w_{11} - (w_{01} + w_{10})$
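A minimal sketch of the formula above in Python. The fitness values are made up for illustration and the function name `epistasis_two_loci` is hypothetical; only the formula itself comes from the slide.

```python
def epistasis_two_loci(w):
    """u_11 = w_00 + w_11 - (w_01 + w_10) for a dict genotype -> fitness."""
    return w["00"] + w["11"] - (w["01"] + w["10"])


# Hypothetical fitness values (not from any data set).
additive = {"00": 1.0, "01": 1.25, "10": 1.5, "11": 1.75}
synergistic = {"00": 1.0, "01": 1.25, "10": 1.5, "11": 2.0}

print(epistasis_two_loci(additive))     # 0.0  (no epistasis)
print(epistasis_two_loci(synergistic))  # 0.25 (positive epistasis)
```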

Understanding three-way interactions

Marginal epistasis?

$\small u_{\color{blue}{0}11} = w_{\color{blue}{0}00} + w_{\color{blue}{1}00} + w_{\color{blue}{0}11} + w_{\color{blue}{1}11} - (w_{\color{blue}{0}01} + w_{\color{blue}{1}01}) - (w_{\color{blue}{0}10} + w_{\color{blue}{1}10})$

Total three-way interaction?

$\small u_{111} = w_{000} + w_{011} + w_{101} + w_{110} - (w_{001} + w_{010} + w_{100} + w_{111})$

Conditional epistasis?

$\small e = w_{\color{blue}{0}00} - w_{\color{blue}{0}01} - w_{\color{blue}{0}10} + w_{\color{blue}{0}11}$

Total mess!

(Algebraic) Geometry sorts out the mess!

$e = \tfrac{1}{2}\,(u_{011} + u_{111})$

In general, the four interaction coordinates $$ u_{011}, u_{101}, u_{110}, u_{111} $$ suffice to describe all possible kinds of interaction!
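The relation between conditional, marginal and total interaction can be checked mechanically. The sketch below stores each quantity as a vector of coefficients on the eight fitness values, using the three definitions exactly as written on the earlier slides. With those unnormalized definitions the sum $u_{011} + u_{111}$ comes out as $2e$, so the identity holds up to a factor of 2 that depends on how the interaction coordinates are scaled. The helper `form` is hypothetical.

```python
from itertools import product

GENOTYPES = ["".join(bits) for bits in product("01", repeat=3)]


def form(plus, minus):
    """Linear form in the w_g, stored as {genotype: integer coefficient}."""
    coeffs = dict.fromkeys(GENOTYPES, 0)
    for g in plus:
        coeffs[g] += 1
    for g in minus:
        coeffs[g] -= 1
    return coeffs


# Marginal epistasis of loci 2 and 3 (summed over locus 1).
u_011 = form(plus=["000", "100", "011", "111"],
             minus=["001", "101", "010", "110"])
# Total three-way interaction.
u_111 = form(plus=["000", "011", "101", "110"],
             minus=["001", "010", "100", "111"])
# Conditional epistasis of loci 2 and 3, with locus 1 fixed at 0.
e = form(plus=["000", "011"], minus=["001", "010"])

total = {g: u_011[g] + u_111[g] for g in GENOTYPES}
twice_e = {g: 2 * e[g] for g in GENOTYPES}
print(total == twice_e)  # True
```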

There are 20 types of interaction, known as circuits to Algebraic Geometry 111 students.

Yep, we've got the list!

$$ \scriptsize \begin{align*} a&= w_{000}-w_{010}-w_{100}+w_{110} & m&=w_{001}+w_{010}+w_{100}-w_{111}-2w_{000}\\ b&=w_{001}-w_{011}-w_{101}+w_{111} & n&=w_{011}+w_{101}+w_{110}-w_{000}-2w_{111}\\ c&=w_{000}-w_{001}-w_{100}+w_{101} & o&=w_{010}+w_{100}+w_{111}-w_{001}-2w_{110}\\ d&=w_{010}-w_{011}-w_{110}+w_{111} & p&=w_{000}+w_{011}+w_{101}-w_{110}-2w_{001}\\ e&=w_{000}-w_{001}-w_{010}+w_{011} & q&=w_{001}+w_{100}+ w_{111}-w_{010}-2w_{101}\\ f&=w_{100}-w_{101}-w_{110}+w_{111} & r&=w_{000}+w_{011}+ w_{110}-w_{101}-2w_{010}\\ \color{blue}{g}&\hskip{2pt}\color{blue}{=w_{000}-w_{011}-w_{100}+w_{111}} & s&=w_{000}+w_{101}+ w_{110}-w_{011}-2w_{100}\\ h&=w_{001}-w_{010}-w_{101}+w_{110} & t&=w_{001}+w_{010}+w_{111}-w_{100}-2w_{011}\\ i&=w_{000}-w_{010}-w_{101}+w_{111}\\ j&=w_{001}-w_{011}-w_{100}+w_{110}\\ k&=w_{000}-w_{001}-w_{110}+w_{111}\\ l&=w_{010}-w_{011}-w_{100}+w_{101}\\ \end{align*} $$
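The count of 20 can be reproduced by brute force. The sketch below assumes that "circuit" means a minimal affinely dependent set of vertices of the 3-cube (the genotypes read as points in $\{0,1\}^3$); under that assumption it finds 12 circuits on four vertices (the shape of $a$ to $l$) and 8 on five vertices (the shape of $m$ to $t$). The helper names are hypothetical, and this is an illustration rather than the method used in the fitlands software.

```python
from fractions import Fraction
from itertools import combinations, product

VERTICES = list(product((0, 1), repeat=3))


def rank(rows):
    """Rank of an integer matrix, by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def affinely_independent(points):
    # Affine independence of points p equals linear independence of (1, p).
    return rank([(1,) + p for p in points]) == len(points)


def is_circuit(points):
    # Dependent, but dropping any single point leaves an independent set.
    return (not affinely_independent(points)
            and all(affinely_independent(sub)
                    for sub in combinations(points, len(points) - 1)))


# Any 6 points in 3-space contain a dependent 5-subset, so sizes 2..5 suffice.
circuits = [s for k in range(2, 6)
            for s in combinations(VERTICES, k) if is_circuit(s)]
by_size = {k: sum(len(c) == k for c in circuits) for k in (4, 5)}
print(len(circuits), by_size)  # 20 {4: 12, 5: 8}
```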

This is known as the Beerenwinkel-Pachter-Sturmfels approach,

which provides a complete picture of interactions!

BUT

the approach is

  • based on the availability of fitness measurements

  • computationally feasible for up to four loci

Hence, we come to two research questions

Problem 1: What if no (credible) fitness measurements are available?

Like in this malaria drug resistance data set:
Ogbunugafor et al. Malar. J. 2016

Results at a glance

  • We provide a complete characterization of fitness graphs that imply circuit interaction (think epistasis).
  • Fitness graphs arise in competition-like experiments and include:
    • Rank orders
    • Mutation graphs


(Preprint(s) with Crona, Beerenwinkel, and others to appear)

Rank orders. The simplest case.

$\small u_{11} = w_{00} + w_{11} - (w_{01} + w_{10})$

Characterization of epistatic rank orders

Theorem. Consider a biallelic $n$-locus system. The number of rank orders which imply $n$-way epistasis is: \[ \frac{(2^n)! \times 2}{2^{n-1}+1} \]

Corollary. The fraction of rank orders that imply $n$-way epistasis among all rank orders is: \[ \frac{2}{2^{n-1}+1} \]
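The formula can be sanity-checked by brute force for small $n$. The sketch below assumes that a rank order "implies $n$-way epistasis" exactly when it forces the sign of $u_{1\ldots1}$, the signed sum with coefficient $(-1)^{\#\text{ones}}$ on each $w_g$. For a strict order this happens precisely when, reading genotypes from fittest to least fit, the running balance of positive versus negative terms never changes sign. The function names are hypothetical.

```python
from itertools import permutations, product
from math import factorial


def implies_interaction(order):
    """order: genotypes (0/1 tuples) listed from fittest to least fit."""
    balance = lowest = highest = 0
    for g in order:
        balance += 1 if sum(g) % 2 == 0 else -1
        lowest = min(lowest, balance)
        highest = max(highest, balance)
    # Never negative: u > 0 is forced. Never positive: u < 0 is forced.
    return lowest == 0 or highest == 0


def count_by_enumeration(n):
    genotypes = list(product((0, 1), repeat=n))
    return sum(implies_interaction(p) for p in permutations(genotypes))


def count_by_formula(n):
    return 2 * factorial(2 ** n) // (2 ** (n - 1) + 1)


for n in (2, 3):
    print(n, count_by_enumeration(n), count_by_formula(n))
# 2 16 16
# 3 16128 16128
```

For $n = 2$ this gives 16 of the 24 rank orders, i.e. the fraction $2/3$ from the corollary.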

Mutation graph

Connection between rank orders and mutation graphs

Applications

  • HIV-1

  • Antibiotic resistance

  • Gut microbiome (with Will Ludington, UC Berkeley)

  • Synthetic lethality

  • Knockdown cell lines

Methodologically, this allows us to advise further measurements (experiments) for incomplete data sets, thus reducing the number of potential experiments significantly.

Example: antibiotic resistance

Mira et al. PLOS ONE, 2015

Mutation graph

Results in more detail

Efficient methods for:
  • Circuit interaction inference (including epistasis and three-way interaction) for total orders
  • Complete analysis of partial orders (including mutation graphs) with "distance to interaction" inference
  • Suggestions for possible completions in case of missing data and/or high uncertainty


Software (pre-release stage):
https://github.com/gavruskin/fitlands

Problem 2: What if the number of genes (loci) is 20,822?

  • $2^{20822}$ conditional epistases?
  • $2^{20822}$ measurements to estimate marginal epistasis?

Not in this life
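To get a feeling for the size of $2^{20822}$, here is a quick back-of-the-envelope computation of its number of decimal digits. It uses logarithms rather than converting the integer to a string, because recent Python versions limit the length of integer-to-string conversions by default.

```python
import math

n_genes = 20822

# Number of decimal digits of 2**n_genes, i.e. floor(n * log10(2)) + 1.
digits = int(n_genes * math.log10(2)) + 1
print(digits)  # 6269
```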

Concrete example: genome-wide RNAi perturbation screens

20,822 genes, 90,000 "trials" (siRNAs)

RNAi perturbation screen

Two ways out

  1. Isolate a small number of "interesting" genes, e.g. main fitness drivers (like we did in the HIV study)
  2. Add statistical assumptions, for example:
    • Ignore higher-order interactions

    • Structural hypotheses: "It rarely makes sense to have interactions without main effects" (Lim and Hastie)
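A rough count shows why ignoring higher-order interactions is not enough on its own and why structural hypotheses are still needed. The sketch assumes a model with one main effect per gene and one coefficient per pair of genes. That parameterization is an illustrative assumption, not necessarily the model used in the ongoing work.

```python
from math import comb

n_genes = 20822
n_trials = 90000  # siRNAs in the screen

main_effects = n_genes
pairwise = comb(n_genes, 2)  # one coefficient per unordered pair of genes
n_params = main_effects + pairwise

print(pairwise)             # 216767431
print(n_params > n_trials)  # True: far more parameters than trials
```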


(Ongoing work with Schmich, Szczurek, Beerenwinkel, et al.)

Want to learn more?

We've got you covered!

Thanks for your attention!


and stay tuned