24 April 2018

### We consider biallelic "genotypes" of length $n$

Example: $n = 6$

Throughout, we consider $n$ biallelic loci, for different $n$.

That is, the set of genotypes is $\mathcal G = \{0,1\}^{n}$.

A fitness landscape is a function $w:\mathcal G \to \mathbb R^+$.

For $g \in \mathcal G$,  $w(g)$ is called the fitness of genotype $g$ and denoted $w_g$.
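A minimal sketch of these definitions, assuming genotypes are stored as bit strings and a landscape as a plain dictionary. The helper name `genotypes` and the fitness values are illustrative only.

```python
from itertools import product

def genotypes(n):
    """All biallelic genotypes of length n as bit strings, e.g. '010'."""
    return ["".join(bits) for bits in product("01", repeat=n)]

# A fitness landscape w: G -> R^+ for n = 2 (hypothetical values).
w = dict(zip(genotypes(2), [1.0, 1.25, 1.5, 2.0]))

# w["01"] plays the role of w_{01}; for n = 6 there are 2**6 = 64 genotypes.
```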

### Epistasis

is defined as the deviation from the additive expectation of allelic effects: $$u_{11} = w_{00} + w_{11} - (w_{01} + w_{10})$$
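As a quick check of the formula, here is a sketch that evaluates $u_{11}$ on two hypothetical landscapes: one additive (no epistasis) and one where the double mutant is fitter than additivity predicts.

```python
def epistasis(w):
    """u_11 = w_00 + w_11 - (w_01 + w_10) for a two-locus landscape."""
    return w["00"] + w["11"] - (w["01"] + w["10"])

# Hypothetical fitness values; the additive case has w_11 = w_01 + w_10 - w_00.
additive = {"00": 1.0, "01": 1.25, "10": 1.5, "11": 1.75}
positive = {"00": 1.0, "01": 1.25, "10": 1.5, "11": 2.0}

print(epistasis(additive))  # 0.0
print(epistasis(positive))  # 0.25
```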

### Understanding three-way interactions

#### Marginal epistasis?

$\small u_{\color{blue}{0}11} = w_{\color{blue}{0}00} + w_{\color{blue}{1}00} + w_{\color{blue}{0}11} + w_{\color{blue}{1}11} - (w_{\color{blue}{0}01} + w_{\color{blue}{1}01}) - (w_{\color{blue}{0}10} + w_{\color{blue}{1}10})$

#### Total three-way interaction?

$\small u_{111} = w_{000} + w_{011} + w_{101} + w_{110} - (w_{001} + w_{010} + w_{100} + w_{111})$

#### Conditional epistasis?

$\small e = w_{\color{blue}{0}00} - w_{\color{blue}{0}01} - w_{\color{blue}{0}10} + w_{\color{blue}{0}11}$

### Interaction classification

$e = \displaystyle\frac{u_{011} + u_{111}}{2}$
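The identity can be checked numerically. The sketch below transcribes the three formulas from the slides and evaluates them on a hypothetical three-locus landscape with integer fitness values (integers keep the comparison exact).

```python
from itertools import product

def u_011(w):
    """Marginal epistasis of loci 2 and 3, summed over locus 1."""
    return (w["000"] + w["100"] + w["011"] + w["111"]
            - (w["001"] + w["101"]) - (w["010"] + w["110"]))

def u_111(w):
    """Total three-way interaction."""
    return (w["000"] + w["011"] + w["101"] + w["110"]
            - (w["001"] + w["010"] + w["100"] + w["111"]))

def conditional_e(w):
    """Epistasis of loci 2 and 3 with locus 1 fixed at 0."""
    return w["000"] - w["001"] - w["010"] + w["011"]

# Hypothetical fitness values for 000, 001, 010, 011, 100, 101, 110, 111.
keys = ["".join(b) for b in product("01", repeat=3)]
w = dict(zip(keys, [10, 12, 11, 15, 9, 14, 13, 20]))

# e = (u_011 + u_111) / 2, written without division to stay in integers.
assert 2 * conditional_e(w) == u_011(w) + u_111(w)
```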

In general, the four interaction coordinates $$u_{011}, u_{101}, u_{110}, u_{111}$$ allow us to describe all possible kinds of interaction!

There are 20 types of "minimal" interactions and they are known as circuits

Yep, we've got the list!

\scriptsize \begin{align*} a&= w_{000}-w_{010}-w_{100}+w_{110} & m&=w_{001}+w_{010}+w_{100}-w_{111}-2w_{000}\\ b&=w_{001}-w_{011}-w_{101}+w_{111} & n&=w_{011}+w_{101}+w_{110}-w_{000}-2w_{111}\\ c&=w_{000}-w_{001}-w_{100}+w_{101} & o&=w_{010}+w_{100}+w_{111}-w_{001}-2w_{110}\\ d&=w_{010}-w_{011}-w_{110}+w_{111} & p&=w_{000}+w_{011}+w_{101}-w_{110}-2w_{001}\\ e&=w_{000}-w_{001}-w_{010}+w_{011} & q&=w_{001}+w_{100}+ w_{111}-w_{010}-2w_{101}\\ f&=w_{100}-w_{101}-w_{110}+w_{111} & r&=w_{000}+w_{011}+ w_{110}-w_{101}-2w_{010}\\ \color{blue}{g}&\hskip{2pt}\color{blue}{=w_{000}-w_{011}-w_{100}+w_{111}} & s&=w_{000}+w_{101}+ w_{110}-w_{011}-2w_{100}\\ h&=w_{001}-w_{010}-w_{101}+w_{110} & t&=w_{001}+w_{010}+w_{111}-w_{100}-2w_{011}\\ i&=w_{000}-w_{010}-w_{101}+w_{111}\\ j&=w_{001}-w_{011}-w_{100}+w_{110}\\ k&=w_{000}-w_{001}-w_{110}+w_{111}\\ l&=w_{010}-w_{011}-w_{100}+w_{101}\\ \end{align*}
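A sketch of how the list can be put to work, assuming each circuit is encoded as a string of signed genotypes (a coefficient of 2 is written as a repeated term). The encoding and the names `CIRCUITS` and `evaluate` are illustrative choices, not part of any existing software. Every circuit should vanish on an additive landscape, which gives a handy sanity check on the transcription.

```python
from itertools import product

CIRCUITS = {
    "a": "+000 -010 -100 +110", "b": "+001 -011 -101 +111",
    "c": "+000 -001 -100 +101", "d": "+010 -011 -110 +111",
    "e": "+000 -001 -010 +011", "f": "+100 -101 -110 +111",
    "g": "+000 -011 -100 +111", "h": "+001 -010 -101 +110",
    "i": "+000 -010 -101 +111", "j": "+001 -011 -100 +110",
    "k": "+000 -001 -110 +111", "l": "+010 -011 -100 +101",
    "m": "+001 +010 +100 -111 -000 -000",
    "n": "+011 +101 +110 -000 -111 -111",
    "o": "+010 +100 +111 -001 -110 -110",
    "p": "+000 +011 +101 -110 -001 -001",
    "q": "+001 +100 +111 -010 -101 -101",
    "r": "+000 +011 +110 -101 -010 -010",
    "s": "+000 +101 +110 -011 -100 -100",
    "t": "+001 +010 +111 -100 -011 -011",
}

def evaluate(circuit, w):
    """Evaluate one circuit (a string of signed genotypes) on landscape w."""
    total = 0
    for term in circuit.split():
        sign = 1 if term[0] == "+" else -1
        total += sign * w[term[1:]]
    return total

keys = ["".join(b) for b in product("01", repeat=3)]

# Additive landscape with hypothetical allelic effects 1, 2, 4 and baseline 10.
additive = {g: 10 + 1 * int(g[0]) + 2 * int(g[1]) + 4 * int(g[2]) for g in keys}
assert all(evaluate(c, additive) == 0 for c in CIRCUITS.values())

# A non-additive landscape (hypothetical values): report each circuit's sign.
w = dict(zip(keys, [10, 12, 11, 15, 9, 14, 13, 20]))
signs = {name: evaluate(c, w) for name, c in CIRCUITS.items()}
```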

### This is known as the Beerenwinkel-Pachter-Sturmfels approach,

which provides a complete picture of interactions!

### BUT

the approach is

• based on the availability of fitness measurements

• computationally feasible for up to four loci

### Hence, we come to two research questions

Image: Wikipedia

#### Mutation fitness graph

Ogbunugafor et al. Malar. J. 2016

#### Rank orders. The simplest case.

$\small u_{11} = w_{00} + w_{11} - (w_{01} + w_{10})$

### Exercise: Dyck word algorithm

\begin{align} \small u_{011} =~ & w_{000} + w_{100} + w_{011} + w_{111} - \\ & w_{001} - w_{101} - w_{010} - w_{110} \end{align}

$$w_{111} > w_{011} > w_{101} > w_{010} > w_{000} > w_{110} > w_{100} > w_{001}$$

$$w_{111} > w_{011} > w_{100} > w_{000} > w_{001} > w_{101} > w_{010} > w_{110}$$
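One way to approach the exercise, assuming the intended criterion is the prefix-count (Dyck word) test: read the rank order from fittest to least fit and write $+$ for genotypes with coefficient $+1$ in $u$ and $-$ for those with coefficient $-1$. If every prefix has at least as many $+$ as $-$, each negative genotype can be paired with a fitter positive one, so $u > 0$ follows from the order alone. The function names below are illustrative.

```python
def is_dyck(word):
    """True if every prefix of the +1/-1 word has a non-negative sum
    and the whole word sums to zero."""
    balance = 0
    for s in word:
        balance += s
        if balance < 0:
            return False
    return balance == 0

def implied_sign(rank_order, positive):
    """Sign of u implied by a rank order (fittest first):
    +1, -1, or 0 if the order alone does not decide it."""
    word = [1 if g in positive else -1 for g in rank_order]
    if is_dyck(word):
        return 1
    if is_dyck([-s for s in word]):
        return -1
    return 0

# Genotypes with coefficient +1 in u_011.
POS_U011 = {"000", "100", "011", "111"}

order_1 = ["111", "011", "101", "010", "000", "110", "100", "001"]
order_2 = ["111", "011", "100", "000", "001", "101", "010", "110"]

# The simplest case, u_11, has positive genotypes 00 and 11.
POS_U11 = {"00", "11"}
undecided = implied_sign(["11", "01", "10", "00"], POS_U11)
```

Under this test both rank orders above imply $u_{011} > 0$, while the two-locus order $w_{11} > w_{01} > w_{10} > w_{00}$ leaves the sign of $u_{11}$ undetermined.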

A way to quantify uncertainties!

### Applications

• HIV-1

• Antibiotic resistance

• Gut microbiome (with Will Ludington, UC Berkeley)

• Synthetic lethality

• Knockdown cell lines

Methodologically, this allows us to advise further measurements (experiments) for incomplete data sets, thus reducing the number of potential experiments significantly.

#### Example: antibiotic resistance

Mira et al. PLOS ONE, 2015

### Results in more detail

Efficient methods for:
• Circuit interaction inference (including epistasis and three-way interaction) for total orders
• Complete analysis of partial orders (including mutation graphs) with "distance to interaction" inference
• Suggestions for possible completions in case of missing data and/or high uncertainty

Software (pre-release stage):
https://github.com/gavruskin/fitlands

### Problem 2: What if the number of genes (loci) is 20,822?

• $2^{20822}$ conditional epistases?
• $2^{20822}$ measurements to estimate marginal epistasis?

Not in this life
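To get a feel for the scale, a small sketch that counts the decimal digits of $2^{20822}$ using logarithms (this avoids printing the number itself, which recent Python versions refuse to do by default for integers this long).

```python
import math

n_loci = 20822

# Number of decimal digits of 2**n_loci, via floor(n * log10(2)) + 1.
digits = math.floor(n_loci * math.log10(2)) + 1

print(digits)  # 6269
```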

### Concrete example: genome-wide RNAi perturbation screens

20,822 genes, 90,000 "trials" (siRNAs)

### Two ways out

1. Isolate a small number of "interesting" genes, e.g. main fitness drivers (like we did in the HIV study)
2. Add statistical assumptions, for example:
• Ignore higher-order interactions

• Structural hypotheses: "It rarely makes sense to have interactions without main effects" (Lim and Hastie)
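A counting sketch of what these assumptions buy. It assumes one common reading of the structural hypothesis (an interaction is considered only when both genes have a main effect), and the number `k` of genes with main effects is a made-up value for illustration. This is not the method of the ongoing work.

```python
from math import comb

n_genes = 20822
n_trials = 90000

# Ignoring higher-order interactions: main effects plus pairwise terms.
n_pairwise = comb(n_genes, 2)
n_params = n_genes + n_pairwise

# Still far more parameters than trials.
print(n_pairwise)           # 216767431
print(n_params > n_trials)  # True

# Hierarchy assumption: only pairs among k genes with main effects
# are candidates (k is hypothetical).
k = 100
n_candidates = comb(k, 2)
print(n_candidates)         # 4950
```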

(Ongoing work with Schmich, Szczurek, Beerenwinkel, et al.)

We've got you covered!

#### Acknowledgements

• You
• Niko Beerenwinkel, ETH Zürich
• Bernd Sturmfels, Max Planck Institute Leipzig
• Kristina Crona, American University
• Devin Greene, American University
• Lisa Lamberti, ETH Zürich
• Caitlin Lienkaemper, Penn State

and stay tuned