Gene interactions:
a geometric approach
Alex Gavryushkin
Joint work with:
 Kristina Crona, American U., Washington, DC, USA
 Bernd Sturmfels, U. of California, Berkeley, USA
 Ewa Szczurek, U. of Warsaw, Poland
 Niko Beerenwinkel, ETH Zurich, Basel, Switzerland
October 19, 2016
Surprised I'm not talking about trees?
Don't worry.
Epistasis
Additive allelic effect:
$$
w_{11} + w_{00} = w_{01} + w_{10}
$$
Deviation from the additive expectation of allelic effects:
$u_{11} = w_{00} + w_{11}  (w_{01} + w_{10})$
Understanding threeway interactions
Marginal epistasis?
$u_{\color{blue}{0}11} = w_{\color{blue}{0}00} + w_{\color{blue}{1}00} + w_{\color{blue}{0}11} + w_{\color{blue}{1}11} − (w_{\color{blue}{0}01} + w_{\color{blue}{1}01}) − (w_{\color{blue}{0}10} + w_{\color{blue}{1}10})$
Total threeway interaction?
$u_{111} = w_{000} + w_{011} + w_{101} + w_{110}  (w_{001} + w_{010} + w_{100} + w_{111})$
Conditional epistasis?
$e = w_{\color{blue}{0}00} − w_{\color{blue}{0}01} − w_{\color{blue}{0}10} + w_{\color{blue}{0}11}$
Total mess!
(Algebraic) Geometry sorts out the mess!
$e = u_{011} + u_{111}$
In general, the four interaction coordinates
$$
u_{011}, u_{101}, u_{110}, u_{111}
$$
allow to describe all possible kinds of interaction!
There are 20 types of interaction and they are known as circuits to Algebraic Geometry 111 students
Yep, we've got the list!
$$
\scriptsize
\begin{align*}
a&= w_{000}w_{010}w_{100}+w_{110} & m&=w_{001}+w_{010}+w_{100}w_{111}2w_{000}\\
b&=w_{001}w_{011}w_{101}+w_{111} & n&=w_{011}+w_{101}+w_{110}w_{000}2w_{111}\\
c&=w_{000}w_{001}w_{100}+w_{101} & o&=w_{010}+w_{100}+w_{111}w_{001}2w_{110}\\
d&=w_{010}w_{011}w_{110}+w_{111} & p&=w_{000}+w_{011}+w_{101}w_{110}2w_{001}\\
e&=w_{000}w_{001}w_{010}+w_{011} & q&=w_{001}+w_{100}+ w_{111}w_{010}2w_{101}\\
f&=w_{100}w_{101}w_{110}+w_{111} & r&=w_{000}+w_{011}+ w_{110}w_{101}2w_{010}\\
g&=w_{000}w_{011}w_{100}+w_{111} & s&=w_{000}+w_{101}+ w_{110}w_{011}2w_{100}\\
h&=w_{001}w_{010}w_{101}+w_{110} & t&=w_{001}+w_{010}+w_{111}w_{100}2w_{011}\\
i&=w_{000}w_{010}w_{101}+w_{111}\\
j&=w_{001}w_{011}w_{100}+w_{110}\\
k&=w_{000}w_{001}w_{110}+w_{111}\\
l&=w_{010}w_{011}w_{100}+w_{101}\\
\end{align*}
$$
$$
\scriptsize
\begin{align*}
a&= w_{000}w_{010}w_{100}+w_{110} & m&=w_{001}+w_{010}+w_{100}w_{111}2w_{000}\\
b&=w_{001}w_{011}w_{101}+w_{111} & n&=w_{011}+w_{101}+w_{110}w_{000}2w_{111}\\
c&=w_{000}w_{001}w_{100}+w_{101} & o&=w_{010}+w_{100}+w_{111}w_{001}2w_{110}\\
d&=w_{010}w_{011}w_{110}+w_{111} & p&=w_{000}+w_{011}+w_{101}w_{110}2w_{001}\\
e&=w_{000}w_{001}w_{010}+w_{011} & q&=w_{001}+w_{100}+ w_{111}w_{010}2w_{101}\\
f&=w_{100}w_{101}w_{110}+w_{111} & r&=w_{000}+w_{011}+ w_{110}w_{101}2w_{010}\\
\color{blue}{g}&\hskip{2pt}\color{blue}{=w_{000}w_{011}w_{100}+w_{111}} & s&=w_{000}+w_{101}+ w_{110}w_{011}2w_{100}\\
h&=w_{001}w_{010}w_{101}+w_{110} & t&=w_{001}+w_{010}+w_{111}w_{100}2w_{011}\\
i&=w_{000}w_{010}w_{101}+w_{111}\\
j&=w_{001}w_{011}w_{100}+w_{110}\\
k&=w_{000}w_{001}w_{110}+w_{111}\\
l&=w_{010}w_{011}w_{100}+w_{101}\\
\end{align*}
$$
This is known as BeerenwinkelPachterSturmfels approach,
which provides a complete picture of interactions!
BUT
the approach is
Hence, we come to two research questions
Problem 1: What if no (credible) fitness measurements are available?
Like in this malaria drug resistance data set:
Ogbunugafor et al. Malar. J. 2016
Results at a glance
 We provide a complete characterization of fitness graphs that imply circuit interaction (think epistasis).
 Fitness graphs arise in competitionlike experiments and include:
 Rank orders
 Mutation graphs
(Preprint(s) with Crona, Beerenwinkel, and others to appear)
Rank orders. The simplest case.
Characterization of epistatic rank orders
Theorem. Consider a biallelic $n$locus system.
The number of rank orders which imply $n$way epistasis is:
\[
\frac{(2^n)! \times 2}{2^{n1}+1}
\]
Corollary. The fraction of rank orders that imply nway epistasis
among all rank orders is:
\[
\frac{2}{2^{n1}+1}
\]
Mutation graph
Connection between rank orders and mutation graphs
Applications
 HIV1

Antibiotic resistance
 Fungi
 Gut microbiome (with Will Ludington, UC Berkeley)
 Synthetic lethality
 Knockdown cell lines
Methodologically, this allows us to advise further measurements (experiments) for incomplete data sets. This exponentially reduce the number of potential experiments (see the HIV example).
Example: antibiotic resistance
Mira et al. PLOS ONE, 2015
Example: antibiotic resistance
Example: antibiotic resistance
Example: antibiotic resistance
Mutation graph
Mutation graph
Mutation graph
Mutation graph
Mutation graph
Results in more detail
Efficient methods for:
 Circuit interaction inference (including epistasis and threeway interaction) for total orders
 Complete analysis of partial orders (including mutation graphs) with "distance to interaction" inference
 Suggestions for possible completions in case of missing data and/or high uncertainty
Software (prerelease stage):
https://github.com/gavruskin/fitlands
Problem 2: What if the number of genes (loci) is 20,822?
 2^20820 of conditional epistases?
 2^20820 measurements to estimate marginal epistasis?
Not in this life
Concrete example: genomewide RNAi perturbation screens
20,822 genes, 90,000 "trials" (siRNA's)
Two ways out
 Ask biologists about "the most important genes" that are main fitness drivers (like we did in the HIV study)
 Add statistical assumptions, for example:
(Ongoing work with Schmich, Szczurek, Beerenwinkel, et al.)
Thanks for your attention!
and stay tuned