Gene interactions:
a geometric approach
Alex Gavryushkin
Joint work with:
- Kristina Crona, American U., Washington, DC, USA
- Bernd Sturmfels, U. of California, Berkeley, USA
- Ewa Szczurek, U. of Warsaw, Poland
- Niko Beerenwinkel, ETH Zurich, Basel, Switzerland
October 19, 2016
Surprised I'm not talking about trees?
Don't worry.
Epistasis
Additive allelic effect:
$$
w_{11} + w_{00} = w_{01} + w_{10}
$$
Deviation from the additive expectation of allelic effects:
$u_{11} = w_{00} + w_{11} - (w_{01} + w_{10})$
Understanding three-way interactions
Marginal epistasis?
$u_{\color{blue}{0}11} = w_{\color{blue}{0}00} + w_{\color{blue}{1}00} + w_{\color{blue}{0}11} + w_{\color{blue}{1}11} − (w_{\color{blue}{0}01} + w_{\color{blue}{1}01}) − (w_{\color{blue}{0}10} + w_{\color{blue}{1}10})$
Total three-way interaction?
$u_{111} = w_{000} + w_{011} + w_{101} + w_{110} - (w_{001} + w_{010} + w_{100} + w_{111})$
Conditional epistasis?
$e = w_{\color{blue}{0}00} − w_{\color{blue}{0}01} − w_{\color{blue}{0}10} + w_{\color{blue}{0}11}$
Total mess!
(Algebraic) Geometry sorts out the mess!
$e = u_{011} + u_{111}$
In general, the four interaction coordinates
$$
u_{011}, u_{101}, u_{110}, u_{111}
$$
allow to describe all possible kinds of interaction!
There are 20 types of interaction and they are known as circuits to Algebraic Geometry 111 students
Yep, we've got the list!
$$
\scriptsize
\begin{align*}
a&= w_{000}-w_{010}-w_{100}+w_{110} & m&=w_{001}+w_{010}+w_{100}-w_{111}-2w_{000}\\
b&=w_{001}-w_{011}-w_{101}+w_{111} & n&=w_{011}+w_{101}+w_{110}-w_{000}-2w_{111}\\
c&=w_{000}-w_{001}-w_{100}+w_{101} & o&=w_{010}+w_{100}+w_{111}-w_{001}-2w_{110}\\
d&=w_{010}-w_{011}-w_{110}+w_{111} & p&=w_{000}+w_{011}+w_{101}-w_{110}-2w_{001}\\
e&=w_{000}-w_{001}-w_{010}+w_{011} & q&=w_{001}+w_{100}+ w_{111}-w_{010}-2w_{101}\\
f&=w_{100}-w_{101}-w_{110}+w_{111} & r&=w_{000}+w_{011}+ w_{110}-w_{101}-2w_{010}\\
g&=w_{000}-w_{011}-w_{100}+w_{111} & s&=w_{000}+w_{101}+ w_{110}-w_{011}-2w_{100}\\
h&=w_{001}-w_{010}-w_{101}+w_{110} & t&=w_{001}+w_{010}+w_{111}-w_{100}-2w_{011}\\
i&=w_{000}-w_{010}-w_{101}+w_{111}\\
j&=w_{001}-w_{011}-w_{100}+w_{110}\\
k&=w_{000}-w_{001}-w_{110}+w_{111}\\
l&=w_{010}-w_{011}-w_{100}+w_{101}\\
\end{align*}
$$
$$
\scriptsize
\begin{align*}
a&= w_{000}-w_{010}-w_{100}+w_{110} & m&=w_{001}+w_{010}+w_{100}-w_{111}-2w_{000}\\
b&=w_{001}-w_{011}-w_{101}+w_{111} & n&=w_{011}+w_{101}+w_{110}-w_{000}-2w_{111}\\
c&=w_{000}-w_{001}-w_{100}+w_{101} & o&=w_{010}+w_{100}+w_{111}-w_{001}-2w_{110}\\
d&=w_{010}-w_{011}-w_{110}+w_{111} & p&=w_{000}+w_{011}+w_{101}-w_{110}-2w_{001}\\
e&=w_{000}-w_{001}-w_{010}+w_{011} & q&=w_{001}+w_{100}+ w_{111}-w_{010}-2w_{101}\\
f&=w_{100}-w_{101}-w_{110}+w_{111} & r&=w_{000}+w_{011}+ w_{110}-w_{101}-2w_{010}\\
\color{blue}{g}&\hskip{2pt}\color{blue}{=w_{000}-w_{011}-w_{100}+w_{111}} & s&=w_{000}+w_{101}+ w_{110}-w_{011}-2w_{100}\\
h&=w_{001}-w_{010}-w_{101}+w_{110} & t&=w_{001}+w_{010}+w_{111}-w_{100}-2w_{011}\\
i&=w_{000}-w_{010}-w_{101}+w_{111}\\
j&=w_{001}-w_{011}-w_{100}+w_{110}\\
k&=w_{000}-w_{001}-w_{110}+w_{111}\\
l&=w_{010}-w_{011}-w_{100}+w_{101}\\
\end{align*}
$$
This is known as Beerenwinkel-Pachter-Sturmfels approach,
which provides a complete picture of interactions!
BUT
the approach is
Hence, we come to two research questions
Problem 1: What if no (credible) fitness measurements are available?
Like in this malaria drug resistance data set:
Ogbunugafor et al. Malar. J. 2016
Results at a glance
- We provide a complete characterization of fitness graphs that imply circuit interaction (think epistasis).
- Fitness graphs arise in competition-like experiments and include:
- Rank orders
- Mutation graphs
(Preprint(s) with Crona, Beerenwinkel, and others to appear)
Rank orders. The simplest case.
Characterization of epistatic rank orders
Theorem. Consider a biallelic $n$-locus system.
The number of rank orders which imply $n$-way epistasis is:
\[
\frac{(2^n)! \times 2}{2^{n-1}+1}
\]
Corollary. The fraction of rank orders that imply n-way epistasis
among all rank orders is:
\[
\frac{2}{2^{n-1}+1}
\]
Mutation graph
Connection between rank orders and mutation graphs
Applications
- HIV-1
-
Antibiotic resistance
- Fungi
- Gut microbiome (with Will Ludington, UC Berkeley)
- Synthetic lethality
- Knockdown cell lines
Methodologically, this allows us to advise further measurements (experiments) for incomplete data sets. This exponentially reduce the number of potential experiments (see the HIV example).
Example: antibiotic resistance
Mira et al. PLOS ONE, 2015
Example: antibiotic resistance
Example: antibiotic resistance
Example: antibiotic resistance
Mutation graph
Mutation graph
Mutation graph
Mutation graph
Mutation graph
Results in more detail
Efficient methods for:
- Circuit interaction inference (including epistasis and three-way interaction) for total orders
- Complete analysis of partial orders (including mutation graphs) with "distance to interaction" inference
- Suggestions for possible completions in case of missing data and/or high uncertainty
Software (pre-release stage):
https://github.com/gavruskin/fitlands
Problem 2: What if the number of genes (loci) is 20,822?
- 2^20820 of conditional epistases?
- 2^20820 measurements to estimate marginal epistasis?
Not in this life
Concrete example: genome-wide RNAi perturbation screens
20,822 genes, 90,000 "trials" (siRNA's)
Two ways out
- Ask biologists about "the most important genes" that are main fitness drivers (like we did in the HIV study)
- Add statistical assumptions, for example:
(Ongoing work with Schmich, Szczurek, Beerenwinkel, et al.)
Thanks for your attention!
and stay tuned