The geometry and combinatorics of phylogenetic tree spaces
Alex Gavryushkin
16 October 2018
What this talk is NOT about
Phylogenetic inference
Image source: Wikipedia
Tumour evolution
Jahn, Katharina, Jack Kuipers, and Niko Beerenwinkel. "Tree inference for singlecell data." Genome biology 17.1 (2016): 86.
Timing tumour evolution
Lote, H., I. Spiteri, L. Ermini, A. Vatsiou, A. Roy, A. McDonald, N. Maka, et al. 2017. "Carbon Dating Cancer: Defining the Chronology of Metastatic Progression in Colorectal Cancer." Annals of Oncology 28 (6): 1243–49.
 Gavryushkin, Alex, and Alexei J. Drummond. "The space of ultrametric phylogenetic trees."
Journal of Theoretical Biology 403 (2016): 197208.
 Stadler, Tanja, Timothy G. Vaughan, Alex Gavryushkin, Stephane Guindon, Denise Kühnert, Gabriel E. Leventhal, Alexei J. Drummond.
"How well can the exponentialgrowth coalescent approximate constantrate birthdeath population dynamics?"
Proc. R. Soc. B. Vol. 282. No. 1806. The Royal Society, 2015.
tauspace
Geodesic in timetree space
 Gavryushkin, Alex, and Alexei J. Drummond. "The space of ultrametric phylogenetic trees."
Journal of Theoretical Biology 403 (2016): 197208.
 Gavryushkin, Alex, and Alexei Drummond. tauGeodesic. Mar. 2015. doi: 10.5281/zenodo.47152.
https://github.com/gavruskin/tauGeodesic
 Gavryushkina, Alexandra, Tracy A. Heath, Daniel T. Ksepka, Tanja Stadler,
David Welch, and Alexei J. Drummond.
"Bayesian totalevidence dating reveals the recent crown radiation of penguins."
Systematic Biology 66.1 (2016): 5773.
 Gavryushkin, Alex, and Alexei Drummond. tauGeodesic. Mar. 2015. doi: 10.5281/zenodo.47152.
https://github.com/gavruskin/tauGeodesic
tspace: simplex
tspace: 2D
tspace: 3D
(Discrete) Timertree
Sampled ancestor tree
NNI graph
Why is this important?
 Tree search algorithms
 Model testing/selection and other simulation studies
Why is this hard?
Trees are many!
Discrete timetree space
Sampled ancestor tree space
Trees at distance 2
Trees at distance 4
Main idea
(my failed proof)
History of the NNI graph
 Over 25 year of work!
 Over 7 erroneous papers published!
What's wrong with the NNI graph?

The Split Theorem

The merge and sort trick
Merge and sort trick
Merge and sort trick
RNNI is free from all these!
What is an approximate (?) algorithm?
Graph grammars
A graph grammar consists of a finite set of productions $\{L_i \to_i R_i\}$, where $L_i$ and $R_i$ are connected undirected edgeend labeled graphs and $\to_i$ is a onetoone map between halfedges of $L_i$ and those of $R_i$.
The productions are then applied to the starting tree $T$ to derive all possible trees at $\mathrm{RNNI}$ distance up to $r$ from $T$.
A production is said to be ready at a stage $s$ of the derivation if the running tree at stage $s$ has a subgraph isomorphic to the left side $L_i$ of the production.
Graph grammar: example
Theorem (Sleator, Tarjan, and Thurston). Let $\mathcal{G}$ be a graph grammar and $G$ a graph on $n$ vertices.
Then the number of trees at distance at most $r$ from $G$ is $(c+1)^{n+rm}$, where $c$ is the number of vertices in left sides of $\mathcal{G}$ and $m$ the maximum number of vertices in any right side of $\mathcal{G}$.
Theorem (Sleator, Tarjan, and Thurston). Let $\mathcal{G}$ be a graph grammar and $G$ a graph on $n$ vertices.
Then the number of trees at distance $r$ from $G$ is $(c+1)^{n+rm}$.
Theorem (Gavryushkin, Whidden, and Matsen). The number of trees within distance $r$ from any given tree is at most
$3^{n+2r−1}$ in $\mathrm{RNNIu}$
$3^{2n+2r−1}$ in $\mathrm{RNNI}$
$4^{n+2r−1}$ in $\mathrm{DtT}$
$4^{2n+2r−1}$ in $\mathrm{DtTu}$
Corollary (Gavryushkin, Whidden, and Matsen).
$$\frac{1}{2} \log_3 \frac{(n1)!n!}{6^{n1}} \leq \mathrm{Diam}(\mathrm{RNNIu}) \leq n^2  3n  \frac{5}{8}$$
Proof:
What we've done?

Introduced the RNNI graph on ranked trees (to the best of our knowledge)

Established basic geometric properties of the graph

Designed an efficient approximate algorithm for computing shortest paths

Proved that all the fancy NNI methods, e.g. SleatorTarjanThurston and mergeandsort argument, don't work

Failed to prove that RNNI is NPhard
What has to be done?
 Is RNNI polynomial? Complexity?
 Split Theorem
 Are these two related?
 ...