16 October 2018

### Phylogenetic inference

Image source: Wikipedia

### Tumour evolution

Jahn, Katharina, Jack Kuipers, and Niko Beerenwinkel. "Tree inference for single-cell data." Genome biology 17.1 (2016): 86.

### Timing tumour evolution

Lote, H., I. Spiteri, L. Ermini, A. Vatsiou, A. Roy, A. McDonald, N. Maka, et al. 2017. "Carbon Dating Cancer: Defining the Chronology of Metastatic Progression in Colorectal Cancer." Annals of Oncology 28 (6): 1243–49.

• Gavryushkin, Alex, and Alexei J. Drummond. "The space of ultrametric phylogenetic trees." Journal of Theoretical Biology 403 (2016): 197-208.
• Stadler, Tanja, Timothy G. Vaughan, Alex Gavryushkin, Stephane Guindon, Denise Kühnert, Gabriel E. Leventhal, Alexei J. Drummond. "How well can the exponential-growth coalescent approximate constant-rate birth-death population dynamics?" Proc. R. Soc. B. Vol. 282. No. 1806. The Royal Society, 2015.

#### Geodesic in time-tree space

• Gavryushkin, Alex, and Alexei J. Drummond. "The space of ultrametric phylogenetic trees." Journal of Theoretical Biology 403 (2016): 197-208.
• Gavryushkin, Alex, and Alexei Drummond. tauGeodesic. Mar. 2015. doi: 10.5281/zenodo.47152. https://github.com/gavruskin/tauGeodesic
• Gavryushkina, Alexandra, Tracy A. Heath, Daniel T. Ksepka, Tanja Stadler, David Welch, and Alexei J. Drummond. "Bayesian total-evidence dating reveals the recent crown radiation of penguins." Systematic Biology 66.1 (2016): 57-73.
• Gavryushkin, Alex, and Alexei Drummond. tauGeodesic. Mar. 2015. doi: 10.5281/zenodo.47152. https://github.com/gavruskin/tauGeodesic

### Why is this important?

• Tree search algorithms
• Model testing/selection and other simulation studies

Trees are many!

### Main idea

(my failed proof)

### History of the NNI graph

• Over 25 year of work!
• Over 7 erroneous papers published!

### What's wrong with the NNI graph?

1. The Split Theorem

2. The merge and sort trick

### RNNI is free from all these!

• Split theorem. Tick.
• Merge and sort doesn't work. Tick
• Efficient polynomial algorithm?

### Graph grammars

A graph grammar consists of a finite set of productions $\{L_i \to_i R_i\}$, where $L_i$ and $R_i$ are connected undirected edge-end labeled graphs and $\to_i$ is a one-to-one map between half-edges of $L_i$ and those of $R_i$.

The productions are then applied to the starting tree $T$ to derive all possible trees at $\mathrm{RNNI}$ distance up to $r$ from $T$.

A production is said to be ready at a stage $s$ of the derivation if the running tree at stage $s$ has a subgraph isomorphic to the left side $L_i$ of the production.

#### Graph grammar: example

Theorem (Sleator, Tarjan, and Thurston). Let $\mathcal{G}$ be a graph grammar and $G$ a graph on $n$ vertices. Then the number of trees at distance at most $r$ from $G$ is $(c+1)^{n+rm}$, where $c$ is the number of vertices in left sides of $\mathcal{G}$ and $m$ the maximum number of vertices in any right side of $\mathcal{G}$.
Theorem (Sleator, Tarjan, and Thurston). Let $\mathcal{G}$ be a graph grammar and $G$ a graph on $n$ vertices. Then the number of trees at distance $r$ from $G$ is $(c+1)^{n+rm}$.

Theorem (Gavryushkin, Whidden, and Matsen). The number of trees within distance $r$ from any given tree is at most
$3^{n+2r−1}$ in $\mathrm{RNNIu}$         $3^{2n+2r−1}$ in $\mathrm{RNNI}$
$4^{n+2r−1}$ in $\mathrm{DtT}$             $4^{2n+2r−1}$ in $\mathrm{DtTu}$

Corollary (Gavryushkin, Whidden, and Matsen). $$\frac{1}{2} \log_3 \frac{(n-1)!n!}{6^{n-1}} \leq \mathrm{Diam}(\mathrm{RNNIu}) \leq n^2 - 3n - \frac{5}{8}$$

### What we've done?

• Introduced the RNNI graph on ranked trees (to the best of our knowledge)

• Established basic geometric properties of the graph

• Designed an efficient approximate algorithm for computing shortest paths

• Proved that all the fancy NNI methods, e.g. Sleator-Tarjan-Thurston and merge-and-sort argument, don't work

• Failed to prove that RNNI is NP-hard

### What has to be done?

• Is RNNI polynomial? Complexity?
• Split Theorem
• Are these two related?
• ...