Is it hard to update trees when data change?
Online phylogenomics
Alex Gavryushkin
24 February 2020
"Static" algorithms
Data -> Algorithm -> Solution
Basically equivalent to computing a function $f(x_1, \ldots, x_n) = y$
Online algorithms
Given a finite set of "requests" $R = \{r_i\}$, we want to compute a series of functions
$$
r_0(), r_1(r_0), r_2(r_0, r_1),\ldots, r_{i+1}(r_0,\ldots, r_i),\ldots
$$
The (worst case) complexity (denoted by $c$) of an online algorithm is then defined as (a function of $n$)
$$
\frac 1 n \max_{r_i \in R}\sum_{i = 0}^{n-1} c(r_i)
$$
When online algorithms make sense
It is convenient to say that a problem has an efficient online solution if its online complexity is smaller than the "static" complexity of the worst $r_i$.
The central question for us in this project is what methods in computational biology have efficient online solutions.
Hint: Greedy methods are good candidates
When online algorithms are possible
Stability analysis
We say that an algorithm is (Lyapunov) stable if small perturbations to the input (data) result in small changes to the solution.
It is very hard to find efficient online versions of non-stable static algorithms.
Online Phylogenetics
Tree inference methods
Distance-based methods
Model-based methods
Treespace analysis methods
Distance-based methods
Path-based methods
Your ideas?
Distance-based tree inference methods
Neighbour-joining, UPGMA.
Possible requests: add taxon, remove taxon, change distance (sequence).
Stability considered with respect to the distance matrix
Model-based tree inference methods
Large space of possible requests.
Computing distances and paths in treespace
Paths are not stable in the BHV and tau-space but they could be those rare exceptions and have online versions.
Open question: t-space.
Local rearrangement based distances, such as RNNI, are stable for appropriate requests.
So these are our focus now.
Applications (future work)
Consensus methods: Fréchet mean (centroid) based methods are stable if paths between trees (geodesics) are.
Tree proposals: (our recent algorithms) FindPath, MDistance.
Analyses of samples of trees, e.g. Rob Lanfear's proxy for ESS using focal trees (plus our FindPath algorithm).
Acknowledgements
Royal Society of New Zealand — Rutherford Discovery Fellowships; Catalyst Fund — funding
New Zealand Ministry of Business, Innovation, and Employment — Endeavour Fund; Strategic Science Investment Fund (Data Science programmes) — funding
bioDS lab and CS dept @Otago — support and helpful discussions
You — coming to my talk and listening :)