ML = Machine Learning
CoSt = Computational Statistics
AS = Applied Statistics
It's not about applying old methods to new problems
Abraham Wald
R. Siegmund-Schultze (2003). Military Work in Mathematics 1914–1945: An Attempt at an International Perspective. Mathematics and War, edited by Bernhelm Booß-Bavnbek and Jens Høyrup, 23–82. DOI: 10.1007/978-3-0348-8093-0_2
2. Molecular biology: crash course
Does this matter?
Genetic diseases, e.g. cancer
Antibiotic resistance
Virus outbreaks
Food
Ecology
Information storage?
. . .
Why is it suddenly a thing?
Why biological data science?
Because the skills are highly transferable:
Noisy data
Visualisation
Communication
High-performance computing
Formalising biology
Let $G$ be the set of binary (quaternary in reality) strings of length $n$.
Elements of $G$ are called genotypes.
A function $w: G \to \mathbb R^+$ is called a fitness landscape.
For $g \in G$, $w(g)$ is called fitness of the genotype $g$.
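The definitions above translate directly into code. The sketch below is illustrative and assumes genotypes are stored as strings. `genotypes` and `additive_landscape` are hypothetical helper names, and the additive landscape (each `1` adds a fixed effect) is an invented toy example, not data from the lecture.

```python
from itertools import product

def genotypes(n, alphabet="01"):
    """All strings of length n over the alphabet: |alphabet|**n genotypes.
    Use alphabet="ACGT" for the quaternary (DNA) case."""
    return ["".join(s) for s in product(alphabet, repeat=n)]

def additive_landscape(n, base=1.0, effect=0.1):
    """Hypothetical toy fitness landscape w: G -> R^+,
    where each '1' adds the same fixed effect to fitness."""
    return {g: base + effect * g.count("1") for g in genotypes(n)}

w = additive_landscape(2)
print(genotypes(2))  # ['00', '01', '10', '11']
print(len(genotypes(2, "ACGT")))  # 16
```

Because $|G| = |\Sigma|^n$ grows exponentially in $n$, storing $w$ as an explicit table is only feasible for a handful of loci.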
Reality
Scalability: synthetic lethal pairs
How many pairs of genes are there in the human genome?
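A quick count makes the scale concrete. The figure of roughly 20,000 protein-coding genes is an assumption used for illustration; the number of unordered pairs is $\binom{N}{2} = N(N-1)/2$.

```python
from math import comb

N_GENES = 20_000  # assumption: rough number of human protein-coding genes

# Unordered gene pairs, i.e. candidate synthetic lethal pairs to test.
n_pairs = comb(N_GENES, 2)
print(f"{n_pairs:,}")  # 199,990,000
```

That is about $2 \times 10^8$ pairs, far too many to screen experimentally one by one.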
Gene-gene interactions
Epistasis is defined as the deviation from the additive expectation of allelic effects:
$$u_{11} = w_{00} + w_{11} - (w_{01} + w_{10})$$
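For two loci, the formula is a one-liner. The sketch below uses invented fitness values purely for illustration. Rewriting $u_{11} = (w_{11} - w_{00}) - [(w_{01} - w_{00}) + (w_{10} - w_{00})]$ shows that $u_{11} > 0$ means the double mutant does better than the sum of the single-mutant effects predicts.

```python
def epistasis(w00, w01, w10, w11):
    """u_11 = w_00 + w_11 - (w_01 + w_10); zero means the allelic effects add up."""
    return w00 + w11 - (w01 + w10)

print(epistasis(1.0, 2.0, 3.0, 4.0))  # 0.0: additive, no epistasis
print(epistasis(1.0, 2.0, 3.0, 6.0))  # 2.0: positive epistasis
```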
Theorem. A partial order on genotypes (e.g. a fitness graph) implies epistasis if and only if all linear extensions compatible with the partial order do.
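For two loci the theorem can be checked by brute force. The following is a minimal sketch, not the efficient method of the cited paper. It enumerates all 24 orderings of the four genotypes, keeps the linear extensions of the given partial order, and asks whether each one fixes the sign of $u_{11}$. A total order fixes the sign exactly when, after sorting, the pair $\{00, 11\}$ beats $\{01, 10\}$ position by position (positive) or vice versa (negative). If one pair is nested inside the other, suitable fitness values give either sign. The names `ranking_sign`, `linear_extensions` and `partial_order_sign` are hypothetical.

```python
from itertools import permutations

GENOTYPES = ["00", "01", "10", "11"]

def ranking_sign(ranking):
    """Sign of u_11 implied by a total order (fittest first): +1, -1, or 0 if undetermined."""
    pos = {g: i for i, g in enumerate(ranking)}  # smaller index = fitter
    a = sorted([pos["00"], pos["11"]])
    b = sorted([pos["01"], pos["10"]])
    if a[0] < b[0] and a[1] < b[1]:
        return +1  # w00 + w11 > w01 + w10 for every compatible landscape
    if b[0] < a[0] and b[1] < a[1]:
        return -1
    return 0

def linear_extensions(relations):
    """All total orders consistent with relations {(x, y): w(x) > w(y)}."""
    for perm in permutations(GENOTYPES):
        pos = {g: i for i, g in enumerate(perm)}
        if all(pos[x] < pos[y] for x, y in relations):
            yield list(perm)

def partial_order_sign(relations):
    """Nonzero only if every linear extension implies the same sign of epistasis."""
    signs = {ranking_sign(r) for r in linear_extensions(relations)}
    return signs.pop() if len(signs) == 1 else 0

# Both double-letter genotypes beat both single mutants: epistasis is implied.
print(partial_order_sign({("00", "01"), ("00", "10"), ("11", "01"), ("11", "10")}))  # 1
# Every mutation is beneficial: compatible with additivity, so no epistasis is implied.
print(partial_order_sign({("11", "01"), ("11", "10"), ("01", "00"), ("10", "00")}))  # 0
```

Enumerating linear extensions is exponential in the number of genotypes, which is why this approach only serves to illustrate the statement.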
Mutation graph
Lienkaemper, Lamberti, Drain, Beerenwinkel, and Gavryushkin. The geometry of partial fitness orders and an efficient method for detecting genetic interactions. Journal of Mathematical Biology, 2018.
4. Online algorithms to improve computational performance
The traditional way is to make the algorithm more efficient
When the same algorithm has to be re-run routinely, we can economise by making the algorithm slower and doing more!
Algorithms that work this way are known as online algorithms.
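A generic illustration of the idea, not the lecture's method and not a solution to the homework: suppose replicate fitness measurements arrive one at a time. Instead of re-reading all data on every update, the hypothetical `OnlineEpistasis` class below does extra bookkeeping (per-genotype counts and sums) so that adding a measurement and re-estimating $u_{11}$ from mean fitnesses each cost $O(1)$.

```python
from collections import defaultdict

class OnlineEpistasis:
    """Hypothetical sketch: running per-genotype sums give O(1) updates."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total = defaultdict(float)

    def add(self, genotype, fitness):
        # Fold one new replicate measurement into the stored state.
        self.count[genotype] += 1
        self.total[genotype] += fitness

    def mean(self, genotype):
        # Raises ZeroDivisionError if the genotype has no measurements yet.
        return self.total[genotype] / self.count[genotype]

    def epistasis(self):
        # u_11 = w_00 + w_11 - (w_01 + w_10), using current mean fitnesses.
        m = {g: self.mean(g) for g in ("00", "01", "10", "11")}
        return m["00"] + m["11"] - (m["01"] + m["10"])

est = OnlineEpistasis()
for g, v in [("00", 1.0), ("01", 2.0), ("10", 3.0), ("11", 6.0)]:
    est.add(g, v)
print(est.epistasis())  # 2.0
est.add("11", 4.0)      # new replicate: mean w_11 becomes 5.0
print(est.epistasis())  # 1.0
```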
Online algorithms
Homework:
Find an efficient online algorithm to detect genetic interactions from