Algorithms in Data Science

Alex Gavryushkin

2023 Semester 2

Table of contents

Literature, resources

Biological Data Science Lab @UCNZ

GitHub: @bioDS
Twitter: @bioDS_lab

Part 0: Introduction

ML = Machine Learning
CoSt = Computational Statistics
AS = Applied Statistics

Garey and Johnson. Computers and Intractability. 1979

Why biological data science?

Because the skills are highly transferable:

  • Scale
  • Heterogeneity
  • Visualisation and communication
  • High-performance computing

Why is it suddenly a thing?

Lecture 1: Background

Finite state automata


Pumping Lemma

Lecture 2: Turing machines

A Turing machine, $M$, consists of:
  • A finite set $\Gamma$ called the alphabet (or tape symbols)
  • A finite set $Q$ of states
  • A distinguished state $q_0 \in Q$ called the starting state
  • A subset $F \subseteq Q$ of halting states
  • A distinguished symbol $\beta \in \Gamma$ called the blank symbol (or the empty cell)
  • A partial function $\delta: (Q \setminus F) \times \Gamma \to Q \times \Gamma \times \{L, R, \varnothing\}$
    called the programme
The programme of a Turing machine $M$ terminates if $M$ reaches a state $q \in F$, and does not terminate otherwise.

Hence a non-terminating programme either hangs (enters an infinite loop) or crashes (reaches a state and tape symbol for which there is no command to execute).

We will be starting Turing machines on $$ \beta\underset{\Delta}{a_1} a_2 \ldots a_n \beta $$

$$ \begin{align} \mbox{Example 1} &\\ & q_0 \beta \to \mathrm{accept} \beta \end{align} $$

$$ \begin{align} \mbox{Example 2} &\\ & q_0 \beta \to \mathrm{accept} \beta\\ & q_0 0 \to q_0 \beta R\\ & q_0 1 \to q_0 \beta R \end{align} $$

$$ \begin{align} \mbox{Example 3} &\\ & q_0 \beta \to \mathrm{accept} \beta\\ & q_0 0 \to q_0 1 R\\ & q_0 1 \to q_0 0 R \end{align} $$
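The three example programmes can be run on a minimal interpreter sketch. Python, the name `run`, and the conventions below (`"B"` stands in for the blank symbol $\beta$, `accept` is the single halting state, and the head starts on $a_1$) are illustrative assumptions, not the course's official machinery:

```python
B = "B"  # stands in for the blank symbol beta

def run(programme, tape, state="q0", halting=("accept",), max_steps=10_000):
    """Execute a Turing machine programme: a dict mapping (state, symbol)
    to (new_state, new_symbol, move), where move is 'L', 'R', or None."""
    tape, head = list(tape) or [B], 0
    for _ in range(max_steps):
        if state in halting:                        # terminated
            return state, "".join(tape).strip(B)
        if (state, tape[head]) not in programme:    # crashed
            raise RuntimeError(f"no command for ({state}, {tape[head]})")
        state, tape[head], move = programme[(state, tape[head])]
        if move == "R":
            head += 1
            if head == len(tape):
                tape.append(B)                      # grow the tape on demand
        elif move == "L":
            if head == 0:
                tape.insert(0, B)
            else:
                head -= 1
    raise RuntimeError("hang: step limit exceeded")  # infinite loop

# Example 3 above: flip every bit, then accept on the trailing blank.
flip = {
    ("q0", B): ("accept", B, None),
    ("q0", "0"): ("q0", "1", "R"),
    ("q0", "1"): ("q0", "0", "R"),
}
```

Here `run(flip, "0110")` returns `("accept", "1001")`: each bit is complemented and the machine accepts on the first blank.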

A Turing Machine $M$ computes a function $f: \mathbb N \to \mathbb N$ if
for all $x$ and $y$: $$ f(x) = y $$ $$ \Updownarrow $$ $$ \beta \underbrace{1 \ldots 1}_{x \mbox{ times}} \beta \Rightarrow_M \beta \underbrace{1 \ldots 1}_{y \mbox{ times}} \beta \mbox{ and $M$ halts} $$

We denote the latter by $M(x) = y$.
We write $M(x)\downarrow$ to mean that $M$ halts on $\beta \underbrace{1 \ldots 1}_{x \mbox{ times}} \beta$.

A Turing Machine $M$ computes a function $f: \mathbb N^2 \to \mathbb N$ if
for all $x_1, x_2$, and $y$: $$ f(x_1, x_2) = y $$ $$ \Updownarrow $$ $$ \beta \underbrace{1 \ldots 1}_{x_1 \mbox{ times}} 0 \underbrace{1 \ldots 1}_{x_2 \mbox{ times}} \beta \Rightarrow_M \beta \underbrace{1 \ldots 1}_{y \mbox{ times}} \beta \mbox{ and $M$ halts} $$ Similarly for more than two variables $x_1, x_2, \ldots, x_n$.
Example 1: compute the following functions. $$ \begin{align} & f(x) = 0\\ & g(x) = x + 1\\ & h(x, y) = x + y \end{align} $$

Example 2: implement the following module COPY. $$ \beta \underbrace{1 \ldots 1}_{x \mbox{ times}} \beta \Rightarrow_M \beta \underbrace{1 \ldots 1}_{x \mbox{ times}} 0 \underbrace{1 \ldots 1}_{x \mbox{ times}} \beta $$

Example 3: compute: $$ f(x, y) = xy $$

Example 4: recognise the following languages.

$a^* b^*$

$\{a^n b^n \mid n \in \mathbb N\}$

Church–Turing thesis

A problem can be solved by an algorithm if and only if it can be solved by a Turing machine.

Universal Turing Machine

Theorem: There exists a Turing Machine $U$ which computes all computable functions.
That is, for every computable function $f(x)$ there exists a number $s$ such that $U(s, x) = y$ if and only if $f(x) = y$.


Lecture 3: Universal Turing machine

Theorem: There exists a universal Turing Machine $U$.

That is, for every Turing Machine $M(\bar x)$ there exists a number $s$ such that $U(s, \bar x) = y$ if and only if $M(\bar x) = y$.

Idea 1: $s$ is $M$'s "number"
Idea 2: $U$ "decodes" $s$ into $M$'s program
Idea 3: $U$ "simulates" $M$'s execution on $\bar x$
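Ideas 1 and 2 work with any injective encoding of programmes into numbers. A sketch of one such Gödel numbering (the base-11 digit scheme and the pre-coding of states and symbols as digits 0-9 are assumptions; any injective encoding would do):

```python
BASE = 11
SEP = 10  # the extra base-11 digit acts as a separator

def encode(programme):
    """Map a programme (a list of commands, each a tuple of digits < 10)
    to a single number s -- the machine's 'number' of Idea 1."""
    digits = [SEP]               # leading sentinel so no leading digit is lost
    for command in programme:
        digits.extend(command)
        digits.append(SEP)       # separator after each command
    s = 0
    for d in digits:
        s = s * BASE + d
    return s

def decode(s):
    """Recover the programme from its number -- what U does per Idea 2."""
    digits = []
    while s:
        digits.append(s % BASE)
        s //= BASE
    digits.reverse()
    programme, current = [], []
    for d in digits[1:]:         # skip the sentinel
        if d == SEP:
            programme.append(tuple(current))
            current = []
        else:
            current.append(d)
    return programme
```

Because `decode(encode(p)) == p` for every programme `p`, the simulator of Idea 3 can recover and then step through $M$'s commands from $s$ alone.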

Trick 1: Recursive functions

Trick 2: Multi-tape Turing machines

Lecture 4: Recursive functions, halting problem

Primitive recursive functions

Partial recursive functions

Recursive function that is not primitive recursive

Halting problem

Theorem: Not everything is computable.
For example, the following function is not computable $$ h(x) = \begin{cases} U(x, x) + 1, & \mbox{ if } U(x, x) \downarrow\\ 0, & \mbox{ otherwise} \end{cases} $$
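The proof is a short diagonalization; a sketch, writing $e$ for an index of $h$ on the universal machine $U$ and supposing towards a contradiction that $h$ were computable:

```latex
% Suppose h were computable. By universality, h = U(e, \cdot) for some index e.
% Since h is total, U(e, e) = h(e) is defined, i.e. U(e, e) halts. But then
\begin{align*}
  h(e) &= U(e, e) + 1 && \text{(definition of $h$, as $U(e, e)\downarrow$)}\\
       &= h(e) + 1    && \text{(as $h = U(e, \cdot)$)}
\end{align*}
% which is a contradiction, so no Turing machine computes h.
```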

Lecture 5: Kolmogorov complexity


Kolmogorov complexity $K(\bar x)$ of string $\bar x$ is $$\min {\{|M_s| ~\bigg |~ U(s, 0) = \bar x\}}$$ where $U$ is a universal Turing machine.


The definition is invariant under the choice of programming language, up to an additive constant: $K_U(\bar x) \leq K_A (\bar x) + c_A$ for every algorithm $A$.
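Although $K$ itself cannot be computed, any concrete compressor $A$ yields a computable upper bound $K_A(\bar x)$, per the invariance statement. A quick Python illustration with zlib standing in for $A$ (the particular test strings are illustrative assumptions):

```python
import random
import zlib

def upper_bound(x: bytes) -> int:
    """Computable upper bound on K(x), up to the additive constant c_zlib."""
    return len(zlib.compress(x, 9))

regular = b"1" * 1000                   # short description: "print 1000 ones"
random.seed(0)
noise = bytes(random.randrange(256) for _ in range(1000))  # no short description
```

The compressed size of `regular` is a few dozen bytes, while `noise` barely compresses at all, mirroring the gap between low- and high-complexity strings.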

Connection of KC to modern LLMs by Ilya Sutskever (OpenAI).

Given two languages $A$ and $B$ and a compression algorithm $M$, we say that $M$ is good if $$|M(A, B)| \leq |M(A)| + |M(B)| + c$$

With the idea of (unsupervised) model pre-training in mind, what is the best compression algorithm?

Kolmogorov complexity as the ultimate goal for compressors

And neural network training is effectively a search for such a compressor.
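One concrete way to use a real compressor as a stand-in for the ideal Kolmogorov one is the normalized compression distance of Cilibrasi and Vitányi, which operationalises the "good compressor" inequality above. A sketch with zlib (the test strings are illustrative assumptions):

```python
import random
import zlib

def csize(x: bytes) -> int:
    return len(zlib.compress(x, 9))

def ncd(a: bytes, b: bytes) -> float:
    """Normalized compression distance: near 0 when a and b share
    structure the compressor can exploit, near 1 when they do not."""
    return (csize(a + b) - min(csize(a), csize(b))) / max(csize(a), csize(b))

english_a = b"the quick brown fox jumps over the lazy dog " * 20
english_b = b"the lazy dog naps while the quick brown fox jumps " * 20
random.seed(1)
noise = bytes(random.randrange(256) for _ in range(len(english_a)))
```

Two English strings with shared vocabulary land much closer to each other than either does to incompressible noise, which is the sense in which a good compressor doubles as a similarity measure.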

Lecture 6: Properties of Kolmogorov complexity


Kolmogorov complexity $K(x)$ of a string $x$ is $$\min {\{|M_s| ~\bigg |~ U(s, 0) = x\}}$$ where $U$ is a universal Turing machine.


The following problems cannot be solved by an algorithm:
  1. Algorithm triviality
  2. Algorithm equivalence
  3. Kolmogorov complexity


Properties of $K(x)$:
  1. For all $x, K(x) \leq |x| + c$
  2. $K(xx) \leq K(x) + c$
  3. $K(1^n) \leq O(\log n)$
  4. $K(1^{n!}) \leq O(\log n)$
  5. $K(xy) \leq K(x) + K(y) + O(\log(\min\{K(x), K(y)\}))$ (compare with the definition of "good compression algorithm")

Big O notation recap

Chapters 2.1 and 2.2 in Roughgarden's Algorithms Illuminated Part 1.

Conditional Kolmogorov complexity: connection to pre-training

$K(x|y)$ is the length of the smallest TM that generates $x$ given $y$.


  • $K(x|\varnothing) = K(x)$
  • $K(x|x) \leq c$

## Lecture 7: Basic algorithms recap

### Why Study Algorithms?

- Important for all other branches of computer science
  - Routing protocols in communication networks piggyback on classical shortest path algorithms
  - Public-key cryptography relies on efficient number-theoretic algorithms
  - Computer graphics requires the computational primitives supplied by geometric algorithms
  - Database indices rely on balanced search tree data structures
  - Computational biology uses dynamic programming algorithms to measure genome similarity
- Driver of technological innovation
  - $\mathtt{PageRank}$
  - From the President's Council of Advisors on Science and Technology report to the United States White House, December 2010: *"Everyone knows Moore's Law — a prediction made in 1965 by Intel co-founder Gordon Moore that the density of transistors in integrated circuits would continue to double every 1 to 2 years [...] in many areas, performance gains due to improvements in algorithms have vastly exceeded even the dramatic performance gains due to increased processor speed."*
- Lens on other sciences
- Good for the brain
- Fun!

### Sorting

#### Standard algorithms

- $\mathtt{MergeSort}$
- $\mathtt{QuickSort}$
- Sorting Networks


Running time of $\mathtt{MergeSort}$

Theorem 1.2 For every input array of length $n \geq 1$, the $\mathtt{MergeSort}$ algorithm performs at most $$ 6n\log_2 n + 6n $$ operations, where $\log_2$ denotes the base-2 logarithm.
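A sketch of $\mathtt{MergeSort}$ instrumented with a rough operation count. The accounting here (one unit per comparison plus one per element copied in a merge) is an assumption, coarser than the bookkeeping behind Theorem 1.2, but it stays comfortably inside the $6n\log_2 n + 6n$ bound:

```python
def merge_sort(a):
    """Return (sorted copy of a, rough operation count)."""
    if len(a) <= 1:
        return list(a), 0
    mid = len(a) // 2
    left, ops_left = merge_sort(a[:mid])     # divide + conquer
    right, ops_right = merge_sort(a[mid:])
    merged, ops = [], ops_left + ops_right
    i = j = 0
    while i < len(left) and j < len(right):  # combine: the Merge subroutine
        ops += 1                             # one comparison
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    ops += len(merged)                       # copies performed by this merge
    return merged, ops
```

For a reversed 100-element array the counter stays far below $6 \cdot 100 \log_2 100 + 600 \approx 4586$, consistent with the theorem.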
### What is a "fast" algorithm?

**Classically:** *A "fast algorithm" is an algorithm whose worst-case running time grows slowly with the input size.* [Tim Roughgarden]

Possible alternative: *We define an algorithm to be any function that can be expressed with a short program.* [Wojciech Zaremba]

Either way, near-linear algorithms are "for-free primitives" that we can just run.
## Lecture 8: Sorting

The divide-and-conquer paradigm

  1. Divide the input into smaller subproblems
  2. Conquer the subproblems recursively
  3. Combine the solutions to the subproblems into a solution for the original problem
Often, the clever bit is the combine step (e.g. the $\mathtt{Merge}$ subroutine in $\mathtt{MergeSort}$), but in $\mathtt{QuickSort}$, for example, the clever step is divide.
#### **Problem: Counting Inversions**

- **Input:** An array $A$ of distinct integers.
- **Output:** The number of inversions of $A$ — the number of pairs $(i, j)$ of array indices with $i < j$ and $A[i] > A[j]$.

What is the largest possible number of inversions a 7-element array can have?
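Inversions can be counted in $O(n \log n)$ by piggybacking on the merge step: whenever an element of the right half is merged first, it is inverted with every element still waiting in the left half. A sketch:

```python
def count_inversions(a):
    """Return (sorted copy of a, number of inversions in a)."""
    if len(a) <= 1:
        return list(a), 0
    mid = len(a) // 2
    left, inv_left = count_inversions(a[:mid])
    right, inv_right = count_inversions(a[mid:])
    merged, split_inv = [], 0
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
            split_inv += len(left) - i  # all remaining left elements are larger
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inv_left + inv_right + split_inv
```

A fully reversed 7-element array realises the maximum of $\binom{7}{2} = 21$ inversions, since every pair is then inverted.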
## $\mathtt{QuickSort}$
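A sketch of $\mathtt{QuickSort}$ with a random pivot and Lomuto-style partitioning; here, as noted above, the clever step is divide (partitioning around the pivot), and the combine step is trivial:

```python
import random

def partition(a, lo, hi):
    """Partition a[lo..hi] around a randomly chosen pivot; return its final index."""
    p = random.randint(lo, hi)
    a[p], a[hi] = a[hi], a[p]       # move the pivot out of the way
    pivot, i = a[hi], lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]       # pivot lands in its sorted position
    return i

def quicksort(a, lo=0, hi=None):
    """Sort a in place; returns a for convenience."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = partition(a, lo, hi)    # divide: everything left of p is smaller
        quicksort(a, lo, p - 1)     # conquer both sides recursively
        quicksort(a, p + 1, hi)     # combine: nothing to do
    return a
```

Random pivoting makes the $O(n \log n)$ running time hold in expectation for every input, rather than only for inputs that happen to split evenly.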

Lecture X: How to read papers

Genome phasing

Schwarz, Roland F., Anne Trinh, Botond Sipos, James D. Brenton, Nick Goldman, and Florian Markowetz. 2014. “Phylogenetic Quantification of Intra-Tumour Heterogeneity.” PLoS Computational Biology 10 (4): e1003535.
