2023 Semester 2

GitHub: @bioDS

### Part 0: Introduction

ML = Machine Learning
CoSt = Computational Statistics
AS = Applied Statistics

Garey and Johnson. Computers and Intractability. 1979

### Why biological data science?

Because the skills are highly transferable:

• Scale
• Heterogeneity
• Visualisation and communication
• High-performance computing

## Lecture 2: Turing machines

A Turing machine, $M$, consists of:
• A finite set $\Gamma$ called the alphabet (or tape symbols)
• A finite set $Q$ of states
• A distinguished state $q_0 \in Q$ called the starting state
• A subset $F \subseteq Q$ of halting states
• A distinguished symbol $\beta \in \Gamma$ called the blank symbol (or the empty cell)
• A partial function $\delta: (Q \setminus F) \times \Gamma \to Q \times \Gamma \times \{L, R, \varnothing\}$
called the programme
The programme of a Turing machine $M$ terminates if $M$ reaches a state $q \in F$.
The programme does not terminate otherwise.

Hence a non-terminating programme either hangs (infinite loop) or crashes (reaches a state and tape symbol with no command to execute).

We will be starting Turing machines on the configuration $$\beta\underset{\Delta}{a_1} a_2 \ldots a_n \beta$$ where $\Delta$ marks the position of the head.

\begin{align} \mbox{Example 1} &\\ & q_0 \beta \to \mathrm{accept} \beta \end{align}

\begin{align} \mbox{Example 2} &\\ & q_0 \beta \to \mathrm{accept} \beta\\ & q_0 0 \to q_0 \beta R\\ & q_0 1 \to q_0 \beta R \end{align}

\begin{align} \mbox{Example 3} &\\ & q_0 \beta \to \mathrm{accept} \beta\\ & q_0 0 \to q_0 1 R\\ & q_0 1 \to q_0 0 R \end{align}
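
The examples above can be executed mechanically. Below is a minimal simulator in Python, a sketch under the conventions of this lecture (blank symbol written `B`, head starting on the first input symbol); the `run` helper and the dictionary encoding of programmes are our own illustration, not part of the formal definition.

```python
# A minimal Turing machine simulator (a sketch, not the lecture's formal
# object). A programme is a dict (state, symbol) -> (state, symbol, move),
# with move one of 'L', 'R', None; 'B' plays the role of the blank beta.
def run(programme, word, halting={"accept"}, max_steps=10_000):
    tape = ["B"] + list(word) + ["B"]   # configuration B a1 ... an B
    state, head = "q0", 1               # head starts on the first symbol
    for _ in range(max_steps):
        if state in halting:            # the machine terminates
            return state, "".join(tape).strip("B")
        key = (state, tape[head])
        if key not in programme:        # crash: no command to execute
            return "crash", "".join(tape).strip("B")
        state, tape[head], move = programme[key]
        head += {"L": -1, "R": 1, None: 0}[move]
        if head < 0:                    # extend the tape with blanks
            tape.insert(0, "B"); head = 0
        if head == len(tape):
            tape.append("B")
    raise RuntimeError("hang: step limit exceeded")

# Example 3 above: flip 0s and 1s, accepting at the first blank.
example3 = {
    ("q0", "B"): ("accept", "B", None),
    ("q0", "0"): ("q0", "1", "R"),
    ("q0", "1"): ("q0", "0", "R"),
}
print(run(example3, "0110"))  # ('accept', '1001')
```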

A Turing Machine $M$ computes a function $f: \mathbb N \to \mathbb N$ if
for all $x$ and $y$: $$f(x) = y$$ $$\Updownarrow$$ $$\beta \underbrace{1 \ldots 1}_{x \mbox{ times}} \beta \Rightarrow_M \beta \underbrace{1 \ldots 1}_{y \mbox{ times}} \beta \mbox{ and M halts}$$

We denote the latter by $M(x) = y$.
We write $M(x)\downarrow$ to mean that $M$ halts on $\beta \underbrace{1 \ldots 1}_{x \mbox{ times}} \beta$.

A Turing Machine $M$ computes a function $f: \mathbb N^2 \to \mathbb N$ if
for all $x_1, x_2$, and $y$: $$f(x_1, x_2) = y$$ $$\Updownarrow$$ $$\beta \underbrace{1 \ldots 1}_{x_1 \mbox{ times}} 0 \underbrace{1 \ldots 1}_{x_2 \mbox{ times}} \beta \Rightarrow_M \beta \underbrace{1 \ldots 1}_{y \mbox{ times}} \beta \mbox{ and M halts}$$ Similarly for more than two variables $x_1, x_2, \ldots, x_n$.
Example 1: compute the following functions. \begin{align} & f(x) = 0\\ & g(x) = x + 1\\ & h(x, y) = x + y \end{align}
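
As a taste, here is the middle function, $g(x) = x + 1$, in the unary convention above, reusing the `run` simulator sketched in Lecture 2 (that reuse, like the encoding, is our own illustration): scan right over the 1s and turn the first blank into an extra 1.

```python
# g(x) = x + 1 in unary: skip the input, append one more 1, halt.
successor = {
    ("q0", "1"): ("q0", "1", "R"),       # scan right over the unary input
    ("q0", "B"): ("accept", "1", None),  # first blank becomes the extra 1
}
print(run(successor, "111"))  # ('accept', '1111'), i.e. g(3) = 4
```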

Example 2: implement the following module COPY. $$\beta \underbrace{1 \ldots 1}_{x \mbox{ times}} \beta \Rightarrow_M \beta \underbrace{1 \ldots 1}_{x \mbox{ times}} 0 \underbrace{1 \ldots 1}_{x \mbox{ times}} \beta$$

Example 3: compute: $$f(x, y) = xy$$

Example 4: recognise the following languages.

$a^* b^*$

$\{a^n b^n \mid n \in \mathbb N\}$
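
The second language is the classic example beyond the regular languages. A hedged sketch, again reusing `run` from Lecture 2 above: repeatedly cross off the leftmost $a$ (as `X`) and the leftmost remaining $b$ (as `Y`); words outside the language crash, so they are never accepted.

```python
# Recogniser for {a^n b^n}: match off a's and b's one pair at a time.
anbn = {
    ("q0", "a"): ("q1", "X", "R"),        # cross off an a, go find its b
    ("q0", "Y"): ("q3", "Y", "R"),        # no a's left: verify only Y's remain
    ("q0", "B"): ("accept", "B", None),   # the empty word (n = 0)
    ("q1", "a"): ("q1", "a", "R"),
    ("q1", "Y"): ("q1", "Y", "R"),
    ("q1", "b"): ("q2", "Y", "L"),        # cross off the matching b
    ("q2", "a"): ("q2", "a", "L"),
    ("q2", "Y"): ("q2", "Y", "L"),
    ("q2", "X"): ("q0", "X", "R"),        # back at the frontier; repeat
    ("q3", "Y"): ("q3", "Y", "R"),
    ("q3", "B"): ("accept", "B", None),   # all symbols matched
}
print(run(anbn, "aabb"))  # ('accept', 'XXYY')
print(run(anbn, "aab"))   # ('crash', ...) -- not accepted
```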

### Church–Turing thesis

A problem can be solved by an algorithm if and only if it can be solved by a Turing machine.

### Universal Turing Machine

Theorem: There exists a Turing Machine $U$ which computes all computable functions.
That is, for every computable function $f(x)$ there exists a number $s$ such that $U(s, x) = y$ if and only if $f(x) = y$.

## Lecture 3: Universal Turing machine

### Theorem: There exists a universal Turing Machine $U$.

That is, for every Turing Machine $M(\bar x)$ there exists a number $s$ such that $U(s, \bar x) = y$ if and only if $M(\bar x) = y$.

Proof idea:
• Idea 1: $s$ is $M$'s "number"
• Idea 2: $U$ "decodes" $s$ into $M$'s programme
• Idea 3: $U$ "simulates" $M$'s execution on $\bar x$

## Lecture 4: Recursive functions, halting problem

### Halting problem

Theorem: Not everything is computable.
For example, the following function is not computable: $$h(x) = \begin{cases} U(x, x) + 1, & \mbox{ if } U(x, x) \downarrow\\ 0, & \mbox{ otherwise} \end{cases}$$
Indeed, $h$ is total, so if it were computable there would be a number $s$ with $U(s, x) = h(x)$ for all $x$. In particular $U(s, s)\downarrow$, so $h(s) = U(s, s) + 1 \neq U(s, s) = h(s)$, a contradiction.

## Lecture 5: Kolmogorov complexity

https://learn.canterbury.ac.nz/course/view.php?id=19467

### Definition

Kolmogorov complexity $K(\bar x)$ of a string $\bar x$ is $$\min {\{|M_s| ~\bigg |~ U(s, 0) = \bar x\}}$$ where $U$ is a universal Turing machine.

### Theorem

The definition is invariant, up to an additive constant, under the choice of programming language: $K_U(\bar x) \leq K_A(\bar x) + c_A$ for every algorithm $A$.

Ilya Sutskever (OpenAI) draws a connection between Kolmogorov complexity and modern LLMs.

Given two languages $A$ and $B$ and a compression algorithm $M$, we say that $M$ is good if $$|M(A, B)| \leq |M(A)| + |M(B)| + c,$$ where $M(A, B)$ denotes compressing $A$ and $B$ jointly.
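
A quick empirical check of this inequality, with zlib standing in for $M$ (an off-the-shelf compressor chosen purely for illustration; the two "languages" here are made-up byte strings):

```python
import zlib

def clen(data: bytes) -> int:
    """Compressed length of `data` under zlib (our stand-in for |M(.)|)."""
    return len(zlib.compress(data, level=9))

A = b"the cat sat on the mat " * 40
B = b"the cat sat on the hat " * 40
# Joint compression can exploit structure shared between A and B:
print(clen(A + B), "<=", clen(A) + clen(B), "+ c")
```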

With the idea of (unsupervised) model pre-training in mind, what is the best compression algorithm?

### Kolmogorov complexity as the ultimate goal for compressors

Neural network training is, in effect, a search for such a compressor.

## Lecture 6: Properties of Kolmogorov complexity

### Definition

Kolmogorov complexity $K(x)$ of a string $x$ is $$\min {\{|M_s| ~\bigg |~ U(s, 0) = x\}}$$ where $U$ is a universal Turing machine.

### Theorem

The following problems cannot be solved by an algorithm:
1. Algorithm triviality (e.g., does a given program compute the constant zero function?)
2. Algorithm equivalence (do two given programs compute the same function?)
3. Kolmogorov complexity (computing $K(x)$ from $x$)

### Theorem

Properties of $K(x)$:
1. For all $x$, $K(x) \leq |x| + c$
2. $K(xx) = K(x) + c$
3. $K(1^n) = O(\log n)$ (illustrated below)
4. $K(1^{n!}) = O(\log n)$
5. $K(xy) \leq K(x) + K(y) + O(\log(\min\{K(x), K(y)\}))$ (compare with the definition of a "good compression algorithm")
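
A toy illustration of property 3: a program that prints $1^n$ only needs to spell out $n$, so its length grows with the number of digits of $n$, i.e. $O(\log n)$. The `print`-based description below is a Python stand-in for a Turing machine, our own assumption for the sake of the demo.

```python
# Property 3, informally: 1^n has a description of length O(log n).
for n in [10, 10**3, 10**9]:
    program = f"print('1' * {n})"   # a short description of the string 1^n
    print(n, len(program))          # grows with the digits of n, not with n
```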

### Big O notation recap

Chapters 2.1 and 2.2 in Roughgarden's Algorithms Illuminated Part 1.

### Conditional Kolmogorov complexity: connection to pre-training

$K(x|y)$ is the length of the smallest TM that generates $x$ given $y$.

Properties:

• $K(x|\varnothing) = K(x)$
• $K(x|x) = c$
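
A rough illustration of both properties via compression: approximate the (uncomputable) $K(x \mid y)$ by $C(yx) - C(y)$ for an off-the-shelf compressor $C$. zlib, the proxy formula, and the sample string are all our own assumptions here, not the lecture's.

```python
import zlib

def C(data: bytes) -> int:
    """Compressed length under zlib, a crude upper-bound proxy for K."""
    return len(zlib.compress(data, level=9))

x = b"the quick brown fox jumps over the lazy dog " * 20
print(C(b"" + x))       # proxy for K(x | empty) = K(x)
print(C(x + x) - C(x))  # proxy for K(x | x): a small constant
```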

## Lecture 7: Basic algorithms recap
### Why Study Algorithms?

Important for all other branches of computer science:

• Routing protocols in communication networks piggyback on classical shortest path algorithms
• Public-key cryptography relies on efficient number-theoretic algorithms
• Computer graphics requires the computational primitives supplied by geometric algorithms
• Database indices rely on balanced search tree data structures
• Computational biology uses dynamic programming algorithms to measure genome similarity
### Why Study Algorithms?

Driver of technological innovation:

• $\mathtt{PageRank}$
• From the President’s Council of Advisors on Science and Technology report to the United States White House, December 2010: *"Everyone knows Moore’s Law — a prediction made in 1965 by Intel co-founder Gordon Moore that the density of transistors in integrated circuits would continue to double every 1 to 2 years [...] in many areas, performance gains due to improvements in algorithms have vastly exceeded even the dramatic performance gains due to increased processor speed."*
### Why Study Algorithms?

• Lens on other sciences
• Good for the brain
• Fun!
### Sorting

#### Standard algorithms

• $\mathtt{MergeSort}$
• $\mathtt{QuickSort}$
• [Sorting Networks](https://paperpile.com/app/p/c4f00e6c-9658-0c59-8301-c2fd300b9584)

### Running time of $\mathtt{MergeSort}$

Theorem 1.2 For every input array of length $n \geq 1$, the $\mathtt{MergeSort}$ algorithm performs at most $$6n\log_2 n + 6n$$ operations, where $\log_2$ denotes the base-2 logarithm.
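
A minimal Python rendering of $\mathtt{MergeSort}$, assuming the recursive scheme of Roughgarden's pseudocode; the bound above reflects roughly $\log_2 n$ levels of recursion with linear work in $\mathtt{Merge}$ per level.

```python
# MergeSort: split, sort each half recursively, merge the sorted halves.
def merge_sort(a):
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # the Merge subroutine
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 3, 8, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]
```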
### What is a "fast" algorithm? **Classically:** *A "fast algorithm" is an algorithm whose worst-case running time grows slowly with the input size.* [Tim Roughgarden] Possible alternative: *We define an algorithm to be any function that can be expressed with a short program.* [Wojciech Zaremba] Either way, near-linear algorithms are "for-free primitives" that we can just run.
## Lecture 8: Sorting

The divide-and-conquer paradigm:

1. Divide the input into smaller subproblems
2. Conquer the subproblems recursively
3. Combine the solutions to the subproblems into a solution for the original problem

Often, the clever bit is the combine step (e.g. the $\mathtt{Merge}$ subroutine in $\mathtt{MergeSort}$), but in $\mathtt{QuickSort}$, for example, the clever step is the divide.
#### **Problem: Counting Inversions**

**Input:** An array $A$ of distinct integers.

**Output:** The number of inversions of $A$ — the number of pairs $(i, j)$ of array indices with $i < j$ and $A[i] > A[j]$.

What is the largest possible number of inversions a 7-element array can have?
#### $\mathtt{Merge}$-$\mathtt{and}$-$\mathtt{CountSplitInv}$
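
A sketch of the full recursion around this subroutine (the function names follow Roughgarden's Sort-and-CountInv scheme; the implementation details are our own):

```python
# Count inversions by piggybacking on MergeSort: inversions are either
# within the left half, within the right half, or split across the two.
def sort_and_count(a):
    if len(a) <= 1:
        return a, 0
    mid = len(a) // 2
    left, inv_left = sort_and_count(a[:mid])
    right, inv_right = sort_and_count(a[mid:])
    merged, inv_split = merge_and_count_split_inv(left, right)
    return merged, inv_left + inv_right + inv_split

def merge_and_count_split_inv(left, right):
    merged, splits, i, j = [], 0, 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
            splits += len(left) - i   # right[j] beats every remaining left element
    merged += left[i:] + right[j:]
    return merged, splits

print(sort_and_count([7, 6, 5, 4, 3, 2, 1])[1])  # 21
```

A fully reversed array maximises inversions, so the answer to the question above is $\binom{7}{2} = 21$.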
### $\mathtt{QuickSort}$
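
A hedged sketch with a randomised Lomuto-style partition (per the note above, the clever step is the divide: after partitioning, the two sides are conquered independently and no combine step is needed):

```python
import random

# QuickSort in place: partition around a random pivot, then recurse.
def quick_sort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return a
    p = random.randint(lo, hi)              # random pivot: expected O(n log n)
    a[p], a[hi] = a[hi], a[p]               # stash the pivot at the end
    pivot, boundary = a[hi], lo
    for i in range(lo, hi):                 # Lomuto partition
        if a[i] < pivot:
            a[i], a[boundary] = a[boundary], a[i]
            boundary += 1
    a[boundary], a[hi] = a[hi], a[boundary] # pivot lands in its final place
    quick_sort(a, lo, boundary - 1)
    quick_sort(a, boundary + 1, hi)
    return a

print(quick_sort([5, 3, 8, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]
```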

## Lecture X: How to read papers

### Genome phasing

Schwarz, Roland F., Anne Trinh, Botond Sipos, James D. Brenton, Nick Goldman, and Florian Markowetz. 2014. “Phylogenetic Quantification of Intra-Tumour Heterogeneity.” PLoS Computational Biology 10 (4): e1003535.
