# Algorithms and data structures

# Preface

## Prerequisites

You need to know discrete mathematics.

## Learning ethics

# Introduction

## What is an Algorithm?

Informally, an algorithm is a series of computational steps that takes an input and returns an output in a finite amount of time. The input/output relationship is a way to describe a computational problem. So, you use algorithms because you have a problem, not the other way around.

A mathematical reader may find this a sloppy definition and think "that is no way to found a science" (what the heck is a computational step?), and you are right: it is an easy way to define an algorithm, but it is not the full answer. A more precise definition will be given later.

Different ways to describe an algorithm:

```
# Notation 1
f: x: T -> y: T1 -> z: T2 ... -> output: TN
f = computational steps
# Notation 2
f(x: T, y: T1, ...): output: TN = computational steps
# Notation 3
f(x: T, y: T1, ...): output: TN {
computational steps
}
```

Some authors don’t write types when they write algorithms, but we do since they give precision.

We say with $x:T$ that $x$ is a term with type $T$. A type has values and operations (an interface). We explain more later.

## Why do algorithms matter to you?

Perhaps you are here to build a Skynet (artificial intelligence), to solve the P versus NP problem (research), or to work in a Big Tech company (industry).

Algorithms are foundations for building great systems and for research too.

Quality of algorithms.

### Performance

## Research

## Ecosystem

Standards, jobs, industry, roles, …

ACM Special Interest Group on Algorithms and Computation Theory (SIGACT). http://www.sigact.org/

### How do you practice the Design of Algorithms?

Project Euler

IMO

Codeforces

LeetCode

HackerRank

## Story

## Notation

$n \in \mathbb{N}$ represents the number of **items** given to the algorithm. When a quantity $X$ is not an integer, we use either $n=\lceil X \rceil$ or $n=\lfloor X \rfloor$, whichever satisfies the problem presented.

$log_2(n)=lg(n)$

$f$ is faster than $g$ if $f$ is smaller than $g$, i.e., $f<g$.

## Classification

## FAQ

- Some exercises are Jupyter notebooks.

## Worked examples

### Is an algorithm a type?

No. They are terms.

## Summary

## Mindmap

```
@startmindmap
* Introduction to Algorithms
** Objective: Study the fundamentals of algorithms and their analyses
** Contents
*** Algorithms
**** Properties
**** Efficiency
*** Sorting algorithms
**** Bubble sort
**** Insertion sort
**** Merge sort
**** Quick sort
*** Asymptotic notation
**** Big-Oh notation
**** Omega notation
**** Theta notation
*** Recurrences
**** Substitution method
**** Recursion-tree method
**** Master method
*** Summary
**** Algorithms solve computational problems
**** Analyzing algorithms helps understand their properties and efficiency
**** Key concepts: algorithms, efficiency, asymptotic notation, recurrences
@endmindmap
```

## Reference Notes

Definition of algorithm. (2022, December 04). Retrieved from https://www.merriam-webster.com/dictionary/algorithm

# Theory of algorithms

Note that our definition requires algorithms to finish, but there exist other classes of computational methods that may or may not terminate.

TODO: Mathematical definition.

Computational method.

Types.

Algorithm.

Computational problem.

If you write an algorithm, it has to work for every valid input instance.

## Models of computation.

There are several models of computation, such as the Lambda calculus or the Turing machine, but in fact they are equivalent (we are not proving that statement here; you can explore it in Models of Computation). We focus on Von Neumann languages and Random Access Machines, the current paradigm in computing. It means we describe algorithms with assignments and side effects, but by the Church-Turing thesis you can write your algorithms in a functional programming style.

Random Access Machine Emulator (by Simon Gottlieb and Simon Koennecke). (2014, October 24). Retrieved from https://farbtrommel.de/ram

Functional Algorithm Design, Part 0. (2020, November 15). Retrieved from https://blog.sigplan.org/2020/11/17/functional-algorithm-design-part-0

If you are interested in learning algorithms based on Lambda Calculus, you should read some of the following books

Stone, J. D. (2018). *Algorithms for Functional Programming*. Springer.

Bird, R., & Gibbons, J. (2020). *Algorithm Design with Haskell*. Cambridge University Press.

Russell, D. (2003). Algorithms: A Functional Programming Approach by Fethi Rabhi and Guy Lapalme, Addison-Wesley, 1999, ISBN 0-201-59604-0, xi+235pp. *Journal of Functional Programming,* *13*(4), 828-829. doi:10.1017/S0956796803244877

## Representations and codification.

You can change representations by encoding the input in a way the underlying machine can operate on. So, you could write an algorithm in English and then change the representation to a hardware circuit specification. For example, 3 can be represented as $III$, 011, or З (the last is the Cyrillic letter that resembles the digit 3).

On the other hand, reduction is a technique to solve problem A using an algorithm for problem B by choosing the right representation.

```
def algorithmA(input):
    representation_b = encode_input_as_instance_of_problem_b(input)
    return decode_output_as_solution_of_problem_a(algorithmB(representation_b))
```

For example, the "find the maximum" problem reduces to sorting.
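A toy sketch of this reduction (the function names follow the template above and are illustrative):

```python
# Reduction: solve "find the maximum" (problem A) with sorting (problem B).
def algorithm_b(instance):
    # Any correct sorting algorithm works as a black box.
    return sorted(instance)

def encode_input_as_instance_of_problem_b(numbers):
    # Here the encoding is the identity: a list of numbers is already
    # a valid sorting instance.
    return list(numbers)

def decode_output_as_solution_of_problem_a(sorted_numbers):
    # The maximum of the input is the last element of the sorted output.
    return sorted_numbers[-1]

def algorithm_a(numbers):
    representation_b = encode_input_as_instance_of_problem_b(numbers)
    return decode_output_as_solution_of_problem_a(algorithm_b(representation_b))
```

The encoding and decoding steps are where the choice of representation happens; here they are trivial, but in harder reductions they carry most of the work.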

## Language specification

We use pseudocode to present algorithms; you should be able to understand it without knowing any particular programming language.

```
f(n: NaturalNumber): NaturalNumber
sum = 0
for i in 0 <= i <= n where i++
sum += i
return sum
```
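The same algorithm, written as runnable Python for comparison with the pseudocode above:

```python
def f(n: int) -> int:
    # Sum the integers from 0 to n inclusive.
    total = 0
    for i in range(0, n + 1):
        total += i
    return total
```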

# Data structures and File structures

Data structures live in primary memory; file structures live in secondary storage.

Specification.

## Immutability and Mutability. State.

In place.

## Sequence

Extrinsic order

**Array**

Array in C

**Linked list**

**Dynamic Array**

List in Python

**Sequence AVL**

## Databases

## Set

Build, Get and put operations.

### Exact set membership

Intrinsic query set.

### Approximate set membership

The choice of hyperparameters is an optimization problem.

Bloom filters

Counting Bloom filters

d-left counting Bloom filters

Cuckoo filters

XOR Filters

Spatial Coupling

Binary Fuse Filters

## Map

A map or dictionary is an ADT that relates keys to values. JSON, YAML, and XML extend the concept to let different languages interoperate. There are different ways to implement dictionaries, such as hash tables, red-black trees, and AVL trees.

### Data structures

**Array**

**Dynamic Array**

**Direct Access**

**Hash Table**

**Set AVL**

**Bits**

Encode a set as bits. It is a fast data structure because you change your algorithm to a clever representation! Every operation runs in $O(1)$ time.

```
class Set:
    set = 0
    GOAL = 100 // placeholder; for a full k-bit vector use (1 << k) - 1
    hasReachedGoal():
        return set == GOAL
    insert(element: int):
        set |= 1 << element
    has(element: int):
        return (set & (1 << element)) == (1 << element)
    remove(element: int):
        set ^= (1 << element) // toggles the bit; only call when the element is present
```
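A runnable Python version of the sketch above. Here GOAL is chosen as a full 3-bit vector, `(1 << 3) - 1`, matching the worked example; the constant 100 in the pseudocode is only a placeholder:

```python
class BitSet:
    # Encode a set of small non-negative integers as the bits of one integer.
    # GOAL is a full 3-bit vector: elements {0, 1, 2} are all present.
    GOAL = (1 << 3) - 1  # 0b111

    def __init__(self):
        self.bits = 0

    def insert(self, element: int) -> None:
        self.bits |= 1 << element

    def has(self, element: int) -> bool:
        return (self.bits & (1 << element)) != 0

    def remove(self, element: int) -> None:
        # AND with the complement clears the bit whether or not it was set;
        # XOR (as in the pseudocode) would *toggle* it instead.
        self.bits &= ~(1 << element)

    def has_reached_goal(self) -> bool:
        return self.bits == BitSet.GOAL
```

The `remove` here uses AND-with-complement rather than XOR, so it is safe to call even when the element is absent.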

Segment tree

Heaps

Tries (Suffix tree)

**Disjoint-set data structure**

Dictionaries

Files, storage, and data structures

**Nearest neighbor search**

## Summary

## Worked examples

```
// Each bit in our set is 2^n, where n is a non-negative integer.
// bin(2^0)=000001, bin(2^1)=000010, bin(2^2)=000100, ...
// 1 << n is equivalent to 2^n but faster
"insert" operation example
000000 // set = 0
000100 // n=2, bin(1 << n)=bin(2^2)=000100
000100 // set | 000100
"has" operation example
000101 // set = 5
000100 // n=2, bin(1 << n)=bin(2^2)=000100
000100 // (set & 000100) == 000100
"remove" operation example
000101 // set = 5
000100 // n=2, bin(1 << n)=bin(2^2)=000100
000001 // set ^ 000100
"goal"
000 // Suppose you want a full 3-bit vector
// So
1000 // bin(1 << 3)=bin(2^3)=8
111 // bin((1<<3)-1)
```

# Analysis of algorithms

## Efficiency

Differences in algorithm efficiency can matter more than differences in hardware. Why should you learn about algorithms? Applications use them to solve larger problems than ever before, and systems rely heavily upon them.

Some authors use efficiency and complexity interchangeably. We prefer to call our analyses time efficiency and space efficiency, but you can also find them as time complexity and space complexity.

Informally, we have a running time $T: n \to time$ and a space usage $S: n \to space$, where $n$ is the size of the input. We usually compare algorithms based on their time and their results. The resources required by an algorithm depend on the instance size and the instance representation.

Requirements for resource analysis.

Independence. The analysis must be independent of the hardware of a computer, the programming language used for pseudocode, and the programmer who wrote the code.

A priori.

Large instance requirement.

Growth rate classes

Roughgarden, T. (2021). Beyond the Worst-Case Analysis of Algorithms. Cambridge University Press.

## Time analysis

- Algorithmic Analysis
  - Asymptotic Notation
  - Recurrence Relations
  - Amortized Analysis
  - Probabilistic Analysis

Best case. The smallest amount of resources needed to run any instance.

Worst case. The largest amount of resources needed to run any instance.

Average case. The expected amount of resources over a distribution of instances.


## Correctness

### Correct and incorrect

An algorithm is correct if it solves the given problem. An incorrect algorithm may halt with a partial solution or none at all. Such an algorithm can still be useful if we control its error rate; however, most of the time we focus on correct algorithms.

### Formal verification and reasoning about algorithms

**Proving Programs Correct**

You need a way to prove your algorithm correct. A well-known way to verify algorithms formally is Hoare logic (including loop invariants and the weakest-precondition calculus). Indeed, it is a way to reason about your algorithms too.

An algorithm comes with conditions that must hold for it to be correct: the input must be valid, and likewise the output. These conditions are predicates; the first kind is a precondition and the second a postcondition. You can have as many as you need.

You can think of preconditions as a formal definition of the input problem, and postconditions as a formal definition of the output problem.

Another way to think about preconditions and postconditions is that you are designing a contract: your "interface client" (who will call your interface) and a "supplier" (your interface) agree on a "contract" that defines obligations (preconditions that define the correct way to use your interface) in exchange for benefits (postconditions, the output).

Typing your function is one way to define them, but sometimes types are not enough to express the "contract".

Depending on your algorithm, you have also to consider security.

Assert.

Loop invariant.

Hoare logic, loop invariants, and weakest preconditions are covered in Programming: The Derivation of Algorithms by Anne Kaldewaij.

https://allan-blanchard.fr/publis/frama-c-wp-tutorial-en.pdf

Defensive programming is a related topic in software engineering.

Design by contract

TODO: Does Hoare logic only work with imperative algorithms?

## Amortized Analysis

https://www.youtube.com/watch?v=kNIcbsyu4zI&ab_channel=ConfEngine

## Case analysis

Worst, Best, Average analysis.

## Warning on Empirical Analysis —Testing and Benchmarking

## Profiling and benchmarking

### References

https://docs.python.org/3/library/functools.html

## Exercises

- Logic. Mathematics. Code. Automatic verification such as the Lean prover or Frama-C.

- Languages in Anki.

## Projects

## Summary

## FAQ

### How can we analyze algorithms with standard input or standard output? Network? or generally speaking input-output calls?

In this knowledge unit (Algorithms), we work with algorithms that don't make blocking calls such as network calls or standard input/output. If you do have to analyze an algorithm in such a situation, each call counts as constant time because it doesn't depend on the input instance.

For example:

```
f(n: ℕ): void
for i in 0 <= i <= n with i++
fetch("some network call {n}") # It is constant time
```

The algorithm's time efficiency is $\Theta(n)$: the loop performs $n+1$ constant-time calls, delegating the real work to another process.

But you should know that network calls and other I/O calls are among the most time-consuming operations in practice.

TODO

### Do the constant factors matter?

Yes, in practice, constant factors matter.

https://www.youtube.com/watch?v=s0YSKiFdj8Q

## Reference Notes

Definition of efficiency versus complexity of the algorithm. (2022, December 14). Retrieved from https://cs.stackexchange.com/questions/71858/definition-of-efficiency-versus-complexity-of-algorithm

## Worked examples

## Mr. Anderson writes down a number between 1 and 1023. Ms. Hacker must identify that number by asking "yes/no" questions of Mr. Anderson. Ms. Hacker knows that Mr. Anderson always tells the truth. If Mr. Anderson uses an optimal strategy, then she will determine the answer at the end of exactly how many questions in the worst case?

$\lfloor \log_2(1023) \rfloor + 1 = 10$
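The count can be checked by simulating the halving directly (a small sketch; `questions_needed` is an illustrative helper):

```python
def questions_needed(candidates: int) -> int:
    # Each yes/no answer at best halves the candidate set;
    # count halvings until one candidate remains.
    questions = 0
    while candidates > 1:
        candidates = (candidates + 1) // 2  # worst case keeps the larger half
        questions += 1
    return questions
```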

# Framework Thinking about the design and analysis of algorithms (Getting started)

## Sum of sequence

**Input:** A non-null sequence of $n$ real numbers $[a_1,a_2,a_3,...,a_n]$.

**Output:** A real value $r$ such that $r=\sum_{i=1}^n a_i$.

```
SUM(A)
    n = A.length                 // c1 (1)
    r = A[n]                     // c2 (1)
    for j = n-1 downto 1         // c3 (n)
        do r = r + A[j]          // c4 (n-1)
    return r                     // c5 (1)
```

**Loop invariant.** At the start of each iteration of the for loop of lines 3-4, $r$ equals the sum of the last $n-j$ elements, $r=\sum_{i=j+1}^{n} a_i$.

## Initialization.

Before the first loop iteration, $j=n-1$ and $r=a_n$, so $r$ equals the sum of the last $n-j=n-(n-1)=1$ element. This matches the loop invariant.

## Maintenance.

At the start of an iteration, by the invariant, $r=a_{j+1}+a_{j+2}+...+a_n$ (the sum of the last $n-j$ elements). Line 4 adds $a_j$, giving $r=a_j+a_{j+1}+...+a_n$, and the loop then decrements $j$, so the invariant holds at the start of the next iteration.

## Termination.

When the loop terminates, $j=0$, so $n-j=n$. By the loop invariant, $r=a_1+a_2+...+a_n$, the sum of all $n$ elements.

Hence, the algorithm is correct.

We sum the products of the costs times their counts: $T(n)=c_1+c_2+c_3n+c_4(n-1)+c_5=(c_3+c_4)n+(c_1+c_2-c_4+c_5)$.

## Sorting problem by insertion sort

**Input**: A sequence (or array) of $n$ numbers $[a_1,a_2,...,a_n]$. The numbers that we wish to sort are also known as the keys.

**Output**: A reordering $[a_1^{'},a_2^{'},...,a_n^{'}]$ of the input sequence such that $a_1^{'}\le a_2^{'}\le ...\le a_n^{'}$.

```
// T should support the '>' or '<' operation
sort_by_insertion_sort_in_place(array: T[]): void
    for j in 2 <= j <= array.n where j++
        unsortedElement = array[j] // not yet in sorted position
        // invariant: array[1:j-1] is sorted
        i = search_the_right_position(array, j-1, unsortedElement)
        // i in j > i >= 0 with i--, so the right slot is i+1.
        array[i+1] = unsortedElement

// shifts greater elements one slot right and returns the slot index
search_the_right_position(array: T[], i: j-1 >= i >= 0, unsortedElement: T)
    // [2,3,1] with i=2, unsortedElement=1
    // [2,3,3] because array[2]=3 > 1, now i=1
    // [2,2,3] because array[1]=2 > 1, now i=0
    // we are done because i == 0
    while i > 0 and array[i] > unsortedElement
        array[i+1] = array[i]
        i--
    // i+1 is the right position for unsortedElement
    return i
```

## Language implementations

```
def insertion_sort(numberSequence, compareFunction):
    for j in range(1, len(numberSequence)):
        key = numberSequence[j]
        i = j - 1
        while i >= 0 and compareFunction(numberSequence[i], key):
            numberSequence[i+1] = numberSequence[i]
            i = i - 1
        numberSequence[i+1] = key
    return numberSequence

insertion_sort([4, 3, 2, 1], lambda a, b: a > b)
```

```
void insertionSort(int numberSequence[], int length) {
    for (int j = 1; j < length; j++) {
        int key = numberSequence[j];
        int i = j - 1;
        while (i >= 0 && numberSequence[i] > key) {
            numberSequence[i+1] = numberSequence[i];
            i = i - 1;
        }
        numberSequence[i+1] = key;
    }
}
```

## Worked examples

## Using Figure 2.2 as a model, illustrate the operation of INSERTION-SORT on the array $A=[31,41,59,26,41,58]$
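One way to generate the illustration is to record the array after each outer-loop iteration (a runnable sketch; `insertion_sort_trace` is an illustrative helper):

```python
def insertion_sort_trace(A):
    # Returns the array state after each outer-loop iteration,
    # mirroring the step-by-step figure.
    states = []
    for j in range(1, len(A)):
        key = A[j]
        i = j - 1
        while i >= 0 and A[i] > key:
            A[i + 1] = A[i]
            i -= 1
        A[i + 1] = key
        states.append(list(A))
    return states
```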

## Rewrite the INSERTION-SORT procedure to sort into monotonically decreasing instead of monotonically increasing order.

```
search_the_right_position(array: T[], i: j-1 >= i >= 0, noSortedElement)
    // We only change '>' to '<'. Invariant is the same.
    while i > 0 and array[i] < noSortedElement
        array[i+1] = array[i]
        i--
    return i
```

`insertion_sort(array, compare_as: a < b)`

## Consider the search problem:

**Input**: A sequence of $n$ numbers $A=[a_1,a_2,...,a_n]$ and a value $v$.

**Output**: An index $i$ such that $v=A[i]$, or the special value NIL if $v$ does not appear in $A$.

Write pseudocode for linear search, which scans through the sequence looking for $v$. Using a loop invariant, prove that your algorithm is correct. Make sure that your loop invariant fulfills the three necessary properties.
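A possible answer, sketched in Python (0-indexed, with `None` standing in for NIL; the `assert` illustrates the invariant and is not part of the algorithm):

```python
def linear_search(A, v):
    # Loop invariant: at the start of each iteration, v is not in A[0:i].
    for i in range(len(A)):
        assert v not in A[:i]  # invariant check, for illustration only
        if A[i] == v:
            return i
    return None  # NIL: v does not appear in A
```

Initialization: `A[0:0]` is empty, so the invariant holds trivially. Maintenance: we only advance past `A[i]` after checking `A[i] != v`. Termination: either we return an `i` with `A[i] == v`, or the loop ends with `v` not in `A[0:n]`, justifying NIL.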

### Worked examples

## 1. Give an example of an application that requires algorithmic content at the application level, and discuss the function of the algorithms involved.

- The PageRank algorithm by Google. It ranks web pages to order the results of searching the web.

## 2. Suppose we are comparing implementations of insertion sort and merge sort on the same machine. For inputs of size n, insertion sort runs in $8n^2$ steps, while merge sort runs in $64n\lg(n)$ steps. For which values of n does insertion sort beat merge sort?

When is $8n^2$ smaller than $64n\lg(n)$? For $n \ge 1$ we can divide both sides by $8n$, leaving $n<8\lg(n)$. The equation $n=8\lg(n)$ has two roots, and the inequality holds between them; checking integer values numerically, insertion sort beats merge sort exactly for $2 \le n \le 43$.
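The crossover can be checked numerically (a small sketch; searching up to 100 suffices because $8\lg n < n$ for all $n \ge 44$):

```python
import math

def insertion_beats_merge(n: int) -> bool:
    # 8n^2 < 64 n lg n simplifies to n < 8 lg n for n >= 1.
    return 8 * n * n < 64 * n * math.log2(n)

winners = [n for n in range(2, 100) if insertion_beats_merge(n)]
```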

## 3. What is the smallest value of n such that an algorithm whose running time is $100n^2$ runs faster than an algorithm whose running time is $2^n$ on the same machine?

When is $100n^2$ smaller than $2^n$ from some point on? We want the smallest $n_0$ such that $100n^2<2^n$ for all $n \ge n_0$. Checking values numerically, $n_0=15$: $100 \cdot 15^2 = 22500 < 2^{15} = 32768$, while $100 \cdot 14^2 = 19600 > 2^{14} = 16384$.
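The threshold can be found with a short search (sketch; `smallest_n` is an illustrative helper):

```python
def smallest_n() -> int:
    # Find the smallest n such that 100 n^2 < 2^n.
    n = 1
    while 100 * n * n >= 2 ** n:
        n += 1
    return n
```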

**Problem 1-1** Comparison of running times

For each function $f(n)$ and time $t$ in the following table, determine the largest size $n$ of a problem that can be solved in time $t$, assuming that the algorithm takes $f(n)$ microseconds to solve a problem of size $n$. E.g.:

If the size of a problem is $n=10^6$ and $f(n)=n$, the problem takes

$f(10^6)=10^6$ microseconds = $1$ second.

So the problem is this: since each $f$ is a monotone increasing function, $t$ increases with $n$, and the largest solvable size comes from the inverse function, $n=\lfloor f^{-1}(t) \rfloor$. We must first transform $t$ into microseconds, since the table gives $t$ in seconds but $f^{-1}$ receives microseconds.

E.g., $n=\lfloor f^{-1}(1 \times 10^6) \rfloor$ is the largest size $n$ of a problem solvable in 1 second.

For $f(n)=n\lg n$, the inverse involves $W(t)$, the Lambert W (product log) function.

We use Stirling's approximation to calculate the factorial inverse.
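When a closed-form inverse is awkward (Lambert W, Stirling), a binary search over $f$ gives $\lfloor f^{-1}(t) \rfloor$ directly. A sketch, assuming $f$ is monotone increasing and $f(1) \le t$:

```python
def largest_size(f, t: float) -> int:
    # Largest n with f(n) <= t, for monotone increasing f.
    lo, hi = 1, 1
    while f(hi) <= t:
        hi *= 2  # exponential search for an upper bound
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if f(mid) <= t:
            lo = mid
        else:
            hi = mid - 1
    return lo
```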

**Summary**

Exponential algorithms are bad!

## Code

```
@startmindmap
* Sorting and Order Statistics
** Objectives: Learn different sorting algorithms and their properties, and study order statistics
** Contents
*** Heapsort
**** Heap data structure
**** Building a heap
**** Heapsort algorithm
**** Analysis and efficiency
*** Quicksort
**** Partitioning
**** Quicksort algorithm
**** Analysis and efficiency
**** Randomized quicksort
*** Sorting in linear time
**** Counting sort
**** Radix sort
**** Bucket sort
*** Medians and order statistics
**** Selection problem
**** Lower-bound for comparison-based sorting
**** Randomized selection algorithm
**** Order statistics tree
*** Summary
**** Sorting is a fundamental problem in computer science
**** Different sorting algorithms have different properties and efficiencies
**** Order statistics deal with finding the kth smallest element in a set
**** Key concepts: heapsort, quicksort, counting sort, selection problem, order statistics
@endmindmap
```

# Fundamental Data Structures and Algorithms

## Hash functions

Hash functions transform an input of arbitrary size into an output of fixed size. They are useful for signing files, building hash tables, and checksums.

Find a hash function.

Non-cryptographic hash function and cryptographic hash function

MurmurHash

SipHash

CityHash

https://en.wikipedia.org/wiki/List_of_hash_functions#Non-cryptographic_hash_functions

SHA-256

## Hash table

https://www.youtube.com/watch?v=DMtFhACPnTY

https://www.youtube.com/watch?v=KyUTuwz_b7Q

Coalesced Hashing

Cuckoo Hashing

Robin Hood Hashing

Hopscotch Hashing

2-Choice Hashing

2-Left Hashing

Pollution attacks and Collision attacks

## Topological sort

Where could you find the topological sort?

Makefile.

Dependency resolution.

Sentence Ordering

Critical Path Analysis

Course Schedule problem

Finding cycles in a graph. Compilers need to check for cycles in inheritance.

Operating system deadlock detection.
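Most of the applications above boil down to one routine: a topological order exists iff the dependency graph has no cycle. A minimal sketch of Kahn's algorithm (the graph representation, a dict of successor lists, is an assumption for illustration):

```python
from collections import deque

def topological_sort(graph):
    # graph: dict mapping each node to a list of its successors.
    # Returns a topological order, or None if the graph has a cycle.
    indegree = {u: 0 for u in graph}
    for u in graph:
        for v in graph[u]:
            indegree[v] += 1
    queue = deque(u for u in graph if indegree[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    # If some nodes were never emitted, they sit on a cycle.
    return order if len(order) == len(graph) else None
```

The same code doubles as a cycle detector: a `None` result means a dependency cycle, which is exactly what a build system or a compiler checking inheritance needs to report.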

Accounting for Computer Scientists — Martin Kleppmann’s blog. (2023, March 16). Retrieved from https://martin.kleppmann.com/2011/03/07/accounting-for-computer-scientists.html

Wheeler, D. A. (2023, March 22). The Free-Libre / Open Source Software (FLOSS) License Slide. Retrieved from https://dwheeler.com/essays/floss-license-slide.html

### Cheat code

Wayne, Robert Sedgewick, and Kevin. "Algorithms and Data Structures Cheatsheet." 3 Feb. 2023, algs4.cs.princeton.edu/cheatsheet.

## Code

```
@startmindmap
* Data Structures
** Objectives: Understand the basic data structures and their properties, and analyze their efficiency in terms of space and time
** Contents
*** Elementary data structures
**** Arrays
***** Definition
***** Operations
***** Space complexity
***** Time complexity
**** Linked lists
***** Definition
***** Operations
***** Space complexity
***** Time complexity
**** Stacks
***** Definition
***** Operations
***** Space complexity
***** Time complexity
**** Queues
***** Definition
***** Operations
***** Space complexity
***** Time complexity
**** Deque
*** Hash tables
**** Direct-address tables
***** Definition
***** Operations
***** Space complexity
***** Time complexity
**** Hash tables with chaining
***** Definition
***** Operations
***** Space complexity
***** Time complexity
**** Hash functions
***** Definition
***** Operations
***** Space complexity
***** Time complexity
**** Analysis and efficiency
***** Space complexity
***** Time complexity
*** Binary search trees
**** Binary Tree (AKA Binary Search Tree or BST)
***** Definition
***** Operations
***** Space complexity
***** Time complexity
**** Tree traversals
***** Definition
***** Operations
***** Space complexity
***** Time complexity
**** Binary search tree property
***** Definition
***** Operations
***** Space complexity
***** Time complexity
**** Operations on binary search trees
***** Definition
***** Operations
***** Space complexity
***** Time complexity
**** Balanced binary search trees
***** Definition
***** Operations
***** Space complexity
***** Time complexity
**** Analysis and efficiency
***** Space complexity
***** Time complexity
*** Priority queues
**** Heaps
***** Definition
***** Operations
***** Space complexity
***** Time complexity
**** Priority queue operations
***** Definition
***** Operations
***** Space complexity
***** Time complexity
**** Implementations
***** Definition
***** Operations
***** Space complexity
***** Time complexity
**** Analysis and efficiency
***** Space complexity
***** Time complexity
*** Summary
**** Data structures organize and store data for efficient access and modification
**** Efficiency in terms of space and time is important in designing and analyzing data structures
**** Key concepts: arrays, linked lists, stacks, queues, hash tables, binary search trees, priority queues
@endmindmap
```

## Code

```
@startmindmap
skinparam shadowing false
skinparam roundcorner 15
skinparam linetype ortho
* Advanced Data Structures
** Bloom Filter
** LRU Cache
** Red-Black Tree
** B-Trees
*** Definition and Properties
*** Insertion and Deletion
*** Searching
** Augmenting Data Structures
*** Interval Trees
*** Order Statistics Trees
*** Van Emde Boas Trees
*** Fibonacci Heaps
*** Dynamic Tables
*** Disjoint Set Union
** Data Structures for Disjoint Sets
*** Disjoint Set Operations
*** Linked-List Representation
*** Union by Rank and Path Compression
*** Analysis of Union by Rank and Path Compression
** Splay Trees
*** Definition and Properties
*** Rotations
*** Operations
** Skip Lists
*** Definition and Properties
*** Searching
*** Insertion and Deletion
** Hash Tables
*** Direct Address Tables
*** Hash Functions
*** Collision Resolution
*** Perfect Hashing
** Binary Heaps
*** Definition and Properties
*** Insertion and Deletion
*** Heap Operations
** Treaps
*** Definition and Properties
*** Randomized Search Trees
*** Insertion and Deletion
** Pairing Heaps
*** Definition and Properties
*** Merge and Insert Operations
*** Delete-Min Operation
@endmindmap
```

## Code

```
@startmindmap
skinparam shadowing false
skinparam roundcorner 15
skinparam linetype ortho
* Graph Algorithms
** Basic Graph Algorithms
*** Breadth-First Search (BFS)
*** Depth-First Search (DFS)
*** Topological Sort
*** Connected Components
*** Strongly Connected Components (SCC)
** Minimum Spanning Trees
*** Kruskal's Algorithm
*** Prim's Algorithm
*** Reverse-Delete Algorithm
** Single-Source Shortest Paths
*** Dijkstra's Algorithm
*** Bellman-Ford Algorithm
*** DAG Shortest Paths
*** Johnson's Algorithm
** All-Pairs Shortest Paths
*** Floyd-Warshall Algorithm
*** Johnson's Algorithm
** Maximum Flow
*** Ford-Fulkerson Algorithm
*** Edmonds-Karp Algorithm
*** Dinic's Algorithm
** Computational Geometry Algorithms
*** Convex Hulls
*** Line Segment Intersection
*** Closest Pair
** Randomized Algorithms
*** Randomized Minimum Cut
*** Global Minimum Cut
** NP-Completeness
*** Decision Problems and Search Problems
*** Polynomial-Time Reductions
*** Examples of NP-Complete Problems
** Algorithmic Analysis
*** Asymptotic Notation
*** Recurrence Relations
*** Amortized Analysis
*** Probabilistic Analysis
refnote "Introduction to Algorithms" by Cormen, Leiserson, Rivest, and Stein (CLRS); "The Art of Computer Programming" by Donald Knuth
@endmindmap
```

## Code

```
@startmindmap
skinparam shadowing false
skinparam roundcorner 15
skinparam linetype ortho
* Selected Topics
** Approximation Algorithms
*** Vertex Cover
*** Traveling Salesman Problem
*** Knapsack Problem
** Linear Programming
*** Simplex Algorithm
*** Duality and Sensitivity Analysis
*** Network Flow Formulations
** Number-Theoretic Algorithms
*** Greatest Common Divisor
*** Modular Arithmetic
*** The Chinese Remainder Theorem
** String Matching
*** The Naive Algorithm
*** The Rabin-Karp Algorithm
*** The Knuth-Morris-Pratt Algorithm
** Cryptography
*** Public-Key Cryptography
*** RSA Algorithm
*** Cryptographic Hash Functions
** Parallel Algorithms
*** PRAM Model
*** Parallel Prefix Sum
*** Matrix Multiplication
** Computational Complexity
*** Time and Space Complexity
*** The Class P and NP
*** NP-Complete Problems
@endmindmap
```

## Code

```
@startmindmap
skinparam shadowing false
skinparam roundcorner 15
skinparam linetype ortho
* Design Techniques
** Divide and Conquer
*** How?
**** Recursively divide problem into smaller subproblems
**** Solve subproblems independently
**** Combine solutions into a single solution to the original problem
*** Examples
**** Mergesort: sort an array by dividing it into two halves, sorting the halves, and merging them together. [CLRS, Knuth]
**** Quicksort: sort an array by partitioning it into two subarrays around a pivot element, and then recursively sorting the subarrays. [CLRS, Knuth]
**** Strassen's Algorithm: multiply two matrices by recursively dividing them into smaller submatrices, computing intermediate matrices, and combining the results. [CLRS]
**** The Closest Pair Problem: find the pair of points in a set that are closest together by recursively dividing the set and computing the closest pairs in each subset, and then combining the results. [CLRS]
** Dynamic Programming
*** How?
**** Divide problem into smaller subproblems
**** Compute solutions to subproblems, storing intermediate results
**** Combine solutions to subproblems into a solution to the original problem
*** Examples
**** Rod Cutting: find the maximum revenue from cutting a rod into pieces of different lengths with different values. [CLRS]
**** Matrix-Chain Multiplication: find the optimal way to multiply a chain of matrices by dividing it into subproblems and combining the results. [CLRS]
**** Longest Common Subsequence: find the longest common subsequence between two sequences by dividing the problem into subproblems and computing intermediate results. [CLRS, Knuth]
**** Optimal Binary Search Trees: find the optimal binary search tree for a set of keys with given probabilities by computing subproblems and combining the results. [CLRS]
** Greedy Algorithms
*** How?
**** Make locally optimal choices
**** Combine locally optimal choices into a global solution
*** Examples
**** Huffman Coding: compress a message by encoding frequently-occurring characters with shorter codes, using a prefix-free code. [CLRS, Knuth]
**** Fractional Knapsack: fill a knapsack with a maximum weight by selecting items with the highest value per weight. [CLRS]
**** Dijkstra's Algorithm: find the shortest path between two nodes in a graph by making locally optimal choices and updating distances. [CLRS, Knuth]
**** Prim's Algorithm: find the minimum spanning tree of a graph by making locally optimal choices and adding edges. [CLRS]
** Randomized Algorithms
*** How?
**** Use randomness to solve problems
**** Probability of success can be amplified
**** Some problems can only be solved with randomness
*** Examples
**** Quicksort: randomize the choice of pivot element to avoid worst-case scenarios. [CLRS, Knuth]
**** Las Vegas Algorithms: use randomness to guarantee a correct solution, but with uncertain running time. [CLRS]
**** Monte Carlo Algorithms: use randomness to approximate a solution, with a known probability of error. [CLRS, Knuth]
**** Skip Lists: use probabilistic skipping to create a data structure with logarithmic search time. [CLRS]
@endmindmap
```

**Sorting problem**

**Insertion sort**

**Selection sort**

…

## Searching

# Algorithm design

Problem-solving strategies. Patterns.

How do you solve an algorithmic problem? Reduce it to a problem you already know, or design your own algorithm. These algorithm design techniques are ways to find solutions to problems such as discrete optimization problems.

## Brute-force algorithms

## Greedy algorithms

### Worked examples

16-1 Coin changing

Consider the problem of making change for n cents using the fewest number of coins. Assume that each coin’s value is an integer.

## a. Describe a greedy algorithm to make change consisting of quarters, dimes, nickels, and pennies (25, 10, 5, and 1 cents respectively). Prove that your algorithm yields an optimal solution.

**Input.** $n\in\mathbb{N}$.

**Output.** The shortest sequence of quarters, dimes, nickels, and pennies such that their values sum to $n$.

**Corollary 1.** $quarters, dimes, nickels, pennies \in \mathbb{N}$.

## b. Suppose that the available coins are in the denominations that are powers of c, i.e., the denominations are $c^0,c^1,...,c^k$ for some integers c>1 and $k\ge1$.

Show that the greedy algorithm always yields an optimal solution.

## c. Give a set of coin denominations for which the greedy algorithm does not yield an optimal solution. Your set should include a penny so that there is a solution for every value of n.

## d. Give an $O(nk)$-time algorithm that makes change for any set of k different coin denominations, assuming that one of the coins is a penny.

## Linear programming

## Recursion algorithm design

## Divide-and-conquer

Given a problem of size $n$, divide it into $a$ subproblems of size $n/b$ $(b>1)$, solve each subproblem recursively, and combine the solutions.

$T(n)=aT(n/b)+f(n)$, $f(n)$ is the time of combination part. The most interesting part is the merge operation.
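For example, merge sort instantiates this recurrence with $a=b=2$ and $f(n)=\Theta(n)$ for the merge step, giving $T(n)=\Theta(n \log n)$. A minimal sketch:

```python
def merge_sort(xs):
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    # Two subproblems of size n/2: a = 2, b = 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    # Merge step: f(n) = Theta(n)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```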

**Convex hull** is a 2D problem well suited to divide-and-conquer: from a set of points, find the sequence of points on the boundary. It is possible to extend the problem to higher dimensions. Formally, the convex hull problem is described as follows: given $n$ points in a plane, $S=\{(x_i,y_i) \mid i=1,2,3,...,n\}$, find the smallest convex polygon containing all points in $S$, represented by its boundary points in clockwise order as a doubly linked list.

A brute-force algorithm checks every segment against every other point, which takes $O(n^3)$:

```
CH_by_brute_force(S: Set):
  form a segment for every pair of points          // O(n^2) segments
  keep a segment if all remaining points lie on
  one side of it: such segments are hull edges     // O(n) test per segment, O(n^3) total
```
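A runnable version of the brute-force idea; this is a sketch assuming no three points are collinear, and it returns the set of hull vertices rather than the ordered boundary:

```python
def hull_points_brute_force(points):
    """O(n^3): a segment (p, q) is a hull edge iff all other points lie
    strictly on one side of the line through p and q."""
    def side(p, q, r):
        # Sign of the cross product (q - p) x (r - p)
        return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

    hull = set()
    for i, p in enumerate(points):
        for q in points[i + 1:]:
            signs = [side(p, q, r) for r in points if r != p and r != q]
            if all(s > 0 for s in signs) or all(s < 0 for s in signs):
                hull.update([p, q])  # both endpoints are hull vertices
    return hull

pts = [(0, 0), (4, 0), (2, 4), (2, 1)]  # (2, 1) is interior
print(sorted(hull_points_brute_force(pts)))  # [(0, 0), (2, 4), (4, 0)]
```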

A better approach is a divide-and-conquer algorithm as follows:

```
CH(S: Set ordered by x coordinates ASC):
  divide S into left half A and right half B by x coords
  compute CH(A) and CH(B)
  combine CH of the two halves (merge step)
```

```
merge(S: Set ordered by x coordinates ASC):
// https://jayjingyuliu.wordpress.com/2018/01/24/algorithm-two-finger-convex-hull-merging-algorithm/
```

**Median Finding.** Given a set of $n$ numbers, define $rank(x)$ as the number of elements in the set that are less than or equal to $x$; formally, $rank(x)=|\{y \mid y \in S \land y \le x \}|$. Find the elements of rank $\lfloor \dfrac{n+1}{2} \rfloor$ (lower median) and $\lceil \dfrac{n+1}{2} \rceil$ (upper median).

```
select(S, i):
  pick x in S
  k = rank(x)
  B = {y in S | y < x}   // |B| = k - 1
  C = {y in S | y > x}   // |C| = n - k
  if k = i: return x
  if k > i: return select(B, i)
  if k < i: return select(C, i - k)
```
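A runnable version of `select` with a random pivot, assuming distinct elements; it runs in expected $O(n)$, and the median-of-medians pivot rule below makes it worst-case linear:

```python
import random

def select(S, i):
    """Return the element of rank i (1-indexed) in the list S."""
    x = random.choice(S)
    B = [y for y in S if y < x]   # |B| = k - 1
    C = [y for y in S if y > x]   # |C| = n - k
    k = len(B) + 1                # rank of x
    if k == i:
        return x
    if k > i:
        return select(B, i)
    return select(C, i - k)

print(select([7, 1, 5, 3, 9], 3))  # 5, the median
```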

A good pivot $x$ can be chosen deterministically with the median-of-medians rule:

```
Arrange S into columns of size 5 (n/5 columns)
Sort each column (bigger elements on top)
Find the "median of medians" as x
```

## Recursive backtracking

## Dynamic Programming

We define our problem as recursion, now we can apply memoization. It is about remembering and reusing subsolutions. If subproblems don't repeat, you cannot apply memoization. Dynamic programming works when you have an optimal substructure and overlapping subproblems.

- Optimal substructure: a globally optimal solution can be found by combining optimal solutions to local subproblems.

- Overlapping subproblems: finding an optimal solution involves solving the same problem multiple times.

Optimization.

Subproblem design (trick).

Given a sequence $x$ with $|x|=n$, good subproblems are prefixes ($x[:i]$), with $\Theta(n)$ of them; suffixes ($x[i:]$), with $\Theta(n)$ of them; and substrings ($x[i:j]$), with $\Theta(n^2)$ of them.

A general way to memoize is the following:

```
def memoize(solver):
    memo = dict()
    def caller(problem):
        if problem not in memo:
            memo[problem] = solver(problem)
        return memo[problem]
    return caller

@memoize
def solver(problem):
    # base case
    # relate
    # topological order given by the recursion
    ...

solver(original_problem)
```
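Applying the template above to a concrete relation, e.g. Fibonacci (my example, not from the text):

```python
def memoize(solver):
    memo = dict()
    def caller(problem):
        if problem not in memo:
            memo[problem] = solver(problem)
        return memo[problem]
    return caller

@memoize
def fib(n):
    if n < 2:                       # base case
        return n
    return fib(n - 1) + fib(n - 2)  # relate; smaller n computed first

print(fib(50))  # 12586269025, instant thanks to memoization
```

Without memoization this relation takes exponential time; with it, each subproblem is solved once, so the whole call is $O(n)$.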

Of course, we can optimize further by turning the recursion into an iterative version. Suppose we have defined the SRT BOT parts of a recursive algorithm; then we can transform it as follows.

```
memo = [None] * n
# start with the base cases
for i in topological order:
    # instead of making recursive calls, you read
    # earlier memo entries and build bottom-up
    memo[i] = relate(memo)
return memo[0]  # the original problem
```
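Concretely, the same Fibonacci relation in bottom-up form, following the template above (my example):

```python
def fib(n):
    if n < 2:
        return n
    memo = [0] * (n + 1)
    memo[1] = 1                              # base cases: memo[0] = 0, memo[1] = 1
    for i in range(2, n + 1):                # topological order: increasing i
        memo[i] = memo[i - 1] + memo[i - 2]  # relate, reading from the memo
    return memo[n]                           # the original problem

print(fib(50))  # 12586269025
```

The iterative version avoids recursion-depth limits and usually has smaller constant factors.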

Recursive algorithm design paradigms.

We think of subproblems as a graph whose edges are the dependencies between them.

Roughly speaking, you can solve problems forward (top-down, recursive with memoization) or backward (bottom-up, iterative).

Recursive Framework. SRT BOT.

**Subproblem definition.**

Define the input and output types.

suffix,

max, min

**Relate recursively.**

Relate subproblem solutions recursively $x(i)=f(x(j),...),j<i$ —mathematically.

Reduce a subproblem to smaller subproblems, brute-forcing over the first choice.

You have to find a recurrence relation.

**Topological order.**

Argue that the relation is acyclic.

You must guarantee there are no cycles; indeed, the subproblem graph must be a DAG, because we don't want infinite recursion.

a→b means b needs a.

Nodes are subproblems, and an edge indicates that one subproblem needs another.

**Base cases.**

State solutions for the reachable independent subproblems.

**Original problem.**

Show how to compute the solution to the original problem from the subproblem solutions.

**Time analysis.**

Normally $T(n)= \text{\# subproblems} \times \text{nonrecursive work in relation}$

If subproblems take multiple inputs, you take the cross product: multiply the sizes of the problem spaces.

### Worked examples

**Arithmetic Parenthesization**

**Drunken Donuts**

Drunken Donuts, a new wine-and-donuts restaurant chain, wants to build restaurants on many street corners to maximize its total profit.

The street network is described as an undirected graph $G = (V, E)$, where the potential restaurant sites are the vertices of the graph. Each vertex $u$ has a nonnegative integer value $p_u$, which describes the potential profit of site $u$. Two restaurants cannot be built on adjacent vertices (to avoid self-competition). You are supposed to design an algorithm that outputs a chosen set $U \subseteq V$ of sites that maximizes the total profit $\sum_{u \in U} p_u$. First, for parts (a)–(c), suppose that the street network $G$ is acyclic, i.e., a tree.

```
# Each vertex: [profit, children, grandchildren (unused), index, memo_A, memo_B]
graph = [
    [9, [1], [], 0, None, None],
    [1, [2], [], 1, None, None],
    [10, [3], [], 2, None, None],
    [5, [], [], 3, None, None],
]

def A(v):
    # Best profit for v's subtree if a restaurant IS built at v:
    # its children must be skipped.
    if len(v[1]) == 0:
        return v[0]
    return v[0] + sum(B(graph[child]) for child in v[1])

def B(v):
    # Best profit for v's subtree if NO restaurant is built at v:
    # each child is free to be built or not.
    if len(v[1]) == 0:
        return 0
    return sum(max(A(graph[child]), B(graph[child])) for child in v[1])

def recursive_dp(v):
    return [A(v), B(v)]

def iterative_dp(vertices):
    # vertices arrive leaves-first (reverse BFS order), so children are
    # finished before their parents; memo_A is vertex[4], memo_B is vertex[5].
    for vertex in vertices:
        if len(vertex[1]) == 0:
            vertex[4] = vertex[0]
            vertex[5] = 0
        else:
            child_sum = sum(max(graph[child][4], graph[child][5]) for child in vertex[1])
            grandchild_sum = sum(
                max(graph[grandchild][4], graph[grandchild][5])
                for child in vertex[1]
                for grandchild in graph[child][1]
            )
            vertex[4] = vertex[0] + grandchild_sum  # build at vertex: skip children
            vertex[5] = child_sum                   # skip vertex: children are free
    root = vertices[-1]
    return max(root[4], root[5])

def topological_sort(v):
    # BFS from the root, then reverse so leaves come first.
    stack = [v]
    order = 0
    while order < len(stack):
        element = stack[order]
        order += 1
        stack += [graph[child] for child in element[1]]
    return list(reversed(stack))

iterative_dp(topological_sort(graph[0]))
```

**Explanation.**

**Correctness.**

**Efficiency.**

## Branch-and-bound


## Heuristics

### Approximation Algorithms

Optimization algorithms.

Vazirani, V.V. (2004). Approximation Algorithms. Springer, 2004.

## Reduction: transform-and-conquer

Bitwise operations

# Information Structures

6.851

## 2.1 Timeline

Objects.

Functions.

Array.

Graph.

Tree.

...

**Combinatorial Algorithms**

## Elementary Graph Algorithms

### Breadth-first search

- Class: Search algorithm

- $T(n)=O(V+E)$

- $S(n)=O(V)$
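A minimal adjacency-list BFS consistent with the bounds above; the graph encoding and names are my own:

```python
from collections import deque

def bfs(adj, s):
    """Level-by-level traversal from s; returns shortest distances in edges.
    Runs in O(V + E) time and O(V) extra space."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:       # each vertex is enqueued at most once
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(bfs(adj, 0))  # {0: 0, 1: 1, 2: 1, 3: 2}
```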

### DFS

### Full-BFS

### Full-DFS

### Topological Sort

**DFS**

**Kahn's algorithm**
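A hedged sketch of Kahn's algorithm: repeatedly peel off vertices with in-degree zero. The dict-of-lists graph encoding is my choice.

```python
from collections import deque

def kahn_toposort(adj):
    """Topological order of a DAG given as {u: [neighbors]}; raises on a cycle."""
    indegree = {u: 0 for u in adj}
    for u in adj:
        for v in adj[u]:
            indegree[v] += 1
    queue = deque(u for u in adj if indegree[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            indegree[v] -= 1
            if indegree[v] == 0:   # all of v's prerequisites are placed
                queue.append(v)
    if len(order) != len(adj):
        raise ValueError("graph has a cycle")
    return order

print(kahn_toposort({'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': []}))
# ['a', 'b', 'c', 'd'] — one valid order
```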

## Shortest path problem

https://web.mit.edu/6.454/www/www_fall_2003/gew/CEtutorial.pdf

**Worked examples**

https://leetcode.com/problems/alien-dictionary/

Given a sorted dictionary of an alien language having $N$ words over the first $k$ letters of a standard alphabet, find the order of the characters in the alien language.

**Note:** Many orders may be possible for a particular test case; you may return any valid order. The judge outputs 1 if the returned order is correct and 0 otherwise.

### Summary

Graph reachability by BFS or DFS in $O(n)$.

You can also apply these searches in AI, e.g., state-space search.

## MST

## Network Flow

We begin

## Merkle Dag

## Clique

Finding the maximum k-distance clique in tree-like graphs is a well-known problem in graph theory and computer science. A k-distance clique is a subset of nodes in which each node is at most k edges away from every other node. The problem of finding the maximum k-distance clique is to find the largest such subset.

**Cycle detection**

Lowest Common Ancestor

BFS and Tarjan component

Floyd's tortoise and hare

```
function find_mu_in_cycle(f, x0) {
  // Assumes the sequence x0, f(x0), f(f(x0)), ... eventually cycles.
  // Phase 1: the tortoise advances 1 step, the hare 2 steps, until they
  // meet inside the cycle.
  let tortoise = f(x0);
  let hare = f(f(x0));
  while (tortoise !== hare) {
    tortoise = f(tortoise);
    hare = f(f(hare));
  }
  // Phase 2: reset the tortoise to x0; advancing both 1 step at a time,
  // they meet exactly at the start of the cycle (index mu).
  tortoise = x0;
  while (tortoise !== hare) {
    tortoise = f(tortoise);
    hare = f(hare);
  }
  return tortoise;
}
let array = [5, 4, 3, 2, 4, 0];
find_mu_in_cycle(x => array[x], 0); // 0 -> 5 -> 0 -> ...: the cycle starts at 0
```

**Detecting errors in product code replacements**, by Rodrigo Gálvez (a Telegram user)

Product codes get replaced by other product codes, and this information is stored in the database, relating each original product code to its replacement. The problem is that sometimes a replacement chain loops back to an earlier product: someone enters that product A is replaced by product B, and later that product B is replaced by product A. Such replacement chains can be long, which makes loops difficult to detect.

To solve this problem, I used Floyd's tortoise and hare algorithm: traverse the replacement chain with two cursors; if the cursors ever meet, a loop exists and the replacement chain is reported as wrong.

## Graph and Matrix

6.890

8.889

## String Matching

Levenshtein distance

Regex+Levenshtein
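The Levenshtein distance can be computed with the classic two-row DP over prefixes; a minimal sketch (function name mine):

```python
def levenshtein(a, b):
    """Edit distance between strings a and b in O(|a| * |b|) time."""
    prev = list(range(len(b) + 1))         # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        curr = [i]                         # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if chars match)
            ))
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```

Keeping only two rows reduces the space from $O(|a| \cdot |b|)$ to $O(|b|)$.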

# Programming Techniques

Memory hierarchy

## Randomized Algorithms

6.842

Randomized algorithms change the definition of correct and efficient.

Motwani, R., Raghavan, P. (1995). Randomized Algorithms. Cambridge University Press.

Las Vegas. Always correct, probably efficient.

Monte Carlo. Always efficient, probably correct.

## Parallel Algorithms.

Shared memory.

6.816, 6.846

## Distributed algorithms.

Message passing.

6.852

https://jukkasuomela.fi/da2020/

## Online Algorithms

https://www.youtube.com/playlist?list=PLjbPFHE4z0zcsmzMPOg-J16SIDS3t0ylw

An instance of an online algorithm is a SAX parser (Simple API for XML), which processes its input as a stream without seeing the whole document.

## Tips on Constructive Algorithms

## Algorithms for external sources

**Persistent Data Structures**

## Constant Factors - Performance

6.172

# Next steps

Systems.

Theory of computation. 6.045 6.840 6.841

## Applications

TODO.

Biology (6.047)

Game Theory (6.853)

Cryptography (6.875)

Vision (6.873)

Graphics (6.837) 6.838

Geometry (6.850)

Folding (6.849)

Recreational algorithms. Algorithmic Lower bounds: Fun with hardness proofs (6.892)

3D printing?

# References

- Data Structure Visualization. (2022, December 12). Retrieved from https://www.cs.usfca.edu/~galles/visualization/Algorithms.html

- MIT OpenCourseWare. (2022, December 02). MIT OpenCourseWare. Retrieved from https://ocw.mit.edu/courses/6-046j-design-and-analysis-of-algorithms-spring-2015/pages/lecture-notes

- Fethi Rabhi and Guy Lapalme. 1999. Algorithms; A Functional Programming Approach (1st. ed.). Addison-Wesley Longman Publishing Co., Inc., USA.

- Stone, J. D. Algorithms for Functional Programming. Springer. Retrieved from https://link.springer.com/book/10.1007/978-3-662-57970-1

- Bird, R., & Gibbons, J. (2020). *Algorithm Design with Haskell*. Cambridge University Press.


- Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein. Introduction to Algorithms. The MIT Press, 4th ed., 2022. ISBN: 9780262046305.

- Interview with Dr. Cormen.

- [Handbook] Cracking the Coding Interview.

- Horowitz, E., Sahni, S., & Rajasekaran, S. (1996). *Computer Algorithms: Pseudocode Version* (2nd ed.). W. H. Freeman.

- [Advanced] Knuth, D. E. The Art of Computer Programming. Addison-Wesley.

- A. V. Aho, J. E. Hopcroft, J. D. Ullman, *The Design and Analysis of Computer Algorithms*. Addison-Wesley, 1974. ISBN 0-201-00023-7.

- Dasgupta, Sanjoy; Papadimitriou, Christos; Vazirani, Umesh (2008).
*Algorithms*. McGraw-Hill. p. 317. ISBN 978-0-07-352340-8.

- Kleinberg, Tardos: Algorithm Design

- MIT OpenCourseWare. (2023, January 12). MIT OpenCourseWare. Retrieved from https://ocw.mit.edu/courses/6-006-introduction-to-algorithms-spring-2020

- The Algorithm Design Manual by Steven Skiena

- https://web.stanford.edu/class/archive/cs/cs166/cs166.1216/

- https://web.stanford.edu/class/cs161/

- Bhargava, A. (2016). Grokking Algorithms: An Illustrated Guide for Programmers and Other Curious People. Manning.

- La Rocca, M. (2021). Advanced Algorithms and Data Structures. Manning.

- Wengrow, J. (2020). A Common-Sense Guide to Data Structures and Algorithms. Pragmatic Bookshelf.

- Agarwal, B. (2018). Hands-On Data Structures and Algorithms with Python: Write complex and powerful code using the latest features of Python 3.7. Packt Publishing.

- Garey, M. R., & Johnson, D. S. (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman.