Theory of Computation

TagsComputer scienceMathScience



Machines, Languages, and Computation at MITPeter J. DenningTheoretical Models of Computationanecdotes

Strings and Languages


Alphabet. Any non-empty finite set. We generally use either capital Greek letters Σ,Γ\Sigma ,\Gamma , or uppercase letters A,B,...A,B,... to designate alphabets.

Symbol. The members of an Alphabet.

Examples: Σ1={0,1};Σ2={a,b,c,d,e,f,g,h};Γ={0,1,x,y,z}\Sigma_1=\{0,1\};\Sigma_2=\{a,b,c,d,e,f,g,h\};\Gamma=\{0,1,x,y,z\}

String over an alphabet. A finite sequence of symbols from that alphabet.

Length of a string. If ww is a string over Σ\Sigma, the length of ww, written w|w|, is the number of symbols that it contains. Some authors prefer to use na(w)n_a(w) for indicating some aa characteristic applied on ww.

Empty string. w=ϵ=λ=0|w|=\epsilon=\lambda=0

Language. Set of strings. We generally use upper-case letters A,B,...A,B,... to designate languages.

Lexicographic ordering.

1. Shorter strings precede longer strings.

2. The first symbol in the alphabet precedes the next symbol.

Regular operations.

Let A,BA,B be languages:

Union: AB={wwA or wB}A\cup B=\{w|w\in A\text{ or }w\in B\}

Concatenation: AB={xyxA or yB}=ABA\cdot B=\{xy|x\in A\text{ or }y\in B\}=AB

Ak=AA...Ak times,A1=A,A0={ϵ}A^k=A \overset{\text{k times}}{ A...A},A^1=A,A^0=\{\epsilon\}

Kleene star: A={x1...xk each xiA for k0}=A0A1A2A3...A^*=\{x_1...x_k|\text{ each }x_i\in A \text{ for } k\ge 0\}=A^0\cup A^1\cup A^2 \cup A^3 \cup..., note ϵA\epsilon\in A^* always.

Kleene plus: A+=A1A2A3...=VVA^+=A^1\cup A^2\cup A^3\cup...=V^*V

Example. Let A={good,bad}A=\{good,bad\} and B={boy,girl}B=\{boy,girl\}

AB={good,bad,boy,girl}A\cup B=\{good,bad,boy,girl\}




Worked examples


Models of computation




Formal grammar is a tool for syntax, not semantics.


Grammar. A grammar is a 4-tuple (V,T,S,P),(V,T,S,P) ,

where VV is a finite set of objects called variables,

TT is a finite set of objects, disjoint from VV, called terminal symbols, in short terminals,

SVS\in V is a special symbol called the start variable,

PP is a finite set of rules, called production rules, with each rule being a variable a string of variables and terminals.

BNF. A particular form of notation for grammar (Backus-Naur form).

CNF. A particular form of notation for grammar (Chomsky normal form).

Language generated by GG. Let G=(V,T,S,P),G=(V,T,S,P) , be a grammar. Then the set

L(G)={wT:Sw}L(G)=\{w\in T^*:S\to^* w\}

is the language generated by G.

Production rules. If production rules are of the form

xyx\to y

where x(VT)+x \in (V \cup T)^+, y(VT)y \in (V \cup T)^*. The production rules are applied in the following manner: Given a string w1=uxvw_1=uxv

xyx \to yw1=uxvw_1=u\bold{x}vw2=uyvw_2=u\bold{y}v

We say that w1w_1 derives w2w_2 o that w2w_2 is derived from w1w_1.

A rule can be used whenever it’s applicable.

If w1w2...wn,w_1\to w_2\to ...\to w_n,

we say that w1w_1 derives wnw_n and write

w1wnw_1 \to^* w_n . The * indicates an unspecified number of steps (including zero).

Chomsky normal form (CNF) example

V={P,S,M}V=\{P,S,M\}, T={+,,1,2,3,4}T=\{+,*,1,2,3,4\}, SVS\in V

PSSS+MMMMTTT1234P \to S \\ S \to S+M|M\\ M \to M*T|T\\ T \to 1|2|3|4

Backus–Naur form (BNF) example

V={P,S,M}V=\{P,S,M\}, T={+,,1,2,3,4}T=\{+,*,1,2,3,4\}, SVS\in V

<P>::=<S><S>::=<S>+<M><M><M>::=<M><T><T><T>::=1234<P> ::= <S> \\ <S> ::= <S> + <M> | <M>\\ <M> ::= <M> * <T> | <T>\\ <T> ::= 1 | 2 | 3 | 4

Grammar Hierarchy

Set view


Table view

Grammar typeGrammar acceptedLanguage AcceptedAutomatonProduction rule form
Type 0. These are the most general.Free or unrestricted grammarsRecursively enumerable languageTuring Machineuvu\to v, where both u and v are arbitrary strings of symbols in V, with u non-null.
Type 1Context-sensitive grammarsContext-sensitive languageLinear-bounded automatonuXwuvw,uXw\to uvw, where uu,vv, and ww are arbitrary strings of symbols in VV, with vv non-null, and XX a single nonterminal. In other words, in a particular context.
Type 2Context-free grammarsContext-free languagePushdown automatonXvX\to v, where vv is an arbitrary string of symbols in VV, and XX is a single nonterminal. i.e. regardless of context.
Type 3. These grammars are the most limited in terms of expressive power.Regular grammarsRegular languageFinite-state automatonXaX \to a, XaYX \to aY (right-regular grammar), XYaX \to Ya (left-regular grammar), XϵX \to \epsilon, where XX and YY are nonterminals and aa is a terminal. Regular grammar is either right-linear grammar or left-linear grammar, but not both. Mixing left-regular and right-regular rules are the context-free grammar.




Worked examples

Steps to convert regular expressions directly to regular grammars and vice versa



Are axioms in math equivalent to production rules in unrestricted grammars?

how could a CFG hold an unrestricted grammar (UG), which is turing complete?


Notes and references

[1] M. Johnson, “Formal Grammars Handout written,” 2012 [Online]. Available: https://web.stanford.edu/class/archive/cs/cs143/cs143.1128/handouts/080 Formal Grammars.pdf. [Accessed: 28-Feb-2022]

Finite automata

Finite automata and their probabilistic counterpart Markov chains. Finite Automaton (FA) or Finite-State Machine. An automaton (Automata in plural).

State diagram of a finite automaton M1. It has two states.
OOP uses a State Pattern, which is a design pattern.

We feed the input string 0101 to the machine M1M_1, the processing proceeds as follows:

  1. Start in state q1q_1.
  1. Read 1, follow the transition from q1q_1 to q2q_2.
  1. Read 1, follow the transition from q2q_2 to q1q_1.
  1. Accept because M1M_1 is in an accept state q2q_2 at the end of the input.

The formal definition of a deterministic finite automaton (DFA)

A finite automaton is a 5-tuple (Q,Σ,δ,q0,F)(Q,\Sigma,\delta,q_0,F) where

  1. QQ is a finite set called the states,
  1. Σ\Sigma is a finite set called the alphabet,
  1. δ:Q×ΣQ\delta:Q\times \Sigma\rarr Q is the transition function,
  1. q0Qq_0 \in Q is the start state, and
  1. FQF\subseteq Q is the set accept states, they are sometimes called final states.

The formal definition of a nondeterministic finite automaton (NFA)

A finite automaton is a 5-tuple (Q,Σ,δ,q0,F)(Q,\Sigma,\delta,q_0,F) where

  1. QQ is a finite set called the states,
  1. Σ\Sigma is a finite set called the alphabet,
  1. δ:Q×(Σ{λ})P(Q)={RRQ}\delta:Q\times (\Sigma\cup\{\lambda\})\rarr P(Q)=\{R|R \subseteq Q\} is the transition function (P is a power set),
  1. q0Qq_0 \in Q is the start state, and
  1. FQF\subseteq Q is the set accept states, they are sometimes called final states.

Converting NFAs to DFAs

delta = {
  'state1': {
     'symbol1': {'state1', 'state2',...}


What is the difference between finite automata and finite state machines?

Pattern Matching

Extended finite-state machine


Automata and Grammar

Regular expressions to Automata

  1. Generalized nondeterministic finite automaton.
def convert_to_regular_rxpression(G):
   if G.number_of_states == 2:
  1. Solving right-linear equation by Arden’s rule.