Exam

What is parsing?

It is a compiler process that verifies if grammar can generate the string of tokens from the lexical analyzer and meanwhile it constructs a parse tree.

Describe regular language limitations and context-free grammar characteristics.

By the theory of computation results, we know that context-free grammars describe more languages than regular languages, informally we say regular languages “don’t cannot count” but CFG can instead. A prototypical language example is L={anbnn1}L=\{a^nb^n|n\ge 1\}, where CFG handles it, but regular languages don’t.

CFG consists of a set of non-terminals, a set of terminals, a start symbol, and a set of production rules in form XvX\to v, where vv is an arbitrary string of symbols in VV, and XX is a single nonterminal.

How do we know if a grammar is ambiguous? Provide an example.

Grammar is ambiguous when it produces more than one parse tree for some string.

Example.

Let {a}Σ\{a\}\in\Sigma, where the rest are non-terminal symbols, the grammar production rule are

SABAaBaS\to A|B \\ A \to a\\ B \to a

The grammar produces two distinct derivations for the string aa:

SAaSBaS\to A\to a \\ S\to B \to a

Therefore, it is ambiguous grammar.

Provide context-free grammar for

the language L(G)={w{a,b}the number of a is greater than the number of b}L(G)=\{w\in\{a,b\}^*|\text{the number of } a \text{ is greater than the number of }b\}

G:

SASCCSAaAaCaCbbCaϵS\to A|SC|CS\\ A\to aA|a\\ C\to aCb|bCa|\epsilon

where A,S, C are nonterminals, and the rest terminal symbols.

Explanation.

The AA production rule produces a language with aia^i, i1i\ge 1

AaAaA\to aA|a

The CC production rule produces a language where the number of aa is equal to bb.

CaCbbCaϵC\to aCb|bCa|\epsilon

The SS —start—production rule produces a language where we union the language produced by CC and AA.

the language where Σ={int,+,(,),[,]}\Sigma= \{\bold{int}, +, (, ), [, ] \} and that contains nested sums with different separators between them.

Correct cases:

[(int + int) + int]

int + [int + (int + int)]

* [( int + int ) + (int + int)] We don’t handle this, because it doesn't handle the problem definition.

Bad cases:

int + int + int

(int + (int + int))

Basis: (int+int), [int+int], int+int, int+

Eliminate left recursion and apply the left factoring method to the following grammar

PxPPPP(P)(P)P;P;P\to x| P ⇒ P | P ⇐ P | (P) ⇔ (P) | P; P;

Procedure.

We apply the left factoring method to the following grammar.

PxPP(P)(P)PPP;P;P\to x| PP' |(P) ⇔ (P)\\ P' \to ⇒ P|⇐ P|; P; \\

Then, we eliminate the left recursion and we got the final grammar.

PxP(P)(P)PPPP;P;PPPϵP\to xP'' |(P) ⇔ (P)P''\\ P' \to ⇒ P|⇐ P|; P; \\ P''\to P'P''|\epsilon

Describe Top-Down and Bottom-Up parsers.

The compiler community classifies parsing strategies by how they build the parse tree.

The top-down parsers build the parse tree from the start symbol (root) to the tokens (leaves) in preorder.

The bottom-up parsers build the parse tree from the tokens (leaves) to the start symbol (root). They are more general, just as efficient, and more preferred in practice than top-down parsers.

In both cases, the parser scans the symbol stream left-to-right, one by one. Therefore, we see terminals in order of appearance.

How many parse trees are for the string “a and a or a”?

Let the following a context-free grammar

SSandSSorSTaTaS → S and S | S or S | T | a\\ T\to a

where SS and TT are non-terminals, S is the start symbol, and andand, oror, aa are terminals. Ignore blanks.

How many parse trees are for the string a and a or aa\text{ }and\text{ }a\text{ }or\text{ }a?

The string generates 6 parse trees.

Proof.

We’re going to count how many derivations are possible from the start symbol

SSandSSandSorSS\to SandS\\ \to SandSorS

Here, we have two options STaS\to T\to a or SaS\to a for each SS, so the string generates 32=63*2=6  parse tree.

Build a LR(1) parser

Let a context free grammar with {a,b,c}Σ\{a, b, c\}\in \Sigma where its production rules are

SZZaMabMbaRbbRaMcRcS \to Z\\ Z \to aMa | bMb | aRb | bRa \\ M \to c \\ R \to c

Build a DFA parsing.

State 0. (I0I_0)

We begin computing the closure of {[S→.Z,$]}. We match every item with [Aα.Bβ,a][A \to \alpha.B\beta ,a].

ItemAα\alphaBβ\betaaComment
[S→.Z, $]Sϵ\epsilonZϵ\epsilon$Function CLOSURE tells us to add [Bγ,b]=[Z...,b][B \to \gamma , b]=[Z\to..., b],b=FIRST(βa)=FIRST(ϵ$)=FIRST($)=$ b=FIRST(\beta a)=FIRST(\epsilon \$)=FIRST(\$)=\$
[Z→.aMa, $]Zϵ\epsilonaMa$None of the new items has a nonterminal immediately to the right of the dot. So we have completed the state 0 or set of items 0.
[Z→.bMb, $]Zϵ\epsilonbMb$
[Z→.aRb, $]Zϵ\epsilonaRb$
[Z→.bRa, $]Zϵ\epsilonbRa$

Now we compute GOTO(I0I_0, X) where X={Z,a,M,b,R,c} (all grammar symbols with no augmented symbol)

On X=Z go to state 1.

On X=a go to state 2.

On X=b go to state 3.

State 1. For X=Z

Since no additional closure is possible, the set of items is

[SZ.,$][S→Z., \$]

It is an accepting state.

State 2. For X=a

GOTO(I0I_0, X=a)=kernelclosure(kernel)kernel\cup closure(kernel)={[Z -> a.Ma, $], [Z -> a.Rb, $], [M -> .c, a], [R -> .c, b]}

On X=M go to state 4.

On X=R go to state 5.

On X=c go to state 6.

State 3. For X=b

GOTO(I0I_0, X=b)={[Z→b.Mb, $], [Z→b.Ra, $], [R→.c,a], [M→.c,b]}

On X=M go to state 7.

On X=R go to state 8.

On X=c go to state 9.

The fake States. For X=M,R,c

They don’t generate new states because they were included before. We don’t mention these fake states again.

Now we are concern about new items generated by state 2 I2I_2={[Z -> a.M a, $], [Z -> a.Rb, $], [M -> .c, a], [R -> .c, b]}, so we call GOTO(I2I_2, X) where X={Z,a,M,b,R,c}

States 4. For X=M

GOTO(I2I_2, X=M)={[Z→aM.a,$]}

On X=a go to state 10.

States 5. For X=R

GOTO(I2I_2, X=R)={[Z→aR.b,$]}

On X=b go to state 11.

States 6. For X=c

GOTO(I2I_2, X=c)={[M -> c., a], [R -> c., b]}

It doesn’t go anywhere.

Now we are concern about new items generated by state 3 I4I_4={[Z→b.Mb, $], [Z→b.Ra, $], [R→.c,a], [M→.c,b]}, so we call GOTO(I4I_4, X) where X={Z,a,M,b,R,c}

States 7. For X=M

GOTO(I4I_4, X=M)={[Z→bM.b,$]}

On X=b go to state 12.

States 8. For X=R

GOTO(I4I_4, X=R)={[Z→bR.a,$]}

On X=a go to state 13.

States 9. For X=c

GOTO(I4I_4, X=c)={[R -> c., a], [M -> c., b]}

It doesn’t go anywhere.

Now we are concern about new items generated by state 4 I5I_5={[Z→aM.a,$]}, so we call GOTO(I3I_3, X) where X={Z,a,M,b,R,c}

States 10. For X=a

GOTO(I5I_5, X=a)={[Z→aMb.,$]}

It doesn’t go anywhere.

Now we are concern about new items generated by state 5 I6I_6={[Z→aR.b,$]}, so we call GOTO(I6I_6, X) where X={Z,a,M,b,R,c}

States 11. For X=b

GOTO(I6I_6, X=b)={[Z→aRb.,$] }

It doesn’t go anywhere.

Now we are concern about new items generated by state 7 I7I_7={[Z→bM.b,$]}, so we call GOTO(I7I_7, X) where X={Z,a,M,b,R,c}

States 12. For X=b

GOTO(I7I_7, X=b)={[Z→bMb.,$]}

It doesn’t go anywhere.

Now we are concern about new items generated by state 8 I8I_8={[Z→bR.a,$]}, so we call GOTO(I8I_8, X) where X={Z,a,M,b,R,c}

States 13. For X=a

GOTO(I8I_8, X=a)={[Z→bRa.,$]}

It doesn’t go anywhere.

Build the parsing LR(1) table.

The codes for the actions are:

  1. si means shift and stack state i,
  1. rj means reduce by the production numbered j,
  1. blank means error.
(1) SZ(2) ZaMa(3) ZbMb(4) ZaRb(5) ZbRa(6) Mc(7) Rc(1)\text{ }S \to Z\\ (2)\text{ }Z \to aMa \\ (3)\text{ }Z \to bMb \\ (4)\text{ }Z \to aRb \\ (5)\text{ }Z \to bRa \\ (6)\text{ }M \to c \\ (7)\text{ }R \to c
STATEACTIONGOTO
abc$MRZ
0s2s31
1accept
2s645
3s978
4s10
5s11
6r6r7
7s12
8s13
9r7r6
10r2
11r4
12r3
13r5

Show how it works with the string acbacb.

StackSymbolsInputAction
0acb$shift
0 2acb$shift
0 2 6acb$reduce by R→c
0 2 5aRb$shift
0 2 5 11aRb$reduce by Z→aRb
0 1Z$accept