Parser LR(0) SLR(1) LALR(1) LR(1)

modifica
Figure 1. Architecture of a table-based bottom-up parser.

A table-based bottom-up parser can be schematically presented as in Figure 1.

The parser has an input buffer, a stack on which it keeps a list of states it has been in, an action table and a goto table that tell it to what new state it should move or which grammar rule it should use given the state it is currently in and the terminal or nonterminal it has just read on the input stream. To explain its workings we will use the following small grammar:

(1) E → E * B
(2) E → E + B
(3) E → B
(4) B → 0
(5) B → 1

The Action and Goto Table

modifica

The two LR(0) parsing tables for this grammar look as follows:

action goto
state * + 0 1 $   E B
0     s1 s2     3 4
1 r4 r4 r4 r4 r4    
2 r5 r5 r5 r5 r5    
3 s5 s6   acc      
4 r3 r3 r3 r3 r3      
5     s1 s2     7
6     s1 s2     8
7 r1 r1 r1 r1 r1      
8 r2 r2 r2 r2 r2      

The action table is indexed by a state of the parser and a terminal (including a special terminal $ that indicates the end of the input stream) and contains three types of actions:

  • shift, which is written as 'sn' and indicates that the next state is n
  • reduce, which is written as 'rm' and indicates that a reduction with grammar rule m should be performed
  • accept, which is written as 'acc' and indicates that the parser accepts the string in the input stream.

The goto table is indexed by a state of the parser and a nonterminal and simply indicates what the next state of the parser will be if it has recognized a certain nonterminal.

The Parsing Algorithm

modifica

The LR parsing algorithm now works as follows:

  1. The stack is initialized with [0]. The current state will always be the state that is on top of the stack.
  2. Given the current state and the current terminal on the input stream an action is looked up in the action table. There are four cases:
    • a shift sn:
      • the current terminal is removed from the input stream
      • the state n is pushed onto the stack and becomes the current state,
    • a reduce rm:
      • the number m is written to the output stream,
      • for every symbol in the right-hand side of rule m a state is removed from the stack and
      • given the state that is then on top of the stack and the left-hand side of rule m a new state is looked up in the goto table and made the new current state by pushing it onto the stack.
    • an accept: the string is accepted
    • no action: a syntax error is reported
  3. The previous step is repeated until the string is accepted or a syntax error is reported.


To explain why this algorithm works we now proceed with showing how a string like "1 + 1" would be parsed by such a parser. The table below illustrates each step in the process, here the state refers to the element at the top of the stack (the right-most element), and the next action is determined by referring to the action table above. Also note that a $ is appended to the input string to denote the end of it.


State Input Stream Output Stream Stack Next Action
0 1+1$ [0] Shift 2
2 +1$ [0,2] Reduce 5
4 +1$ 5 [0,4] Reduce 3
3 +1$ 5,3 [0,3] Shift 6
6 1$ 5,3 [0,3,6] Shift 2
2 $ 5,3 [0,3,6,2] Reduce 5
8 $ 5,3,5 [0,3,6,8] Reduce 2
3 $ 5,3,5,2 [0 3] Accept


When the parser starts it always starts with the initial state 0 and the following stack:

[0]

The first terminal that the parser sees is the '1' and according to the action table it should then go to state 2 resulting in the following stack:

[0 '1' 2]

The top of the stack is shown on the right-hand side. (For the sake of our explanation we also show the symbol (e.g., '1', B) that caused the transition to the next state although strictly speaking it is not part of the stack.)

In state 2 the action table says that whatever terminal we see on the input stream we should do a reduction with grammar rule 5. If the table is correct then this means that the parser has just recognized the right-hand side of rule 5, which is indeed the case. So in this case we write 5 to the output stream, pop one state from the stack, and push on the stack the state from the cell in the goto table for state 0 and B, i.e., state 4, onto the stack. The resulting stack is:

[0 B 4]

However, in state 4 the action table says we should now do a reduction with rule 3. So we write 3 to the output stream, pop one state from the stack, and find the new state in the goto table for state 0 and E, which is state 3. The resulting stack:

[0 E 3]

The next terminal that the parser sees is a '+' and according to the action table it should then go to state 6:

[0 E 3 '+' 6]

Note that the resulting stack can be interpreted as the history of a finite state automaton that has just read a nonterminal E followed by a terminal '+'. The transition table of this automaton is defined by the shift actions in the action table and the goto actions in the goto table.

The next terminal is now '1' and this means that we perform a shift and go to state 2:

[0 E 3 '+' 6 '1' 2]

Just as the previous '1' this one is reduced to B giving the following stack:

[0 E 3 '+' 6 B 8]

Again note that the stack corresponds with a list of states of a finite automaton that has read a nonterminal E, followed by a '+' and then a nonterminal B. In state 8 we always perform a reduce with rule 2. Note that the top 3 states on the stack correspond with the 3 symbols in the right-hand side of rule 2.

[0 E 3]

Finally, we read a '$' from the input stream which means that according to the action table (the current state is 3) the parser accepts the input string. The rule numbers that will then have been written to the output stream will be [5, 3, 5, 2] which is indeed a rightmost derivation of the string "1 + 1" in reverse.

SLR(1) sta per Simple LR parser che necessita di un simbolo di look-ahead. Le tabelle di parsing sono generate in maniera equivalente al parser LR(0) con l'eccezione che viene fatta una riduzione per la regola grammaticale A→a solo se il simbolo seguente, nella frase in ingresso fa parte del FOLLOW(A). Questo permette di risolvere dei conflitti spostamento/riduzione che capitano nei parser LR(0).

In computer science, a Simple LR parser or SLR parser is an LR parser for which the parsing tables are generated as for an LR(0) parser except that it only performs a reduction with a grammar rule Aw if the next symbol on the input stream is in the follow set of A (see LL parser for a definition of follow set). Such a parser can prevent certain shift-reduce and reduce-reduce conflicts (see LR parser) that occur in LR(0) parsers and it can therefore deal with more grammars. However, it still cannot parse all grammars that can be parsed by an LR(1) parser. A grammar that can be parsed by an SLR parser is called a SLR grammar.

Example

modifica

A grammar that can be parsed by an SLR parser but not by an LR(0) parser is the following:

(1) E → 1 E
(2) E → 1

Constructing the action and goto table as is done for LR(0) parsers would give the following item sets and tables:

Item set 0
S → • E
+ E → • 1 E
+ E → • 1
Item set 1
E → 1 • E
E → 1 •
+ E → • 1 E
+ E → • 1
Item set 2
S → E •
Item set 3
E → 1 E •

The action and goto tables:

action goto
state 1 $ E
0 s1 2
1 s1/r2 r2 3
2 acc
3 r1 r1

As can be observed there is a shift-reduce conflict for state 1 and terminal '1'. However, the follow set of E is { $ } so the reduce actions r1 and r2 are only valid in the column for $. The result are the following conflict-less action and goto table:

action goto
state 1 $ E
0 s1 2
1 s1 r2 3
2 acc
3 r1

In computer science, Look-Ahead LR parsers or LALR parsers are a specialized form of LR parsers that can deal with more context-free grammars than Simple LR parsers but fewer than LR(1) parsers can. It is a very popular type of parser because it gives a good trade-off between the number of grammars it can deal with and the size of the parsing tables it requires. It is these types of parsers that are generated by compiler-compilers such as yacc and GNU bison.

Like SLR, LALR is a refinement to the technique for constructing LR(0) parse tables. While SLR uses follow sets to construct reduce actions, LALR uses lookahead sets, which are more specific because they take more of the parsing context into account. follow sets are associated with a symbol, while lookahead sets are specific to an LR(0) item and a parser state.

Specifically, the follow set for a given LR(0) item I in a given parser state S contains all symbols that are allowed by the grammar to appear after I's left-hand-side nonterminal. In contrast, the lookahead set for I contains only those symbols that are allowed by the grammar to appear after I's right-hand-side has been parsed starting from state S. follow(I) is effectively the union of the lookahead sets for all LR(0) items with the same left-hand-side as I, regardless of parser states or right-hand-sides, therefore losing all context information. Because the LOOKAHEAD set is specific to a particular parsing context, it can be more selective, therefore allowing finer distinctions than the FOLLOW set.

References

modifica
  • Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison--Wesley, 1986. (Describes the traditional techniques for building LALR(1) parsers.)
  • Frank DeRemer and Thomas Pennello. Efficient Computation of LALR(1) Look-Ahead Sets. ACM Transactions on Programming Languages and Systems (TOPLAS) 4:4, pp. 615–649. 1982. (Describes a more efficient technique for building LALR(1) parsers.)
  • Richard Bornat Understanding and Writing Compilers, Macmillan, 1979. (Describes the principles of automated left-to-right parsing and how to construct the parser tables, what a follow set is, etc., in English, not mathematics.)

See also

modifica

A canonical LR parser or LR(1) parser is an LR parser whose parsing tables are constructed in a similar way as with LR(0) parsers except that the items in the item sets also contain a follow, i.e., a terminal that is expected by the parser after the right-hand side of the rule. For example, such an item for a rule A → B C might be

A → B · C, a

which would mean that the parser has read a string corresponding to B and expects next a string corresponding to C followed by the terminal 'a'. LR(1) parsers can deal with a very large class of grammars but their parsing tables are often very big. This can often be solved by merging item sets if they are identical except for the follows, which results in so-called LALR parsers.

Constructing LR(1) parsing tables

modifica

An LR(1) item is a production with a marker together with a terminal, e.g., [S → a A · B e, c]. Intuitively, such an item indicates how much of a certain production we have seen already (a A), what we could expect next (B e), and a lookahead that agrees with what should follow in the input if we ever reduce by the production S → a A B e. By incorporating such lookahead information into the item concept, we can make more wise reduce decisions. The lookahead of an LR(1) item is used directly only when considering reduce actions (i.e., when the · marker is at the right end).

The core of an LR(1) item [S → a A · B e, c] is the LR(0) item S → a A · B e. Different LR(1) items may share the same core. For example, if we have two LR(1) items of the form

  • [A → α ·, a] and
  • [B → α ·, b],

we take advantage of the lookahead to decide which reduction to use. (The same setting would perhaps produce a reduce/reduce conflict in the SLR approach.)

Validity

modifica

The notion of validity changes. An item [A → β1 · β2, a] is valid for a viable prefix α β1 if there is a rightmost derivation that yields α A a w which in one step yields α β1β2 a w

Initial item

modifica

To get the parsing started, we begin with the initial item of

[S’ → · S, $].

Here $ is a special character denoting the end of the string.

Closure

modifica

Closure is more refined. If [A → α · B β, a] belongs to the set of items, and B → γ is a production of the grammar, then we add the item [B → · γ, b] for all b in FIRST(β a).

Every state is closed according to Closure.

Goto is the same. A state containing [A → α · X β, a] will move to a state containing [A → α X · β, a] with label X.

Every state has transitions according to Goto.

Shift actions

modifica

The shift actions are the same. If [A → α · b β, a] is in state Ik and Ik moves to state Im with label b, then we add the action

action[k, b] = "shift m"

Reduce actions

modifica

The reduce actions are more refined. If [A→α., a] is in state Ik, then we add the action: "Reduce A → α" to action[Ik, a]. Observe that we don’t use information from FOLLOW(A) anymore. The goto part of the table is as before.

S’ → S
S → CC
C → c C | d
FIRST
S c d
C c d
S’ → S
S → L = R | R
L → * R | id
R → L
FIRST
S * id
L * id
R * id


References

modifica
See Discussion: http://www.cse.uconn.edu/~akiayias/cse244fa04/N244-4-LR-more.ppt