Last Update Time-stamp: "97/06/30 13:50:16 umrigar"
This is a brief intuitive introduction to shift-reduce bottom-up parsing. Any compiler text should provide more details. A elementary introduction to grammars and language analysis is also available.
A shift-reduce parser uses a parse stack
which (conceptually) contains grammar symbols. During the operation of the
parser, symbols from the input are shifted onto the stack. If a
prefix of the symbols on top of the stack matches the RHS of a grammar rule
which is the correct rule to use within the current
context, then the parser reduces the RHS of the rule to
its LHS, replacing the RHS symbols on top of the stack with the nonterminal
occurring on the LHS of the rule. This shift-reduce process continues until
the parser terminates, reporting either success or failure. It terminates
with success when the input is legal and is accepted by the parser.
It terminates with failure if an error is detected in the input.
The parser is nothing but a stack automaton which may be in one of
several discrete states. A state is usually represented simply as
an integer. In reality, the parse stack contains states, rather than
grammar symbols. However, since each state corresponds to a unique grammar
symbol, the state stack can be mapped onto the grammar symbol stack
mentioned earlier.
The operation of the parser is controlled by a couple of tables:
For example, consider the following simple grammar
One possible set of shift-reduce parsing tables is shown below
( A trace of the parser on the input The general idea of bottom-up parsing is to repeatedly match the RHS of
some rule and reduce it to the rule's LHS. To identify the matching RHS's,
the parser needs to keep track of all possible rules which may match. This
is done by means of the parser state, where each state keeps track
of the set of rules the parser may currently be in, and how far along the
parser may be within each rule. This idea of states will become clearer if
we attempt to build the tables for a small example.
Consider the grammar used in the previous section, repeated below for
convenience:
Note that the construction guarantees that each state is entered by a
unique grammar symbol; that is why we can map a state stack into a symbol
stack as mentioned earlier.
For our example grammar, we get the following LR(0) machine:
We should reduce by a rule when that rule has been completely
recognized; i.e., when the state contains a reducing item with the
dot at the extreme right. However, if a state has reducing items, under
what conditions should the corresponding reductions be applied?
It is worth emphasizing that the only essential difference between
SLR(1), LALR(1) and LR(1) parsers is how the decision is made as to when a
reduction should be applied. If multiple actions are possible for a
particular state and lookahead, then that state has conflicts. The
parsers are listed above in order of increasing precision of the
information used for the reduce decision; hence it is possible that a SLR(1)
parser has conflicts when a LALR(1) parser has none, and a LALR(1) parser
has conflicts when a LR(1) parser has none.
Parse tables can often be represented more compactly by designating a
particular reduction to be the default reduction for a state. This
means that if no other action is possible in a state, the default reduction
for that state is applied. This explains why in the example, many rows of
the table have reduce entries in almost all their columns.
Default reductions do not affect the correctness of the parse. Their
only effect is on error detection: an error may not be detected until after
several (useless) reductions have been applied. However, the error is still
detected before any additional symbols are shifted onto the parse stack.
Copyright (C) 1997 Zerksis D. Umrigar
Feedback: Please email any feedback to zdu@acm.org.
The current state of a shift-reduce parser is the state on top of the state
stack. The detailed operation of such a parser is as follows:
[
s][
t]
,
which can contain four different kinds of entries:
[
s][
N]
.
[
s][
t]
:
[
s'][
N]
onto the stack. The lookahead token is not changed by this step.
0) $S: stmt <EOF>
1) stmt: ID ':=' expr
2) expr: expr '+' ID
3) expr: expr '-' ID
4) expr: ID
which describes assignment statements like a:= b + c - d
. (Rule
0 is a special augmenting production added to the grammar).
s
n denotes shift n,
r
n denotes reduce n,
acc denotes accept and blank entries denote error
entries):
Action Table
Goto Table
ID
':='
'+'
'-'
<EOF>
stmt
expr
0
s1
g2
1
s3
2
s4
3
s5
g6
4
acc
acc
acc
acc
acc
5
r4
r4
r4
r4
r4
6
r1
r1
s7
s8
r1
7
s9
8
s10
9
r2
r2
r2
r2
r2
10
r3
r3
r3
r3
r3
a:= b + c - d
is shown
below:
Stack
Remaining Input
Action
0/$S
a:= b + c - d
s1
0/$S 1/a
:= b + c - d
s3
0/$S 1/a 3/:=
b + c - d
s5
0/$S 1/a 3/:= 5/b
+ c - d
r4
0/$S 1/a 3/:=
+ c - d
g6
on expr
0/$S 1/a 3/:= 6/expr
+ c - d
s7
0/$S 1/a 3/:= 6/expr 7/+
c - d
s9
0/$S 1/a 3/:= 6/expr 7/+ 9/c
- d
r2
0/$S 1/a 3/:=
- d
g6
on expr
0/$S 1/a 3/:= 6/expr
- d
s8
0/$S 1/a 3/:= 6/expr 8/-
d
s10
0/$S 1/a 3/:= 6/expr 8/- 10/d
<EOF>
r3
0/$S 1/a 3/:=
<EOF>
g6
on expr
0/$S 1/a 3/:= 6/expr
<EOF>
r1
0/$S
<EOF>
g2
on stmt
0/$S 2/stmt
<EOF>
s4
0/$S 2/stmt 4/<EOF>
accept
Construction of Shift-Reduce Parsing Tables
0) $S: stmt <EOF>
1) stmt: ID ':=' expr
2) expr: expr '+' ID
3) expr: expr '-' ID
4) expr: ID
The input must be ultimately reducible to the augmenting nonterminal
$S
. Hence the parser should initially be in rule 0; more
specifically, it should be expecting the stmt
in rule 0. To
show precisely which symbol is expected in a rule RHS, we define an
item to be a rule, along with a position on the RHS
specifying the next symbol to be expected in that RHS. We denote an item as
a rule with a dot .
just before the next expected symbol.
Hence, returning to our example, the parser is initially expecting the item
0) $S: . stmt <EOF>
However, if the parser is expecting to see a stmt
, it could
be at the beginning of any of the rules for stmt
. Hence the
initial state should include the initial item for
stmt
. (The process of including these additional induced items
is referred to as forming the closure of the state).
State 0
0) $S: . stmt <EOF>
1) stmt: . ID ':=' expr
Now if the parser sees an ID
in state 0, then it can move the
dot past any ID
symbols in state 0. We get a new state; let's
call it State 1:
State 1
1) stmt: ID . ':=' expr
If the parser has seen a stmt
in state 0, then it can move
the dot past any stmt
symbols in state 0. We get a new state;
let's call it State 2:
State 2
0) $S: stmt . <EOF>
If the parser sees a ':='
in state 1, we get a new state; let's
call it State 3:
State 3
1) stmt: ID ':=' . expr
However since the dot is before the nonterminal expr
, the
parser could be in any of the rules for expr
. Hence we need to
include the rules for expr
in a new state 3:
State 3
1) stmt: ID ':=' . expr
2) expr: . expr '+' ID
3) expr: . expr '-' ID
4) expr: . ID
We continue this process of following all possible transitions out of states
until we cannot construct any new states. Completing this construction
results in an automaton called a LR(0) machine. The transitions on
terminal symbols correspond to shift actions in the parser; the
transitions on nonterminal symbols correspond to goto actions in
the parser.
State 0
0) $S: . stmt
The LR(0) machine defines the shift and goto actions in
our parse tables. But what corresponds to the reduce actions in
the action table?