Next: The intersection of a Up: The intersection of Finite Previous: Introduction

# The intersection of a CFG and a FSA

The calculation of the intersection of a CFG and a FSA is very simple [1]. The (context-free) grammar defining this intersection is simply constructed by keeping track of the state names in the non-terminal category symbols. For each rule X0 X1...Xn there are rules X0q0q X1q0q1X2q1q2...Xnqn - 1q, for all q0...qn. Furthermore for each transition (qi,) = qk we have a rule qiqk . Thus the intersection of a FSA and a CFG is a CFG that exactly derives all parse-trees. Such a grammar might be called the parse-forest grammar.

Although this construction shows that the intersection of a FSA and a CFG is itself a CFG, it is not of practical interest. The reason is that this construction typically yields an enormous amount of rules that are useless'. In fact the (possibly enormously large) parse forest grammar might define an empty language (if the intersection was empty). Luckily ordinary' recognizers/parsers for CFG can be easily generalized to construct this intersection yielding (in typical cases) a much smaller grammar. Checking whether the intersection is empty or not is then usually very simple as well: only in the latter case will the parser terminate succesfully.

To illustrate how a parser can be generalized to accept a FSA as input we present a simple top-down parser.

A context-free grammar is represented as a definite-clause specification as follows. We do not wish to define the sets of terminal and non-terminal symbols explicitly, these can be understood from the rules that are defined using the relation rule/2, and where symbols of the rhs are prefixed with -' in the case of terminals and +' in the case of non-terminals. The relation top/1 defines the start symbol. The language L' = anbn is defined as:

top(s).

rule(s,[-a,+s,-b]).  rule(s,[]).


In order to illustrate how ordinary parsers can be used to compute the intersection of a FSA and a CFG consider first the definite-clause specification of a top-down parser. This parser runs in polynomial time if implemented using Earley deduction or XOLDT resolution [14]. It is assumed that the input string is represented by the trans/3 predicate.

parse(P0,P) :-
top(Cat), parse(+Cat,P0,P).

parse(-Cat,P0,P) :-
trans(P0,Cat,P),
side_effect(p(Cat,P0,P) --> Cat).
parse(+Cat,P0,P) :-
rule(Cat,Ds),
parse_ds(Ds,P0,P,His),
side_effect(p(Cat,P0,P) --> His).

parse_ds([],P,P,[]).
parse_ds([H|T],P0,P,[p(H,P0,P1)|His]) :-
parse(H,P0,P1),
parse_ds(T,P1,P,His).


The predicate side_effect is used to construct the parse forest grammar. The predicate always succeeds, and as a side-effect asserts that its argument is a rule of the parse forest grammar. For the sentence a a b b' we obtain the parse forest grammar:

p(s,2,2) --> [].
p(s,1,3) -->
[p(-a,1,2),p(+s,2,2),p(-b,2,3)].
p(s,0,4) -->
[p(-a,0,1),p(+s,1,3),p(-b,3,4)].
p(a,1,2) --> a.
p(a,0,1) --> a.
p(b,2,3) --> b.
p(b,3,4) --> b.

The reader easily verifies that indeed this grammar generates (a isomorphism of) the single parse tree of this example, assuming of course that the start symbol for this parse-forest grammar is p(s,0,4). In the parse-forest grammar, complex symbols are non-terminals, atomic symbols are terminals.

Next consider the definite clause specification of a FSA. We define the transition relation using the relation trans/3. For start states, the relation start/1 should hold, and for final states the relation final/1 should hold. Thus the following FSA, defining the regular language L = (aa)*b+ (i.e. an even number of a's followed by at least one b) is given as:

Interestingly, nothing needs to be changed to use the same parser for the computation of the intersection of a FSA and a CFG. If our input sentence' now is the definition of trans/3 as given above, we obtain the following parse forest grammar (where the start symbol is p(s,q0,q2)):

p(s,q0,q0) --> [].
p(s,q1,q1) --> [].
p(s,q1,q2) -->
[p(-a,q1,q0),p(+s,q0,q0),p(-b,q0,q2)].
p(s,q0,q2) -->
[p(-a,q0,q1),p(+s,q1,q2),p(-b,q2,q2)].
p(s,q1,q2) -->
[p(-a,q1,q0),p(+s,q0,q2),p(-b,q2,q2)].
p(a,q0,q1) --> a.
p(a,q1,q0) --> a.
p(b,q0,q2) --> b.
p(b,q2,q2) --> b.

Thus, even though we now use the same parser for an infinite set of input sentences (represented by the FSA) the parser still is able to come up with a parse forest grammar. A possible derivation for this grammar constructs the following (abbreviated) parse tree in figure 1. Note that the construction of Bar Hillel would have yielded a grammar with 88 rules.

Next: The intersection of a Up: The intersection of Finite Previous: Introduction
Noord G.J.M. van
1998-09-29