Next: The intersection of a Up: On the Intersection of Previous: Introduction

# The intersection of a CFG and a FSA

The calculation of the intersection of a CFG and a FSA is very simple [1]. However this construction yields an enormous amount of rules that are useless'. Furthermore the resulting parse forest grammar might define an empty language (if the intersection was empty). Luckily ordinary' recognizers/parsers for CFG can be easily generalized to construct this intersection (usually yielding a much smaller grammar than in the construction of [1]). Checking whether the intersection is empty or not is then usually very simple as well: only in the latter case will the parser terminate succesfully.

Consider the following definite clause specification of a FSA. The following conventions apply. We define the transition relation using the relation trans/3. For start states, the relation start/1 should hold, and for final states the relation final/1 should hold. Thus the following FSA, defining the regular language L = (aa)*b+ (i.e. an even number of a's followed by at least one b) is given as:

A context-free grammar is represented as a definite-clause specification as follows. We do not wish to define the sets of terminal and non-terminal symbols explicitly, these can be understood from the rules that are defined using the relation rule/2, and where symbols of the rhs are prefixed with -' in the case of terminals and +' in the case of non-terminals. The relation top/1 should hold for the start symbol. The language L' = anbn is defined as:

top(s).

rule(s,[-a,+s,-b]).  rule(s,[]).


In order to illustrate how ordinary parsers can be used to compute the intersection of a FSA and a CFG consider the definite-clause specification of a top-down parser. This parser runs in cubic time if implemented using Earley deduction or XOLDT resolution. It is assumed that the input string is represented by the trans/3 predicate.

parse(P0,P) :-
top(Cat), parse(+Cat,P0,P).

parse(-Cat,P0,P) :-
trans(P0,Cat,P),
side_effect(p(Cat,P0,P) --> Cat).
parse(+Cat,P0,P) :-
rule(Cat,Ds),
parse_ds(Ds,P0,P,His),
side_effect(p(Cat,P0,P) --> His).

parse_ds([],P,P,[]).
parse_ds([H|T],P0,P,[p(H,P0,P1)|His]) :-
parse(H,P0,P1),
parse_ds(T,P1,P,His).


The predicate side_effect is used to construct the parse forest grammar. The predicate always succeeds, and as a side-effect asserts that its argument is a rule of the parse forest grammar. For the sentence a a b b' we obtain the parse forest grammar:

p(s,2,2) --> [].
p(s,1,3) -->
[p(-a,1,2),p(+s,2,2),p(-b,2,3)].
p(s,0,4) -->
[p(-a,0,1),p(+s,1,3),p(-b,3,4)].
p(a,1,2) --> a.
p(a,0,1) --> a.
p(b,2,3) --> b.
p(b,3,4) --> b.

The reader easily verifies that indeed this grammar generates (a isomorphism of) the single parse tree of this example, assuming of course that the start symbol is p(s,0,4). In the parse-forest grammar, complex symbols are non-terminals, atomic symbols are terminals.

Interestingly, nothing needs to be changed to use the same parser for the computation of the intersection of a FSA and a CFG. If our input sentence' now is the definition of trans/3 as given above, we obtain the following parse forest grammar (where the start symbol is p(s,q0,q2)):

p(s,q0,q0) --> [].
p(s,q1,q1) --> [].
p(s,q1,q2) -->
[p(-a,q1,q0),p(+s,q0,q0),p(-b,q0,q2)].
p(s,q0,q2) -->
[p(-a,q0,q1),p(+s,q1,q2),p(-b,q2,q2)].
p(s,q1,q2) -->
[p(-a,q1,q0),p(+s,q0,q2),p(-b,q2,q2)].
p(a,q0,q1) --> a.
p(a,q1,q0) --> a.
p(b,q0,q2) --> b.
p(b,q2,q2) --> b.

Thus, even though we now use the same parser for an infinite set of input sentences (represented by the FSA) the parser still is able to come up with a parse forest grammar. A possible derivation for this grammar is the following (abbreviated) parse tree:

Next: The intersection of a Up: On the Intersection of Previous: Introduction
Noord G.J.M. van
1998-09-29