The calculation of the intersection of a CFG and a FSA is very simple [1]. However this construction yields an enormous amount of rules that are `useless'. Furthermore the resulting parse forest grammar might define an empty language (if the intersection was empty). Luckily `ordinary' recognizers/parsers for CFG can be easily generalized to construct this intersection (usually yielding a much smaller grammar than in the construction of [1]). Checking whether the intersection is empty or not is then usually very simple as well: only in the latter case will the parser terminate succesfully.

Consider the following definite clause specification of a FSA. The
following conventions apply. We define the transition relation using
the relation `trans/3`. For start states, the relation start/1
should hold, and for final states the relation final/1 should hold.
Thus the following FSA, defining the regular language
*L* = (*aa*)^{*}*b*^{+}
(i.e. an even number of a's followed by at least one b) is given as:

A context-free grammar is represented as a definite-clause
specification as follows. We do not wish to define the sets of
terminal and non-terminal symbols explicitly, these can be understood
from the rules that are defined using the relation `rule/2`, and
where symbols of the rhs are prefixed with `-' in the case of
terminals and `+' in the case of non-terminals. The relation ` top/1` should hold for the start symbol. The language *L'* = *a*^{n}*b*^{n} is
defined as:

top(s). rule(s,[-a,+s,-b]). rule(s,[]).

In order to illustrate how ordinary parsers can be used to compute the
intersection of a FSA and a CFG consider the definite-clause
specification of a top-down parser. This parser runs in cubic time if
implemented using Earley deduction or XOLDT resolution. It is assumed
that the input string is represented by the `trans/3` predicate.

parse(P0,P) :- top(Cat), parse(+Cat,P0,P). parse(-Cat,P0,P) :- trans(P0,Cat,P), side_effect(p(Cat,P0,P) --> Cat). parse(+Cat,P0,P) :- rule(Cat,Ds), parse_ds(Ds,P0,P,His), side_effect(p(Cat,P0,P) --> His). parse_ds([],P,P,[]). parse_ds([H|T],P0,P,[p(H,P0,P1)|His]) :- parse(H,P0,P1), parse_ds(T,P1,P,His).

The predicate `side_effect` is used to construct the parse forest
grammar. The predicate always succeeds, and as a side-effect asserts
that its argument is a rule of the parse forest grammar. For the
sentence `a a b b' we obtain the parse forest grammar:

p(s,2,2) --> []. p(s,1,3) --> [p(-a,1,2),p(+s,2,2),p(-b,2,3)]. p(s,0,4) --> [p(-a,0,1),p(+s,1,3),p(-b,3,4)]. p(a,1,2) --> a. p(a,0,1) --> a. p(b,2,3) --> b. p(b,3,4) --> b.The reader easily verifies that indeed this grammar generates (a isomorphism of) the single parse tree of this example, assuming of course that the start symbol is

Interestingly, nothing needs to be changed to use the same
parser for the computation of the intersection of a FSA and a CFG. If
our input `sentence' now is the definition of `trans/3` as given
above, we obtain the following parse forest grammar (where the start
symbol is `p(s,q0,q2)`):

p(s,q0,q0) --> []. p(s,q1,q1) --> []. p(s,q1,q2) --> [p(-a,q1,q0),p(+s,q0,q0),p(-b,q0,q2)]. p(s,q0,q2) --> [p(-a,q0,q1),p(+s,q1,q2),p(-b,q2,q2)]. p(s,q1,q2) --> [p(-a,q1,q0),p(+s,q0,q2),p(-b,q2,q2)]. p(a,q0,q1) --> a. p(a,q1,q0) --> a. p(b,q0,q2) --> b. p(b,q2,q2) --> b.Thus, even though we now use the same parser for an infinite set of input sentences (represented by the FSA) the parser still is able to come up with a parse forest grammar. A possible derivation for this grammar is the following (abbreviated) parse tree:

1998-09-29