The calculation of the intersection of a CFG and a FSA is very simple
[1]. The (context-free) grammar defining this
intersection is simply constructed by keeping track of the state names
in the non-terminal category symbols. For each rule
*X*_{0}
*X*_{1}...*X*_{n} there are rules
*X*_{0}*q*_{0}*q*
*X*_{1}*q*_{0}*q*_{1}*X*_{2}*q*_{1}*q*_{2}...*X*_{n}*q*_{n - 1}*q*, for all
*q*_{0}...*q*_{n}. Furthermore for each
transition
(*q*_{i},) = *q*_{k} we have a rule
*q*_{i}*q*_{k}
. Thus the intersection of a FSA and a
CFG is a CFG that exactly derives all parse-trees. Such a grammar
might be called the parse-forest grammar.

Although this construction shows that the intersection of a FSA and a
CFG is itself a CFG, it is not of practical interest. The reason is
that this construction typically yields an enormous amount of rules
that are `useless'. In fact the (possibly enormously large) parse
forest grammar might define an empty language (if the intersection was
empty). Luckily `ordinary' recognizers/parsers for CFG can be easily
generalized to construct this intersection yielding (in typical cases)
a much smaller grammar. Checking whether the intersection is empty or
not is then usually very simple as well: only in the latter case will
the parser terminate succesfully.

To illustrate how a parser can be generalized to accept a FSA as input we present a simple top-down parser.

A context-free grammar is represented as a definite-clause
specification as follows. We do not wish to define the sets of
terminal and non-terminal symbols explicitly, these can be understood
from the rules that are defined using the relation `rule/2`, and
where symbols of the rhs are prefixed with `-' in the case of
terminals and `+' in the case of non-terminals. The relation ` top/1` defines the start symbol. The language *L'* = *a*^{n}*b*^{n} is
defined as:

top(s). rule(s,[-a,+s,-b]). rule(s,[]).

In order to illustrate how ordinary parsers can be used to compute the
intersection of a FSA and a CFG consider first the definite-clause
specification of a top-down parser. This parser runs in polynomial
time if implemented using Earley deduction or XOLDT resolution
[14]. It is assumed that the input string is represented
by the `trans/3` predicate.

parse(P0,P) :- top(Cat), parse(+Cat,P0,P). parse(-Cat,P0,P) :- trans(P0,Cat,P), side_effect(p(Cat,P0,P) --> Cat). parse(+Cat,P0,P) :- rule(Cat,Ds), parse_ds(Ds,P0,P,His), side_effect(p(Cat,P0,P) --> His). parse_ds([],P,P,[]). parse_ds([H|T],P0,P,[p(H,P0,P1)|His]) :- parse(H,P0,P1), parse_ds(T,P1,P,His).

The predicate `side_effect` is used to construct the parse forest
grammar. The predicate always succeeds, and as a side-effect asserts
that its argument is a rule of the parse forest grammar. For the
sentence `a a b b' we obtain the parse forest grammar:

p(s,2,2) --> []. p(s,1,3) --> [p(-a,1,2),p(+s,2,2),p(-b,2,3)]. p(s,0,4) --> [p(-a,0,1),p(+s,1,3),p(-b,3,4)]. p(a,1,2) --> a. p(a,0,1) --> a. p(b,2,3) --> b. p(b,3,4) --> b.The reader easily verifies that indeed this grammar generates (a isomorphism of) the single parse tree of this example, assuming of course that the start symbol for this parse-forest grammar is

Next consider the definite clause specification of a FSA.
We define the transition relation using
the relation `trans/3`. For start states, the relation start/1
should hold, and for final states the relation final/1 should hold.
Thus the following FSA, defining the regular language
*L* = (*aa*)^{*}*b*^{+}
(i.e. an even number of a's followed by at least one b) is given as:

Interestingly, nothing needs to be changed to use the same
parser for the computation of the intersection of a FSA and a CFG. If
our input `sentence' now is the definition of `trans/3` as given
above, we obtain the following parse forest grammar (where the start
symbol is `p(s,q0,q2)`):

p(s,q0,q0) --> []. p(s,q1,q1) --> []. p(s,q1,q2) --> [p(-a,q1,q0),p(+s,q0,q0),p(-b,q0,q2)]. p(s,q0,q2) --> [p(-a,q0,q1),p(+s,q1,q2),p(-b,q2,q2)]. p(s,q1,q2) --> [p(-a,q1,q0),p(+s,q0,q2),p(-b,q2,q2)]. p(a,q0,q1) --> a. p(a,q1,q0) --> a. p(b,q0,q2) --> b. p(b,q2,q2) --> b.Thus, even though we now use the same parser for an infinite set of input sentences (represented by the FSA) the parser still is able to come up with a parse forest grammar. A possible derivation for this grammar constructs the following (abbreviated) parse tree in figure 1. Note that the construction of Bar Hillel would have yielded a grammar with 88 rules.

1998-09-29