In this paper we are concerned with the syntactic analysis phase of a
natural language understanding system. Ordinarily, the input to such a
system is a sequence of words. Following Bernard Lang, however, we
argue that it might be fruitful to take the input more generally
to be a *finite state automaton (FSA)*, in order to model cases in which
we are uncertain about the actual input. Parsing uncertain input may be
necessary for ill-formed textual input, or for speech input.

For example, if a natural language understanding system is interfaced
with a speech recognition component, chances are that this component
is uncertain about the actual string of words that has been uttered,
and thus produces a *word lattice* of the most promising
hypotheses, rather than a single sequence of words. FSAs of course
generalize such word lattices.
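As a minimal illustration (the encoding and the example hypotheses are our own, not taken from any particular recognizer), a word lattice can be represented directly as an acyclic FSA whose transitions are labelled with word hypotheses:

```python
# A word lattice encoded as an FSA: states are integers, transitions are
# (source, word, target) triples. This toy lattice models a recognizer
# that heard either "ice cream" or "I scream" (hypothetical example).
from collections import defaultdict

def make_fsa(transitions, start, finals):
    delta = defaultdict(list)
    for src, word, dst in transitions:
        delta[src].append((word, dst))
    return delta, start, finals

def paths(fsa):
    """Enumerate all word sequences accepted by an acyclic FSA."""
    delta, start, finals = fsa
    def walk(state, prefix):
        if state in finals:
            yield prefix
        for word, nxt in delta[state]:
            yield from walk(nxt, prefix + [word])
    yield from walk(start, [])

lattice = make_fsa(
    [(0, "I", 1), (0, "ice", 2), (1, "scream", 3), (2, "cream", 3)],
    start=0, finals={3})

print(sorted(paths(lattice)))
# → [['I', 'scream'], ['ice', 'cream']]
```

A plain sequence of *n* words is then simply the special case of a lattice with *n* + 1 states and a single path.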

As another example, certain techniques for dealing with ill-formed input can be characterized as finite state transducers [8]; the composition of an input string with such a finite state transducer results in an FSA that can then serve as input for syntactic parsing. Such an approach allows for the treatment of missing, extraneous, interchanged or misused words [13,12,9].

Such techniques might be of use for both written and spoken language input. In the latter case, another possible application concerns the treatment of phenomena such as repairs [3].

Note that we allow the input to be a full FSA (possibly including cycles, etc.), since some of the above-mentioned techniques indeed give rise to cycles. Whereas an ordinary word lattice always defines a finite language, an FSA can of course easily define an infinite one. Cycles might emerge, for instance, in the treatment of unknown sequences of words, i.e. sentences with unknown parts of unknown length [7].
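To see how a cycle yields an infinite language, consider the following sketch (the encoding and example are our own): a self-loop labelled with an unknown-word symbol stands for an unknown part of unknown length, so the automaton accepts infinitely many sentences.

```python
# A cyclic FSA: state 1 has a self-loop labelled "?", standing in for an
# unknown word. The automaton accepts "the ?^n man" for every n >= 0,
# hence an infinite language. (Hypothetical example.)

def accepts(transitions, start, finals, words):
    """Nondeterministic recognition: track the set of reachable states."""
    states = {start}
    for w in words:
        states = {dst for (src, lab, dst) in transitions
                  if src in states and lab == w}
        if not states:
            return False
    return bool(states & finals)

trans = [(0, "the", 1), (1, "?", 1), (1, "man", 2)]
for n in range(3):
    sent = ["the"] + ["?"] * n + ["man"]
    print(sent, accepts(trans, 0, {2}, sent))  # accepted for every n
```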

As suggested by an ACL reviewer, one could also try to model haplology
phenomena (such as the 's in English sentences like `The chef at Joe's
hat', where `Joe's' is the name of a restaurant) using a finite state
transducer. In a straightforward approach this would also lead to a
finite state automaton with cycles.

It can be shown that the computation of the intersection of an FSA
and a CFG requires only a minimal generalization of existing parsing
algorithms: we simply replace the usual string positions with the
names of the states in the FSA. It is also straightforward to
show that the complexity of this process is cubic in the number of
states of the FSA, assuming the right-hand sides of grammar rules
contain at most two categories [6,2] (in the case of ordinary parsing
the number of states equals *n* + 1).
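The generalization can be sketched as follows (a minimal recognizer of our own devising; the toy grammar and automaton are illustrative assumptions). Chart items are triples (state, category, state) rather than the usual pairs of string positions, and because the FSA may contain cycles we close the chart under the binary rules by fixpoint iteration rather than by increasing span length:

```python
# Recognition for the intersection of an FSA and a CFG, by generalizing
# CYK-style parsing: chart items are (state, category, state) triples
# instead of (position, category, position) pairs. The fixpoint loop
# also handles cyclic FSAs. (Toy grammar and automaton, for illustration.)

def intersect_recognize(lexical, binary, start_sym, fsa):
    """lexical: set of (A, word); binary: set of (A, B, C) for A -> B C."""
    transitions, q0, finals = fsa
    # Lexical step: a transition (p, w, q) with rule A -> w yields (p, A, q).
    chart = {(p, A, q)
             for (p, w, q) in transitions
             for (A, word) in lexical if word == w}
    changed = True
    while changed:                      # fixpoint closure over binary rules
        changed = False
        for (p, B, q) in list(chart):
            for (q2, C, r) in list(chart):
                if q2 != q:
                    continue
                for (A, B2, C2) in binary:
                    if B2 == B and C2 == C and (p, A, r) not in chart:
                        chart.add((p, A, r))
                        changed = True
    return any((q0, start_sym, f) in chart for f in finals)

# Toy grammar: S -> NP VP, NP -> Det N, VP -> V NP
binary = {("S", "NP", "VP"), ("NP", "Det", "N"), ("VP", "V", "NP")}
lexical = {("Det", "the"), ("N", "man"), ("N", "dog"), ("V", "sees")}
fsa = ([(0, "the", 1), (1, "man", 2), (1, "dog", 2),
        (2, "sees", 3), (3, "the", 4), (4, "dog", 5)], 0, {5})
print(intersect_recognize(lexical, binary, "S", fsa))  # → True
```

Since each chart item pairs a category with two FSA states, the number of items (and hence, with suitable indexing, the running time) is bounded by a cubic function of the number of states, matching the complexity claim above.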

In this paper we investigate whether the same techniques can be
applied when the grammar is a constraint-based grammar rather
than a CFG. For concreteness we take the grammar to be a
*Definite Clause Grammar* (DCG) [10]. A DCG is a simple
example of a family of constraint-based grammar formalisms that are widely
used in natural language analysis (and generation).
The main findings of this paper can be extended to other members
of that family.

1998-09-29