next up previous
Next: Acknowledgments Up: The intersection of a Previous: Intersection of FSA and


What to do?

The following approaches towards the undecidability problem can be taken:

These approaches are discussed now in turn.

Limit the FSA

Rather than assuming the input for parsing is a FSA in its full generality, we might assume that the input is an ordinary word graph (a FSA without cycles).

Thus the techniques for robust processing that give rise to such cycles cannot be used. One example is the processing of an unknown sequence of words, e.g. in case there is noise in the input and it is not clear how many words have been uttered during this noise. It is not clear to me right now what we loose (in practical terms) if we give up such cycles.

Note that it is easy to verify that the question whether the intersection of a word-graph and an off-line parsable DCG is empty or not is decidable since it reduces to checking whether the DCG derives one of a finite number of strings.

Limit the DCG

Another approach is to limit the size of the categories that are being employed. This is the GPSG and F-TAG approach. In that case we are not longer dealing with DCGs but rather with CFGs (which have been shown to be insufficient in general for the description of natural languages).

Compromise completeness

Completeness in this context means: the parse forest grammar contains all possible parses. It is possible to compromise here, in such a way that the parser is guaranteed to terminate, but sometimes misses a few parse-trees.

For example, if we assume that each edge in the FSA is associated with a probability it is possible to define a threshold such that each partial result that is derived has a probability higher than the threshold. Thus, it is still possible to have cycles in the FSA, but anytime the cycle is `used' the probability decreases and if too many cycles are encountered the threshold will cut off that derivation.

Of course this implies that sometimes the intersection is considered empty by this procedure whereas in fact the intersection is not. For any threshold it is the case that the intersection problem of off-line parsable DCGs and FSA is decidable.

Compromise soundness

Soundness in this context should be understood as the property that all parse trees in the parse forest grammar are valid parse trees. A possible way to ensure termination is to remove all constraints from the DCG and parse according to this context-free skeleton. The resulting parse-forest grammar will be too general most of the times.

A practical variation can be conceived as follows. From the DCG we take its context-free skeleton. This skeleton is obtained by removing the constraints from each of the grammar rules. Then we compute the intersection of the skeleton with the input FSA. This results in a parse forest grammar. Finally, we add the corresponding constraints from the DCG to the grammar rules of the parse forest grammar.

This has the advantage that the result is still sound and complete, although the size of the parse forest grammar is not optimal (as a consequence it is not guaranteed that the parse forest grammar contains a parse tree). Of course it is possible to experiment with different ways of taking the context-free skeleton (including as much information as possible / useful).

next up previous
Next: Acknowledgments Up: The intersection of a Previous: Intersection of FSA and
Noord G.J.M. van