Next: Best-first methods Up: Robust parsing of word-graphs Previous: Example: nlp_speech_trigram method.

## Comparison with Dijkstra's algorithm

In a previous version of the implementation we used a generalised version of Dijkstra's algorithm [20], [32], [18] instead of the DAG-SHORTEST-PATH algorithm presented above. Dijkstra's algorithm is more general in that it is not restricted to acyclic graphs. However, it requires that edge weights are non-negative (paths can only get worse as they are extended). A potential advantage of Dijkstra's algorithm for our purposes is that it often does not have to investigate all edges; if edges are relatively expensive to compute, Dijkstra's algorithm might therefore turn out to be faster.
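A minimal sketch of the DAG shortest-path idea may clarify the contrast: edges are relaxed once, in topological order, so negative weights are unproblematic. The function name and graph representation below are our own; the text does not show the actual implementation.

```python
def dag_shortest_path(n, edges, source):
    """Single-source shortest paths in a DAG by relaxing edges in
    topological order. Unlike Dijkstra's algorithm, negative edge
    weights are allowed. `edges` maps a vertex to (target, weight) pairs."""
    # Topological sort (Kahn's algorithm).
    indeg = [0] * n
    for u in edges:
        for v, _ in edges[u]:
            indeg[v] += 1
    stack = [u for u in range(n) if indeg[u] == 0]
    order = []
    while stack:
        u = stack.pop()
        order.append(u)
        for v, _ in edges.get(u, []):
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)
    # Relax every outgoing edge of each vertex exactly once.
    INF = float("inf")
    dist = [INF] * n
    dist[source] = 0
    for u in order:
        if dist[u] == INF:
            continue
        for v, w in edges.get(u, []):
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist
```

Because each edge is inspected exactly once, the algorithm always touches all edges; this is the price paid for tolerating negative weights.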

For instance, Dijkstra's algorithm allows a modest efficiency gain if we delay some of the work the parser has to do for a category q until the robustness component actually needs that category. Since Dijkstra's algorithm does not visit every q, the amount of work is reduced. We exploited this in our implementation as follows. The parser works in two phases. In the first phase (recognition), no semantic constraints are taken into account (in order to pack all ambiguities). In the second phase, semantic constraints are applied. This second phase can be delayed for a category q until Dijkstra's algorithm uses an edge based on q; for a number of categories, the second phase can therefore be skipped completely.
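The lazy two-phase scheme can be sketched as a Dijkstra variant in which the expensive per-edge work is computed on demand and memoised, so edges never relaxed never pay for the second phase. The function `cost(u, v)` stands in for applying the semantic constraints; all names here are illustrative, not the implementation described in the text.

```python
import heapq

def lazy_dijkstra(n, out_edges, cost, source):
    """Dijkstra's algorithm where the weight of an edge is computed only
    when the edge is actually relaxed. `cost(u, v)` stands for the
    expensive second phase (e.g. applying semantic constraints); it is
    memoised so each edge is evaluated at most once."""
    cache = {}
    def weight(u, v):
        if (u, v) not in cache:
            cache[(u, v)] = cost(u, v)   # expensive work, done on demand
        return cache[(u, v)]
    INF = float("inf")
    dist = [INF] * n
    dist[source] = 0
    done = [False] * n
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if done[u]:
            continue
        done[u] = True
        for v in out_edges.get(u, []):
            nd = d + weight(u, v)        # second phase triggered lazily here
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

Edges leaving vertices that Dijkstra's algorithm never settles are never evaluated, which is where the saving comes from. Note this only pays off if the unevaluated edges outweigh Dijkstra's own overhead, which (as reported below) was not the case in practice.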

However, we had three reasons for preferring the DAG-SHORTEST-PATH algorithm given above. Firstly, this algorithm is simpler than Dijkstra's algorithm. Secondly, negative weights do show up in a number of circumstances. And thirdly, the expected efficiency gain was not observed in practice.

An example where negative weights may show up is the following. Suppose we define a method that takes into account Ngram scores but nothing else, i.e. all other sources of information, such as acoustic scores, are ignored. A straightforward implementation of such a method turns out to be non-optimal: it favours paths in the word-graph consisting of a small number of long words over paths (of the same duration) consisting of a larger number of shorter words, simply because the latter sum more scores. A simple and effective way to eliminate this bias is to subtract a constant from each score. This subtraction, however, may yield negative weights.
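A concrete illustration with invented numbers (the scores below are hypothetical negative log probabilities, not taken from the text):

```python
# Two paths of the same duration through a word-graph.
# Path A: two long words; path B: three shorter words.
path_a = [4.0, 4.0]
path_b = [3.0, 3.0, 3.0]

raw_a = sum(path_a)   # 8.0
raw_b = sum(path_b)   # 9.0 -- B loses merely by summing more scores

# Subtract a per-word constant to cancel the bias towards fewer words.
C = 3.5
adj_a = sum(s - C for s in path_a)   # 1.0
adj_b = sum(s - C for s in path_b)   # -1.5 -- negative weights appear
```

After the correction each of B's edges carries the negative weight -0.5, which is exactly the situation Dijkstra's algorithm cannot handle but DAG-SHORTEST-PATH can.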


2000-07-10