Next: Complications for Ngrams Up: Robust parsing of word-graphs Previous: Example: nlp_speech method.

## Search algorithm

The robustness component can be characterised as a search in the annotated word-graph. The goal of the search is the best path from v0 to vn+1. This search reduces to a well-known graph search problem, namely the problem of finding the shortest path in a directed acyclic graph with uniform weights.

We use a variant of the DAG-SHORTEST-PATH algorithm [18]. This algorithm finds shortest paths for uniformly weighted directed acyclic graphs. The first step of the algorithm is a topological sort of the vertices of the graph. It turns out that the state names of the word-graph that we obtain from the speech recogniser are already topologically sorted: state names are integers, and edges always connect to larger integers. The second step of the algorithm maintains an array A which records for each state vk the weight associated with the best path known from v0 to vk. A similar array, P, is used to represent for each state the history of this best path, as a sequence of QLFs (since that is what we want to obtain eventually).

The first step of the algorithm initialises these arrays such that for each state , and . For v0 we specify and . After this initialisation phase the algorithm treats each edge of the graph in topologically sorted order of the source vertex, as follows:

 (34)

The function relax is defined on edges and updates the arrays if a better path to a vertex has been found:

 (35)

When the algorithm finishes, P[vn+1] constitutes the sequence of QLFs associated with the best path found. The weight of this path is given by A[vn+1]. This algorithm is efficient. Its running time is O(V+E), where V is the number of vertices and E is the number of edges. Therefore, it can be expected that this part of processing should not decrease parsing efficiency too much, since the number of edges is O(V2).8 For a more detailed account of the correctness and complexity of this algorithm, see Cormen, Leiserson, and Rivest [18]. 9

A simple generalisation of the algorithm has been implemented in order to obtain the N best solutions. In this case we maintain in the algorithm for each vertex the N best paths found so far. Such a generalisation increases the worst-case complexity by only a constant factor, and is very useful for development work.

Next: Complications for Ngrams Up: Robust parsing of word-graphs Previous: Example: nlp_speech method.

2000-07-10