next up previous
Next: Disambiguation Up: Robust Parsing Previous: Creating Parse Forests.

Unpacking and Parse Selection.

The motivation to construct a parse forest is efficiency: the number of parse trees for a given sentence can be enormous. In addition to this, in most applications the objective will not be to obtain all parse trees, but rather the best parse tree. Thus, the final component of the parser consists of a procedure to select these best parse trees from the parse forest.

In order to select the best parse tree from a parse forest, we assume a parse evaluation function which assigns a score to each parse. In section 6 we describe some initial experiments with a variety of parse evaluation functions. A naive algorithm constructs all possible parse trees, assigns each one a score, and then selects the best one. Since it is too inefficient to construct all parse trees, we have implemented the algorithm which computes parse trees from the parse forest as a best-first search. This requires that the parse evaluation function is extended to partial parse trees. In order to be able to guarantee that this search procedure indeed finds the best parse tree, a certain monotonicity requirement should apply to this evaluation function: if a (partial) tree s is better than s', then a tree t which contains s should be better than t' which is just like t except it has s' instead of s. However, instead of relying on such a requirement, we implemented a variant of a best-first search algorithm in such a way that for each state in the search space, we maintain the b best candidates, where b is a small integer (the beam). If the beam is decreased, then we run a larger risc of missing the best parse (but the result will typically still be a relatively `good' parse); if the beam is increased, then the amount of computation increases too. Currently, we find that a value of b=4 is a good compromise between accuracy and efficiency. In table 2 the effect of various values for b is presented for two development treebanks. The grammar assigns on average about 33 parse trees per sentence for the cdbl10 corpus. This number increases rapidly for longer sentences: for the cdbl20 corpus it is at least 340.5

Table 2: Effect of beam-size on accuracy and efficiency of parse selection
beam cdbl10 cdbl20
  accuracy (%) speed (msec) accuracy (%) speed (msec)
1 79.99 190 73.63 740
2 80.66 270 74.59 1470
4 81.11 350 75.07 2350
8 81.22 530 75.35 3630
16 81.36 590 75.31 5460
32 81.36 790 74.98 7880
$\infty$ 81.36 640 - -

next up previous
Next: Disambiguation Up: Robust Parsing Previous: Creating Parse Forests.
Noord G.J.M. van