The probability of a parse $y$ given a sentence $x$ might then be

\[ p(y|x) = \prod_{t \in y} p(t) \]

where $t$ ranges over the dependency relations (tuples) of $y$.
Due to the occurrence of reentrancies, dependency structures are
generally not trees but graphs. Therefore, the product above
gives poor results because it will have an unjustified bias against
such reentrancies (a reentrancy gives rise to an additional dependency
relation). For this reason, we have chosen to score parse trees by
determining the mean value of $-\log p(t)$ for each tuple $t$; this improved results considerably.
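To make this scoring concrete, the following is a minimal sketch in Python; the representation of the tuples and the function `p_tuple` are assumptions standing in for the dependency model defined below.

```python
import math

def parse_score(tuples, p_tuple):
    """Score a dependency structure by the mean of -log p(t) over its
    dependency tuples.  Using the mean rather than the product (i.e. the
    sum of -log p(t)) avoids penalising structures whose reentrancies
    contribute extra dependency relations."""
    neg_logs = [-math.log(p_tuple(t)) for t in tuples]
    return sum(neg_logs) / len(neg_logs)  # lower is better
```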
The probability of a dependency is calculated as follows:

\[ p(\langle hw, hp, rel, ap, aw \rangle) = p(rel \mid hw, hp) \cdot p(ap \mid rel, hw, hp) \cdot p(aw \mid ap, rel, hw, hp) \]

where $hw$ and $hp$ are the head word and its part of speech, $rel$ the dependency relation, and $aw$ and $ap$ the argument word and its part of speech.
The three components are each calculated using a linear back-off strategy, where the weights are determined by frequency and diversity (formula 2.66 of Collins (1999)). The quantities we use for backing off are given in the following table:
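In code, a linear back-off of this kind might look as follows. This is a sketch of one common frequency/diversity weighting scheme, $\lambda = f/(f + k \cdot d)$; the constant `k` and the data structures are assumptions, not the paper's exact formulation.

```python
def backoff_prob(outcome, contexts, counts, context_counts, diversity, k=5.0):
    """Linear back-off over contexts ordered from most to least specific.
    At each level, a weight lam = f / (f + k * d) is computed from the
    context frequency f and its diversity d (the number of distinct
    outcomes observed with that context); lam interpolates the level's
    relative frequency with the estimate from the less specific levels."""
    remaining = 1.0   # probability mass not yet assigned to a level
    p = 0.0
    for ctx in contexts:
        f = context_counts.get(ctx, 0)
        if f == 0:
            continue  # unseen context: back off to the next level
        lam = f / (f + k * diversity[ctx])
        p += remaining * lam * counts.get((outcome, ctx), 0) / f
        remaining *= 1.0 - lam
    return p  # leftover mass could be given to a uniform estimate
```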
Because the treebanks currently available to us are much too small to estimate these quantities accurately, we have chosen to estimate them using unsupervised learning. We parsed a large corpus (`de Volkskrant' newspaper text: the first four months of 1997), using the penalty rules described in the previous section as our disambiguator. This corpus contains about 350,000 sentences and 6,200,000 words. We used only those sentences that the system could analyse as a single constituent within a reasonable amount of time, which left us with the results of about 225,000 sentences. We estimated the quantity $p$ from the best parse (according to the penalty rules) of each of these sentences. Collecting the 225,000 dependency structures took about one month of CPU time (using the high-performance computing cluster of the University of Groningen).
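A sketch of the corresponding counting step is given below; the tuple layout $\langle hw, hp, rel, ap, aw \rangle$ and the three conditioning contexts are assumptions matching the decomposition sketched above, not the system's actual data format.

```python
from collections import Counter, defaultdict

def collect_counts(best_parses):
    """Accumulate, from the best parse of each sentence, the counts needed
    for the three model components and their back-off weights."""
    counts = Counter()            # (outcome, context) -> frequency
    context_counts = Counter()    # context -> frequency
    diversity = defaultdict(set)  # context -> set of distinct outcomes

    def observe(outcome, context):
        counts[(outcome, context)] += 1
        context_counts[context] += 1
        diversity[context].add(outcome)

    for parse in best_parses:
        for hw, hp, rel, ap, aw in parse:
            observe(rel, (hw, hp))          # for p(rel | hw, hp)
            observe(ap, (rel, hw, hp))      # for p(ap | rel, hw, hp)
            observe(aw, (ap, rel, hw, hp))  # for p(aw | ap, rel, hw, hp)

    # convert diversity sets to counts for use in the back-off weights
    return counts, context_counts, {c: len(s) for c, s in diversity.items()}
```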
As can be concluded from table 3, such a model performs much better than the baseline. Moreover, a combined model in which we simply add the rule penalties to the quantity $p$ performs better than either model in isolation.
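A combined score of this kind might be computed as below, reusing `parse_score` from the sketch above; treating the penalties and the dependency-model score as directly addable costs follows the "simply add" step, while the function parameters are assumptions.

```python
def combined_score(parse, rule_penalties, tuples, p_tuple):
    """Combined disambiguation score: the rule penalties added to the
    dependency-model score (the mean -log p(t) from parse_score above);
    both terms behave as costs, so lower is better."""
    return rule_penalties(parse) + parse_score(tuples(parse), p_tuple)
```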