For English, tremendous progress has been made in the area of wide-coverage parsing of unrestricted text. Many of the proposed systems are statistical parsers, but systems based on a hand-written grammar exist as well. The aim of Alpino¹ is to provide computational analysis of Dutch with coverage and accuracy comparable to state-of-the-art parsers for English.
The Alpino grammar (described in more detail below) is a lexicalized grammar in the tradition of constructionalist Head-driven Phrase Structure Grammar [15,17]. The grammar consists of hand-written, linguistically motivated rules and lexical types. To evaluate the coverage of the system and its disambiguation component, a testbench of syntactically annotated material is crucial. Given the current lack of such material for Dutch, we have started to annotate corpora with dependency structures. Dependency structures provide a convenient level of representation for annotation, and a fairly neutral representation for further processing. The annotation format is taken from the project Corpus Gesproken Nederlands (Corpus of Spoken Dutch). The construction of dependency structures in the grammar and our treebanking efforts are described in section 4. Both the lexicalist nature of the Alpino grammar and the use of dependency structures imply that lexical items must be associated with detailed valency information. For the Alpino lexicon we have extracted this information from the Celex and Parole lexical databases (section 3).
In section 5 we describe Alpino's parsing architecture. Section 6 describes a variety of disambiguation strategies which have been integrated in Alpino. In addition, we report on a number of preliminary disambiguation experiments. We conclude with some remarks on future work.