The NWO Priority Programme Language and Speech Technology is a 5-year research programme aimed at developing spoken language information systems. Its immediate goal is to develop a demonstrator of a public transport information system that operates over ordinary telephone lines. This demonstrator is called OVIS, Openbaar Vervoer Informatie Systeem (Public Transport Information System). The language of the system is Dutch.
In this Programme, two alternative NLP modules are developed in parallel: a grammar-based (conventional, rule-based) module and a data-oriented (memory-based, stochastic, DOP) module. Both of these modules fit into the system architecture of OVIS. They accept as their input word graphs produced by the automatic speech recognition component, and produce updates which are passed on to the pragmatic analysis component and dialogue manager.
A word graph (Oerder and Ney, 1993) is a compact representation for all sequences of words that the speech recogniser hypothesises for a spoken utterance. The states of the graph represent points in time, and a transition between two states represents a word that may have been uttered between the corresponding points in time. Each transition is associated with an acoustic score representing a measure of confidence that the word perceived there was actually uttered.
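To make the data structure concrete, the following is a minimal Python sketch of a word graph and of enumerating the word sequences it encodes. It is an illustration only, not the OVIS implementation: the names `Transition` and `paths`, the example words, and the scores are all hypothetical, and the sketch assumes that acoustic scores behave like costs (lower means more confident) that add up along a path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    source: int   # state: point in time where the word starts
    target: int   # state: point in time where the word ends
    word: str
    score: float  # ASSUMPTION: a cost, lower means more confident


def paths(transitions, start, final):
    """Enumerate every word sequence from start to final with its summed score.

    Assumes the graph is acyclic, which holds when states are points in time.
    """
    outgoing = {}
    for t in transitions:
        outgoing.setdefault(t.source, []).append(t)

    def walk(state, words, total):
        if state == final:
            yield words, total
            return
        for t in outgoing.get(state, []):
            yield from walk(t.target, words + [t.word], total + t.score)

    yield from walk(start, [], 0.0)


# Hypothetical word graph; the words and scores are invented for illustration.
graph = [
    Transition(0, 1, "naar", 1.0),
    Transition(0, 1, "na", 2.5),
    Transition(1, 2, "groningen", 0.5),
    Transition(1, 2, "koningen", 3.0),
]

all_paths = list(paths(graph, start=0, final=2))
best_words, best_score = min(all_paths, key=lambda p: p[1])
```

Four transitions here encode four competing word sequences, which is what makes the representation compact. Exhaustive enumeration is used only for clarity; a realistic word graph would call for a dynamic-programming search instead.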
The dialogue manager maintains an information state to keep track of the information provided by the user. An update expression is an instruction for updating the information state. The syntax and semantics of such updates are defined in Veldhuijzen van Zanten (1996). The sentence:
is translated into the update expression:
(user.wants.(destination.(place.groningen)); (origin.(place.amsterdam)); (moment.at.(date.(month.february;day.4))))
This indicates that the destination and origin slots can be filled in, as well as the moment.at slot.
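The slot-filling effect of such an update can be illustrated with a small Python sketch. This is not the update formalism of Veldhuijzen van Zanten (1996), whose syntax and semantics are richer; it only assumes that an information state can be viewed as a nested mapping and that an update sets values at dotted slot paths. The helper name `set_slot` and the flattened slot paths are hypothetical.

```python
def set_slot(state, path, value):
    """Fill the slot at a dotted path, creating intermediate levels as needed."""
    keys = path.split(".")
    node = state
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


# ASSUMPTION: the update expression flattened into (slot path, value) pairs.
update = [
    ("user.wants.destination.place", "groningen"),
    ("user.wants.origin.place", "amsterdam"),
    ("user.wants.moment.at.date.month", "february"),
    ("user.wants.moment.at.date.day", 4),
]

information_state = {}
for path, value in update:
    set_slot(information_state, path, value)

wants = information_state["user"]["wants"]
```

After the loop, `wants` holds the destination, origin and moment.at entries, mirroring the three slots mentioned in the text.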
To compare the two NLP modules, a formal evaluation was carried out three years after the start of the Programme. In this paper, we first briefly describe the two competing NLP components in section 2. The evaluation, which measures string accuracy, semantic accuracy and computational resources, is described in more detail in section 3. The evaluation results are presented in section 4. On the basis of these results, some conclusions are drawn in section 5.