Consider a computer system that you can call by phone and ask questions -- in ordinary Dutch -- concerning the timetable of public transportation. For example, you might ask such a system for the departure time of a train from Groningen to Amsterdam such that you arrive in Amsterdam at 10. Before proceeding to answer your query, such a system will probably ask at which date you want to travel, and whether you mean `10 in the morning' or `10 in the evening'. After such a clarification, it will proceed to give you the desired information, e.g. that you could take the 7.37 train from platform 2b.
Such a system is capable of recognizing and understanding spoken language. It is furthermore capable of conducting a natural dialogue in order to obtain the information necessary for answering your questions. Moreover, the system is capable of producing spoken natural language in order to conduct this dialogue, and in order to produce the answer to your request for information. Such a system does not yet exist.
The NWO Priority Programme Language and Speech Technology is a research programme aiming at the development of spoken language information systems. Its practical goal is to develop a demonstrator of the system just described. This demonstrator is called OVIS, Openbaar Vervoer Informatie Systeem (Public Transport Information System). In order to meet this practical goal, scientific contributions are envisaged in speech recognition, natural language processing, and dialogue management.
Apart from the practical goal of building the OVIS demonstrator, the NWO Priority Programme Language and Speech Technology is motivated by a number of scientific, cultural and economic goals.
An important scientific goal is to combine the insights of linguistics and phonetics on the one hand, and computer science on the other. It is expected that the technological requirements of the proposed demonstrator necessitate a closer study of language `performance' rather than language `competence'; therefore, information-theoretic approaches (such as probabilistic techniques) will be investigated.
Furthermore, two -- quite different -- cultural goals are identified. Firstly, the programme aims at an integration of two different research paradigms, by confronting the experimental, corpus-based approach to language research (prominent in speech technology) with the `knowledge-based' approaches in linguistic research and language technology.
The second cultural goal is to maintain the status of Dutch as a cultural language. It is expected that the status of Dutch will decrease if spoken language technology is developed further for languages such as English and German than for Dutch.
Finally, the Priority Programme is of economic importance. Without research into Dutch language technology it is not possible to develop Dutch information systems. The programme will support engineers from industry in bridging the gap between scientific research and commercial Dutch information processing systems.
The NWO Priority Programme `Language and Speech Technology' is a five-year research programme. The programme is co-funded by Philips Corporate Research and KPN Research. Research is carried out by four different groups. Work on speech recognition is carried out at the University of Nijmegen; work on probabilistic natural language processing is carried out by a group at the University of Amsterdam; work on the pragmatic module and the dialogue manager is carried out by a group at the Institute for Perception Research at Eindhoven. Finally, the work on grammar-based natural language processing is carried out by the authors of this paper, at the University of Groningen. The remainder of this paper will concentrate on the work on grammar-based natural language processing.
In the next section we describe the architecture of the OVIS2 system. OVIS2 is a modularized version of the OVIS1 prototype. The OVIS1 prototype is a version of a German system developed by Philips Dialogue Systems in Aachen [21,2], adapted to Dutch. In section 1.3 we then describe the problems that need to be solved in the natural language understanding module. The paper then continues by describing the components that are currently under construction as part of the natural language understanding module.
The architecture of the OVIS2 prototype is given in figure 1. In this overview of the architecture, the arrows indicate the flow of information; boxes are modules of the system. The idea is that the user produces an utterance which is input for the speech recognition module. The result of speech recognition is passed on to the linguistic processing component. The result of linguistic analysis is input for the pragmatic component, which passes on data to the dialogue manager. The dialogue manager checks whether enough information is available to consult the database, or whether further information needs to be requested from the user. Such requests and the answers to the user's query are articulated by means of the synthesis modules. The circled boxes in the center of the figure illustrate the data structures that are maintained by the different modules. For the motivation behind the chosen architecture we refer to  and .
In this paper we concentrate on the linguistic processing (NLP) component. This component receives its input from the speech recognizer, and passes the result of linguistic analysis -- some kind of semantic representation -- on to the pragmatic interpretation component.
The output of the speech recognizer is a word graph (cf. section 3.2.), which represents all different hypotheses for the spoken input from the user. Linguistic analysis of such a structure is more complicated than the analysis of a single input sentence (as in the case of written input), as we will show later.
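As an illustration, a word graph can be thought of as a directed acyclic graph whose nodes correspond to points in time and whose edges carry word hypotheses. The encoding, the example words, and the scores below are our own illustrative assumptions, not the actual output format of the OVIS speech recognizer:

```python
from collections import defaultdict

class WordGraph:
    """A word graph: nodes are points in time, edges carry a word
    hypothesis and an (assumed) acoustic score."""

    def __init__(self):
        self.edges = defaultdict(list)  # start node -> [(end node, word, score)]

    def add_edge(self, start, end, word, score):
        self.edges[start].append((end, word, score))

    def paths(self, start, final):
        """Enumerate all word sequences from `start` to `final`."""
        if start == final:
            yield []
            return
        for end, word, _ in self.edges[start]:
            for rest in self.paths(end, final):
                yield [word] + rest

# Two competing hypotheses for the first word of an utterance:
g = WordGraph()
g.add_edge(0, 1, "van", 0.8)
g.add_edge(0, 1, "naar", 0.3)
g.add_edge(1, 2, "Groningen", 0.9)

print(sorted(g.paths(0, 2)))
# → [['naar', 'Groningen'], ['van', 'Groningen']]
```

A single spoken input thus yields several complete word sequences, each of which is a candidate for linguistic analysis.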
The interface between the linguistic analysis component and the pragmatic interpretation module is described in . This interface is defined by a special interface language, called the Update Language. Some examples will be provided later.
In the design of the linguistic processing component, the following problems must be addressed.
Firstly, a natural language analysis component will be faced with ambiguity. The combination of lexical and structural ambiguities often leads to an enormous number of possible readings for an input utterance. In OVIS2 this problem is even more acute because of the use of word graphs. Techniques will have to be developed to deal with large numbers of analyses, and to choose the most appropriate reading from such a set of candidate analyses.
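To see how quickly the number of readings grows, consider counting the distinct paths through a word graph in which several word hypotheses compete at every position. The graph layout below is hypothetical; each entry in a node's successor list stands for one competing hypothesis edge:

```python
def count_paths(edges, start, final):
    """Count paths through a word graph.
    edges: dict mapping a node to its successor nodes,
    one entry per word hypothesis."""
    if start == final:
        return 1
    return sum(count_paths(edges, nxt, final) for nxt in edges.get(start, []))

# A graph with 3 competing hypotheses at each of 5 positions:
edges = {i: [i + 1] * 3 for i in range(5)}
print(count_paths(edges, 0, 5))  # → 243, i.e. 3**5 word sequences
```

Even before any lexical or structural ambiguity is taken into account, the parser thus faces hundreds of candidate word sequences for a short utterance.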
While the system must somehow deal with large numbers of analyses, many of which may eventually turn out to be useless, it must not make the mistake of overlooking useful analyses. This is the problem of robustness.
The linguistic analysis needs to be robust for three reasons. Firstly, it is quite difficult to anticipate in the grammar all linguistic constructions that might occur. This is the traditional problem for grammar-based NLP. Secondly, spoken language is full of hesitations, corrections, false starts, etc., which are not always easy to detect. The third reason is that the utterance that was actually spoken is not guaranteed to be a path in the word graph, due to limitations of state-of-the-art speech recognition.
These observations indicate that robustness and disambiguation are two very important problems to be solved in the NLP component. A third problem can be added to this list: efficiency. Given the nature of the proposed application, it is clear that the system must run in real time, i.e. it must not leave the user waiting for the requested information. This requirement provides a further challenge for the NLP component.
We currently assume that the following phases can be distinguished in the NLP module. Firstly, we foresee a pre-processing phase in which the word graph may be modified in order to repair hesitations, corrections, false starts, etc.
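A minimal sketch of such a repair step, assuming hesitations appear as ordinary edges in the word graph; the filler list and the graph encoding are illustrative assumptions:

```python
FILLERS = {"uh", "eh"}  # assumed filler vocabulary, for illustration

def remove_fillers(edges):
    """edges: list of (start, end, word) triples. Returns the edges
    with filler edges removed and direct links added around them."""
    cleaned = [e for e in edges if e[2] not in FILLERS]
    for start, end, word in edges:
        if word in FILLERS:
            # reconnect: anything leaving `end` now also leaves `start`
            for s, e, w in edges:
                if s == end and w not in FILLERS:
                    cleaned.append((start, e, w))
    return cleaned

edges = [(0, 1, "van"), (1, 2, "uh"), (2, 3, "Groningen")]
print(remove_fillers(edges))
# → [(0, 1, 'van'), (2, 3, 'Groningen'), (1, 3, 'Groningen')]
```

This sketch bypasses a single filler edge at a time; consecutive fillers would require iterating the step until no filler edges remain, and real corrections and false starts call for considerably more sophisticated treatment.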
Secondly, this modified word graph is then input for the syntactic and semantic analysis phase. The grammar defines what the appropriate syntactic and semantic analysis should be for a given input. The parser, which is a device derived from the grammar, is capable of computing this analysis. The grammar and the parser are therefore two important components, and will be discussed in more detail in the remainder of this paper.
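To illustrate how a parser can operate on a word graph rather than on a string, chart items can be indexed by graph nodes instead of string positions. The toy grammar, lexicon, and encoding below are our own simplifications, not the OVIS grammar:

```python
# Toy lexicon and a single binary rule PP -> P NP (illustrative only).
lexicon = {"van": "P", "naar": "P", "Groningen": "NP"}
rules = {("P", "NP"): "PP"}

def parse_graph(edges, start, final, goal="PP"):
    """Recognize `goal` over a word graph. Chart items are
    (category, start-node, end-node) triples."""
    # seed the chart from the lexical edges
    chart = {(lexicon[w], a, b) for a, b, w in edges if w in lexicon}
    added = True
    while added:                       # close the chart under the rules
        added = False
        for (c1, a, b) in list(chart):
            for (c2, b2, c) in list(chart):
                if b == b2 and (c1, c2) in rules:
                    item = (rules[(c1, c2)], a, c)
                    if item not in chart:
                        chart.add(item)
                        added = True
    return (goal, start, final) in chart

edges = [(0, 1, "van"), (0, 1, "naar"), (1, 2, "Groningen")]
print(parse_graph(edges, 0, 2))  # → True: both hypotheses parse as a PP
```

Note that the two competing hypotheses `van' and `naar' are analysed in one pass over the shared graph, rather than once per path.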
Finally, in case semantic analysis results in either too few or too many analyses, special action should be undertaken. If no semantic analysis was found, then the robustness component tries to come up with a partial semantic analysis on the basis of the partial results that were discovered by the parser. If several competing semantic analyses were produced, then the disambiguation component should decide which analysis is the most appropriate candidate.
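The selection of partial analyses can be sketched as a search for a sequence of non-overlapping fragments that together cover as much of the input as possible. The items, labels, and coverage criterion below are illustrative assumptions, not the actual OVIS robustness component:

```python
def best_cover(partials, final):
    """partials: list of (start, end, label) fragments found by the
    parser. Returns (positions covered, chosen labels), scanning the
    graph positions left to right with a simple dynamic programme."""
    best = {0: (0, [])}                # position -> (coverage, labels)
    for pos in range(final + 1):
        if pos not in best:
            continue
        covered, labels = best[pos]
        # option 1: leave this position unanalysed and move on
        if covered > best.get(pos + 1, (-1, None))[0]:
            best[pos + 1] = (covered, labels)
        # option 2: extend with a fragment starting here
        for s, e, lab in partials:
            if s == pos and covered + (e - s) > best.get(e, (-1, None))[0]:
                best[e] = (covered + (e - s), labels + [lab])
    return best[final]

# Two fragments with an uncovered position in between:
partials = [(0, 2, "PP:from_groningen"), (3, 5, "PP:to_amsterdam")]
print(best_cover(partials, 5))
# → (4, ['PP:from_groningen', 'PP:to_amsterdam'])
```

Under these assumptions, the robustness component can still deliver the departure and destination fragments to the pragmatic component even though no single analysis spans the whole utterance.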