In this paper I discuss the implementation of the head-corner parser in full detail. But first I describe the motivations for this approach. I will start with the considerations that lead to the choice of a head-driven parser; I will then argue why Prolog is an appropriate language for the implementation of the head-corner parser.
Lexicalist grammar formalisms, such as Head-driven Phrase Structure Grammar (HPSG), have two characteristic properties. First, lexical elements and phrases are associated with categories that have considerable internal structure. Second, instead of construction-specific rules, a small set of generic rule schemata is used. Consequently, the set of constituent structures defined by a grammar cannot be `read off' the rule set directly, but is defined by the interaction of the rule schemata and the lexical categories. Applying standard parsing algorithms to such grammars is unsatisfactory for a number of reasons. Earley parsing is intractable in general, as the rule set is simply too general. For some grammars, naive top-down prediction may even fail to terminate.  therefore proposes a modified version of the Earley parser, using restricted top-down prediction. While this modification often leads to better termination properties of the parsing method, in practice it easily leads to a complete trivialization of the top-down prediction step, and hence to inferior performance.
Bottom-up parsing is far more attractive for lexicalist formalisms, as it is driven by the syntactic information associated with lexical elements. Certain inadequacies remain, however. Most importantly, the selection of rules to be considered for application may not be very efficient. Consider, for instance, the following DCG rule:
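A constructed rule of the relevant kind (an illustration of my own, not necessarily the rule originally intended here):

```prolog
% Illustrative DCG rule (constructed example).  Information is
% threaded through the daughters via the variables A..E.  A parser
% that selects rules on the basis of the leftmost daughter learns
% very little from finding an a/2: at that point B, C, D and E are
% still unconstrained, so the rule is considered at many positions
% where it cannot possibly succeed.
x(A, E) --> a(A, B), b(B, C), c(C, D), d(D, E).
```

If most of the constraining information resides in one designated daughter, the head, it is more effective to locate that daughter first.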
Several authors have suggested parsing algorithms that may be more suitable for lexicalist grammars.  discusses the concept of head-driven parsing. The key idea is that the linguistic concept head can be used to obtain parsing algorithms which are better suited for typical natural language grammars. Most linguistic theories assume that among the daughters introduced by a rule there is one daughter which can be identified as the head of that rule. There are several criteria for deciding which daughter is the head. Two of these criteria seem relevant for parsing. First of all, the head of a rule determines to a large extent what other daughters may or must be present, as the head selects the other daughters. Second, the syntactic category and morphological properties of the mother node are, in the default case, identical to the category and morphological properties of the head daughter. These two properties suggest that it may be possible to design a parsing strategy in which one first identifies a potential head of a rule, before starting to parse the non-head daughters. By starting with the head, important information about the remaining daughters is obtained. Furthermore, since the head is to a large extent identical to the mother category, effective top-down identification of a potential head should be possible.
In  two different head-driven parsers are presented. First, a `head-driven' shift-reduce parser is presented which differs from a standard shift-reduce parser in that it considers the application of a rule (i.e., a reduce step) only if a category matching the head of the rule has been found. Furthermore, it may shift elements onto the parse stack which are similar to the active items (or `dotted rules') of active chart parsers. By using the head of a rule to determine whether the rule is applicable, the head-driven shift-reduce parser avoids the disadvantages of parsers in which either the leftmost or the rightmost daughter is used to drive the selection of rules. Kay also presents a `head-corner' parser. The striking property of this parser is that it does not parse a phrase from left to right, but instead operates `bidirectionally': it starts by locating a potential head of the phrase and then proceeds by parsing the daughters to the left and the right of the head. Again, this strategy avoids the disadvantages of parsers in which rule selection is uniformly driven by either the leftmost or the rightmost daughter. Furthermore, by selecting potential heads on the basis of a `head-corner table' (comparable to the left-corner table of a left-corner parser), the parser may use top-down filtering to minimize the search space. This head-corner parser generalizes the left-corner parser. Kay's presentation is reminiscent of the left-corner parser as presented by , which itself is a version, without memoization, of the BUP parser .
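To make the strategy concrete, the following is a minimal head-corner recognizer for a toy grammar. The representation (rule/4 with the head daughter singled out, integer string positions, and a computed head-corner relation in place of a precompiled table) is my own illustration of the idea, not Kay's actual encoding:

```prolog
:- use_module(library(lists)).

% Toy grammar: rule(Mother, RevLeftDaughters, Head, RightDaughters).
rule(s,  [np], vp, []).     % s  -> np vp   (head: vp)
rule(vp, [],   v,  [np]).   % vp -> v  np   (head: v)
rule(np, [],   n,  []).     % np -> n       (head: n)
lex(mary, n).  lex(john, n).  lex(sees, v).

% head_link(?Cat, ?Goal): Cat can be a head-corner of Goal (the
% reflexive, transitive closure of the head relation).  This plays
% the role of the head-corner table.
head_link(C, C).
head_link(C, G) :- rule(M, _, C, _), head_link(M, G).

% recognize(+Goal, +Words): Goal derives the whole list of Words.
recognize(Goal, Words) :-
    length(Words, N),
    parse(Goal, Words, 0, N, 0, N).

% parse(+Goal, +Ws, ?P0, ?P, +E0, +E): Goal spans P0..P, somewhere
% within the extreme positions E0..E.  Predict a lexical head
% within the bounds, then grow it towards the goal.
parse(Goal, Ws, P0, P, E0, E) :-
    E1 is E - 1,
    between(E0, E1, Q0),
    nth0(Q0, Ws, Word),
    lex(Word, Cat),
    head_link(Cat, Goal),          % top-down filtering
    Q is Q0 + 1,
    head_corner(Cat, Q0, Q, Goal, P0, P, E0, E, Ws).

% head_corner(+Small, +L, +R, ?Goal, ?P0, ?P, +E0, +E, +Ws):
% Small spanning L..R is grown, rule by rule, into Goal at P0..P.
head_corner(Goal, P0, P, Goal, P0, P, _, _, _).
head_corner(Small, L, R, Goal, P0, P, E0, E, Ws) :-
    rule(Mother, LeftDs, Small, RightDs),
    head_link(Mother, Goal),
    parse_left(LeftDs, Ws, L, L1, E0),
    parse_right(RightDs, Ws, R, R1, E),
    head_corner(Mother, L1, R1, Goal, P0, P, E0, E, Ws).

% Daughters are parsed from the head outwards: left daughters from
% right to left, right daughters from left to right.
parse_left([], _, L, L, _).
parse_left([D|Ds], Ws, L, Lout, E0) :-
    parse(D, Ws, L0, L, E0, L),
    parse_left(Ds, Ws, L0, Lout, E0).

parse_right([], _, R, R, _).
parse_right([D|Ds], Ws, R, Rout, E) :-
    parse(D, Ws, R, R1, R, E),
    parse_right(Ds, Ws, R1, Rout, E).
```

For instance, the query `recognize(s, [mary, sees, john])` succeeds: prediction locates the verb, grows it into a VP by parsing the object to its right, and grows the VP into an S by parsing the subject to its left.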
Head-corner parsing has also been considered elsewhere. In , ,  and  chart-based head-corner parsing for context-free grammar is considered. It is shown that, in spite of the fact that bidirectional parsing seemingly leads to more overhead than left-to-right parsing, the worst-case complexity of a head-corner parser does not exceed that of an Earley parser. Some further variations are discussed in .
In  and  I argue that head-corner parsing is especially useful for parsing with non-concatenative grammar formalisms. In  and  head-driven parsing strategies for Lexicalized Tree Adjoining Grammars are presented.
The head-corner parser is closely related to the semantic-head-driven generation algorithm (cf.  and references cited there), especially in its purely bottom-up incarnation.
This may come as a surprise at first. Common wisdom has it that although small grammars may be successfully treated with a backtracking parser, larger grammars for natural languages always require a data structure such as a chart or a table of items to make sure that each computation is performed only once. In the case of constraint-based grammars, however, the cost associated with maintaining such a chart should not be underestimated. The memory requirements of an implementation of the Earley parser for a constraint-based grammar are often outrageous. Similarly, in an Earley deduction system too much effort may be spent on small portions of computation which are inexpensive to (re-)compute anyway.
For this reason I will argue for an implementation of the head-corner parser in which only large chunks of computation are memoized; in linguistic terms, only maximal projections are memoized. The computation that is carried out in order to obtain such a `chunk' uses a depth-first backtrack search procedure. This solution dramatically improves the (average-case) memory requirements of the parser; moreover, it also leads to an increase in (average-case) time efficiency, especially in combination with goal-weakening, because of the reduced overhead associated with the administration of the chart. In each of the experiments discussed in section 7, selective memoization with goal-weakening outperforms standard chart parsers.
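The combination of selective memoization and goal-weakening can be sketched as follows. The predicate names are mine, and the table check is deliberately simplified; a real implementation needs a proper variant check (e.g. with subsumes_term/2) and indexing on the stored items:

```prolog
:- dynamic table_done/1, item/1.

% memo(+Goal): like call(Goal), but the solution set of a goal is
% computed at most once; later calls are pure table lookup.  In the
% parser, only goals corresponding to maximal projections are
% wrapped in memo/1; everything else is plain backtrack search.
memo(Goal) :-
    (   \+ \+ table_done(Goal) ->  % simplified: should be a variant
        true                       % check, not mere unifiability
    ;   copy_term(Goal, Generic),
        forall(call(Generic), assertz(item(Generic))),
        assertz(table_done(Goal))
    ),
    item(Goal).

% weaken(+Term, -Weak, +Depth): replace everything below the given
% depth by fresh variables, so that many specific goals collapse
% into one weaker memoized goal.
weaken(T, W, D) :-
    (   var(T) -> W = T
    ;   D =:= 0 -> true            % W remains an unbound variable
    ;   T =.. [F|Args],
        D1 is D - 1,
        maplist(weaken_(D1), Args, WArgs),
        W =.. [F|WArgs]
    ).
weaken_(D, T, W) :- weaken(T, W, D).

% A memoized call with goal-weakening: solve the weakened goal
% through the table, then filter its answers against the real goal.
memo_weakened(Goal, Depth) :-
    weaken(Goal, Weak, Depth),
    memo(Weak),
    Weak = Goal.
```

The point of weakening is that the table is consulted with fewer distinct goals, so fewer chunks are computed and stored; the price is that some answers in the table are too general and must be filtered out by the final unification.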
Prolog is a particularly useful language for the implementation of a head-corner parser for constraint-based grammars. This is due to the following considerations: first, Prolog's first-order terms and unification directly model the categories and the information-combining operation of constraint-based grammars; second, Prolog's backtracking provides the search regime the parser requires; third, Prolog's variables provide underspecification for free; and finally, Prolog is a high-level language, which opens the way for techniques such as partial evaluation.
The first consideration does not deserve much further attention. We want to exploit the fact that the primary datastructures of constraint-based grammars and the corresponding information-combining operation can be modelled by Prolog's first order terms and unification.
As was argued above, Prolog's backtracking is not used to simulate an iterative procedure that builds up a chart by side-effects; on the contrary, backtracking is genuinely used for search. Of course, in order to make this approach feasible, certain well-chosen search goals are memoized. This is clean and logically well-defined (consider, for example, ), even if our implementation in Prolog uses extra-logical predicates.
The third consideration is relevant only if we are interested in robust parsing. In certain approaches to robust parsing we are interested in the partial results obtained by the parser. In order to make sure that a parser is complete with respect to such partial results, it is often assumed that a parser must be applied that works exclusively bottom-up. In section 6 it will be shown that the head-corner parser (which uses a mixture of bottom-up and top-down processing) can be applied in a similar fashion by using underspecification in the top goal. Clearly, underspecification is a concept that arises naturally in Prolog.
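In Prolog terms, such underspecification is nothing more than leaving variables in the top goal. A schematic illustration, with toy facts standing in for the parser's result relation (my simplification):

```prolog
% Toy stand-in for the parser's results: found(Category, P0, P)
% states that a phrase of the given category spans positions P0..P.
found(np, 0, 2).
found(v,  2, 3).
found(np, 3, 5).
found(vp, 2, 5).
found(s,  0, 5).

% A fully specified top goal asks for a sentence over the whole
% input:   ?- found(s, 0, 5).
% An underspecified top goal leaves the category and the right edge
% open, and thereby enumerates all partial results starting at 0:
%          ?- found(Cat, 0, P).
```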
The fact that Prolog is a high-level language has a number of practical advantages related to speed of development. A further advantage is that techniques such as partial evaluation can be applied. For example, I have successfully applied the Mixtus partial evaluator  to the head-corner parser discussed below, obtaining an additional 20% speed increase. In a language such as C, partial evaluation hardly seems feasible, because the low-level nature of the language makes it impossible to recognize the concepts that partial evaluation requires.
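As a schematic illustration of the kind of transformation involved (a constructed example with predicate names of my own; Mixtus performs such unfolding automatically):

```prolog
% A small, generic grammar interpreter over difference lists.
rule(s, [np, vp]).
lex(john,   np).
lex(sleeps, vp).

derive(Cat, S0, S) :-
    rule(Cat, Daughters),
    derive_ds(Daughters, S0, S).
derive(Cat, [Word|S], S) :-
    lex(Word, Cat).

derive_ds([], S, S).
derive_ds([D|Ds], S0, S) :-
    derive(D, S0, S1),
    derive_ds(Ds, S1, S).

% Partially evaluating derive/3 with respect to the rule and the
% lexicon above unfolds the interpretation overhead away; the
% specialized program contains direct clauses such as
%
%   derive(s, S0, S) :- derive(np, S0, S1), derive(vp, S1, S).
%   derive(np, [john|S], S).
%
% in which the rule lookup and the list traversal of derive_ds/3
% have been compiled away.
```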
A head-corner parser for a grammar in which, for each rule, the leftmost daughter is considered the head will effectively function as a left-corner parser. In such cases the head-corner parser can be said to run in `left-corner mode'. Of course, in a genuine left-corner parser certain simplifications are possible. Based on the experiments discussed in section 7, it can be concluded that a specialized left-corner parser is only about 10% faster than a head-corner parser running in left-corner mode. This is an interesting result: it implies that a head-corner parser performs at least (almost) as well as a left-corner parser and, as some of the experiments indicate, often better.
In the context of the OVIS system, it is important that the parser can deal with input from the speech recognizer. The interface between the speech recognizer and the parser consists of word-graphs. In section 5 I show how the head-corner parser is generalized to deal with word-graphs.
Moreover, the nature of the application also dictates that the parser proceeds in a robust way. In section 6 I discuss the OVIS robustness component, and I show that the use of a parser which includes top-down prediction is not an obstacle to robustness.
In section 7 we compare the head-corner parser with the other parsers implemented in the Programme for the OVIS application. It will be shown that the head-corner parser operates much faster than implementations of a bottom-up Earley parser and related chart-based parsers; moreover, its space requirements are far more modest. The difference with a left-corner parser, which was derived from the head-corner parser, is small.
We performed similar experiments with the Alvey NL Tools grammar of English , and the English grammar of the MiMo2 system . From these experiments it can be concluded that selective memoization with goal-weakening (as applied in head-corner and left-corner parsing) is substantially more efficient than conventional chart parsing. We conclude that, at least for some grammars, head-corner parsing is a good option.