
I will restrict attention to a class of constraint-based formalisms in which operations on strings are defined that are more powerful than concatenation, but which are restricted to be nonerasing and linear. The resulting class of systems can be characterized as Linear Context-Free Rewriting Systems (LCFRS) augmented with feature structures (F-LCFRS). For a discussion of the properties of LCFRS without feature structures, see [31] and [32]. Note, though, that these properties do not carry over to the current system, because of the augmentation with feature structures. The proposals discussed in the previous section can be seen as examples of F-LCFRS.

As in LCFRS, the operations on strings in F-LCFRS can be characterized as follows. First, each derived structure is mapped onto a string of words; i.e. each derived structure `knows' which string it `dominates'. For example, each derived feature structure may contain an attribute `string' whose value is a list of atoms representing the string it dominates. Furthermore, I will identify this string with the set of occurrences of words (i.e. the bag of words) in the string, and I will write w(F) for the set of occurrences of words that the derived structure F dominates. Rules combine structures D1...Dn into a new structure M. Nonerasure requires that the union of w applied to each daughter be a subset of w(M):

$\bigcup_{i=1}^{n} w(D_i) \subseteq w(M)$

Linearity requires that the difference between the cardinalities of these sets is a constant; i.e. a rule may only introduce a fixed number of words syncategorematically:

$|w(M)| - |\bigcup_{i=1}^{n} w(D_i)| = c$, where $c$ is a constant
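The two conditions can be made concrete with a small sketch, assuming strings are represented as Prolog lists of atoms, so that a bag of word occurrences is a multiset (this representation and the predicate names are illustrative, not part of the formalism):

```prolog
% nonerasing/2: every word occurrence in the daughters' strings
% also occurs in the mother's string (multiset inclusion).
nonerasing(DaughterStrings, MotherString) :-
    append(DaughterStrings, Ws),     % multiset union of the daughters
    msort(Ws, SortedWs),             % msort/2 keeps duplicates
    msort(MotherString, SortedM),
    bag_subset(SortedWs, SortedM).

% linear/3: the mother introduces exactly C extra word occurrences.
linear(DaughterStrings, MotherString, C) :-
    append(DaughterStrings, Ws),
    length(Ws, N),
    length(MotherString, M),
    C is M - N.

% multiset inclusion on sorted lists
bag_subset([], _).
bag_subset([X|Xs], [X|Ys]) :- bag_subset(Xs, Ys).
bag_subset([X|Xs], [Y|Ys]) :- X \== Y, bag_subset([X|Xs], Ys).
```

For instance, `nonerasing([[the,man],[sleeps]], [the,man,sleeps])` succeeds with `linear/3` yielding `C = 0`, the case assumed below for all non-lexical rules.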

CF-based formalisms clearly fulfill this requirement, as do Head Grammars, grammars using sequence union, Johnson's Australian grammar, and TAGs. Unlike the definition of LCFRS, I do not require that these operations be operations in the strict sense, i.e. they do not have to be functional. Note that sequence union is relational, whereas the others are functional. For a discussion of the consequences of this difference, refer to [23]. I furthermore assume here that $\bigcup_{i=1}^{n} w(D_i) = w(M)$ for all rules other than lexical entries (i.e. all words are introduced on a terminal). Note, though, that a simple generalization of the algorithm presented below handles the general case (along the lines of [26,27], by treating rules that introduce extra lexical material as non-chain-rules).

Furthermore, I will assume that each rule has a designated daughter, called the head. Although I will not impose any restrictions on the head, it will turn out that the parsing strategy to be proposed is very sensitive to the choice of heads, with the effect that F-LCFRSs in which the notion `head' is defined in a systematic way (Pollard's Head Grammars, Reape's version of HPSG, Dowty's version of Categorial Grammar) may be parsed much more efficiently than other grammars. The notion lexical head of a parse tree is defined recursively in terms of the head: the lexical head of a tree is the lexical head of its head, and the lexical head of a terminal is that terminal itself. I will write the head of a definite clause as the leftmost daughter of that clause.
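The recursive definition of the lexical head can be sketched as follows, assuming a hypothetical tree representation in which a terminal is `w(Word)` and a nonterminal is `tree(HeadDaughter, OtherDaughters)` (this encoding is an assumption for illustration, not the paper's):

```prolog
% The lexical head of a terminal is that terminal itself;
% the lexical head of a tree is the lexical head of its head daughter.
lexical_head(w(Word), Word).
lexical_head(tree(Head, _Others), Word) :-
    lexical_head(Head, Word).
```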

A grammar will be a set of definite clauses over a constraint language, following [10]. The constraint language consists of path equations in the manner of PATR; hence, instead of first-order terms, we use feature structures. The relation defined by the grammar is the relation `sign'. I furthermore assume that no other recursive relations are defined. A grammar rule will thus be something like

sign(M) :- sign(D1), ..., sign(Dn), $\phi$.

where $\phi$ represents constraints on the variables. However, each nonunit rule may be associated with one extra relation call. This extra relation defines how the string of the mother node is built from the strings of the daughters. For example, to implement a simple grammar rule using concatenation, we may write

sign(M) :- sign(D1), sign(D2), concatenate_strings(D1, D2, M), $\phi$.
where the relation `concatenate_strings' states that the string of the mother is the concatenation of the strings of the daughters. The extra relation should be defined in such a way that, given the two instantiated daughter signs, it is guaranteed to terminate (assuming a Prolog-like procedural semantics).
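One possible definition of such a relation, sketched in terms of the `string/2` predicate introduced below (the exact formulation is an assumption; only its intended meaning is given in the text):

```prolog
% The string of the mother is the concatenation of the daughters' strings.
% Given instantiated daughter signs, string/2 delivers ground lists S1 and
% S2, so append/3 terminates deterministically.
concatenate_strings(D1, D2, M) :-
    string(D1, S1),
    string(D2, S2),
    append(S1, S2, S),
    string(M, S).
```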

Furthermore, each grammar should provide a definition of the predicates head/2 and string/2. The first defines the relation between the mother sign of a rule and its head; the second defines, for a given sign, the string (as a list of atoms) associated with that sign.
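As a minimal sketch, assuming signs are encoded as terms `sign(Cat, String, Head)` (this record layout is hypothetical; a real grammar would state these predicates as path equations over feature structures):

```prolog
% string/2: a sign and the list of atoms it dominates.
string(sign(_Cat, String, _Head), String).

% head/2: a mother sign and its head daughter.
head(sign(_Cat, _String, Head), Head).
```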

In the meta-interpreter, a nonunit clause

sign(Mother) :- sign(Head), sign(D1), ..., sign(Dn), Call, $\phi$.

is represented by the predicate rule/4:

rule(Head, Mother, ⟨D1, ..., Dn⟩, Call) :- $\phi$.

Unit clauses are represented as

lex(M) :- $\phi$.

I assume the meta-interpreter has the predicate `call/1' available, which may be used as in Prolog.
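To show how rule/4, lex/1, and call/1 fit together, here is a naive top-down interpreter over this representation (an illustrative sketch only; the parsing strategy actually proposed in this paper differs, and `parse/1` and `parse_list/1` are hypothetical names):

```prolog
% A unit clause proves a sign directly.
parse(Sign) :-
    lex(Sign).

% A nonunit clause: solve the head first, then the remaining
% daughters, then the extra relation call that builds the string.
parse(Mother) :-
    rule(Head, Mother, Daughters, Call),
    parse(Head),
    parse_list(Daughters),
    call(Call).

parse_list([]).
parse_list([D|Ds]) :-
    parse(D),
    parse_list(Ds).
```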

Noord G.J.M. van