Prepositional phrases (which are extremely frequent both in general and in the teletext text type) pose at least three major problems. First, it is necessary to determine the semantic role of the PP, and therefore to find the correct reading of the head preposition, which is in general highly polysemous (frequent English prepositions like `after', `at', `in' and `of' are assigned, respectively, 9, 10, 21 and 14 readings in a medium-size dictionary like Longman's). Second, it is necessary to deal with collocations in order to explain the ungrammaticality of phrases like `at London' and `in Christmas', even though `London' can be combined with locative prepositions and `Christmas' with temporal ones. Finally, it is desirable to constrain PP-attachment in order to reduce structural ambiguity.
The first two problems can be dealt with jointly by marking each noun, by means of a collocational feature, for the preposition readings it can combine with. This feature is percolated to the NP node as a result of the Head Feature Principle. The compositionality of the assignment of the semantic role to the collocation is expressed in the PP grammar rule: the collocational feature of the NP is unified with the specification of the preposition, which filters out disallowed combinations. The semantic role of the PP is then unified with the semantic role feature that is assigned lexically to the preposition.
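The filtering effect of the PP rule can be illustrated with a minimal sketch. It assumes a toy lexicon; the reading names (`in_loc`, `at_temp`, and so on), the role labels and the function `pp_roles` are hypothetical, and plain set membership stands in for unification of the collocational feature with the preposition's specification.

```python
# Hypothetical collocational feature: the preposition readings each noun accepts.
NOUN_COLLOC = {
    "London": {"in_loc", "to_goal"},
    "Christmas": {"at_temp", "before_temp"},
}

# Hypothetical preposition lexicon: each reading carries its semantic role.
PREP_READINGS = {
    "at": {"at_loc": "location", "at_temp": "time"},
    "in": {"in_loc": "location", "in_temp": "time"},
}


def pp_roles(prep, noun):
    """Return the semantic roles of the PP 'prep noun' that survive the filter."""
    allowed = NOUN_COLLOC[noun]
    return [role for reading, role in PREP_READINGS[prep].items()
            if reading in allowed]


for prep, noun in [("in", "London"), ("at", "London"),
                   ("at", "Christmas"), ("in", "Christmas")]:
    print(prep, noun, pp_roles(prep, noun))
```

Under these assumed entries, an empty result corresponds to a combination that the rule rejects, and a non-empty result gives both the selected reading's role and the role of the PP as a whole.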
This filter requires adding extra information to the nouns in the lexicon. In practice, the coding effort can be reduced considerably by defining macros, the names of which are derived from thesaurus classes whose members share distributional properties. For instance, in some languages the names of countries all combine with the same preposition.
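A sketch of how such macros might reduce the coding effort is given below. The macro names `COUNTRY` and `HOLIDAY`, the reading names and the helper `noun` are assumptions made for illustration only; they are not taken from the actual lexicon.

```python
# Hypothetical macros: one collocational value set per thesaurus class.
MACROS = {
    "COUNTRY": frozenset({"in_loc", "to_goal", "from_source"}),
    "HOLIDAY": frozenset({"at_temp", "before_temp", "after_temp"}),
}


def noun(form, macro):
    """Build a noun entry whose collocational feature is expanded from a macro."""
    return {"form": form, "cat": "N", "colloc": MACROS[macro]}


# Each entry names only its class; the readings are stated once, in the macro.
LEXICON = [
    noun("France", "COUNTRY"),
    noun("Belgium", "COUNTRY"),
    noun("Christmas", "HOLIDAY"),
]
```

With this arrangement, changing the set of readings for a whole class requires editing a single macro definition rather than every noun entry.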
As an implementation note, the strategy requires disjunction of values in the formalism, which formalisms like PATR do not provide. However, if the value sets are finite, as in the case at hand, disjunction of values can be simulated by what is called a `perverse' method in .
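One generic way to simulate disjunction over a finite value set with plain unification is to give every possible value its own binary feature. The sketch below illustrates that idea only; it is an assumption and not necessarily the method referred to above, and the names `READINGS`, `noun_spec`, `prep_spec` and `unify` are hypothetical.

```python
# The finite set of preposition readings (hypothetical names).
READINGS = ("at_loc", "at_temp", "in_loc", "in_temp")


def noun_spec(allowed):
    """Encode the disjunction of allowed readings as one +/- feature per reading."""
    return {r: "+" if r in allowed else "-" for r in READINGS}


def prep_spec(reading):
    """A preposition reading constrains only its own feature."""
    return {reading: "+"}


def unify(a, b):
    """Unify two flat feature structures; return None on a value clash."""
    result = dict(a)
    for feature, value in b.items():
        if feature in result and result[feature] != value:
            return None
        result[feature] = value
    return result


london = noun_spec({"in_loc"})
print(unify(london, prep_spec("in_loc")) is not None)
print(unify(london, prep_spec("at_loc")) is None)
```

The noun's structure states, for each reading, whether it is admissible, so a preposition that demands an inadmissible reading causes an ordinary unification failure and no disjunctive values are needed.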
The attachment problem can be dealt with similarly. The semantic role of the PP is determined as described above. The role of other modifiers, such as AdvPs, is assumed to be assigned lexically. By marking modified constituents such as VPs for the semantic roles they can be modified by, a filter similar to the one in the PP rule can be applied here.
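The attachment filter can be sketched in the same spirit. The head words, the role sets assigned to them and the function `attachment_sites` are hypothetical, and set membership again stands in for unification.

```python
# Hypothetical marking: the semantic roles each candidate head can be modified by.
HEAD_ACCEPTS = {
    "arrive": {"time", "location"},  # head of a VP
    "train": {"location"},           # head of an NP
}


def attachment_sites(role, heads):
    """Return the candidate heads that accept a modifier with the given role."""
    return [head for head in heads if role in HEAD_ACCEPTS[head]]


print(attachment_sites("time", ["arrive", "train"]))
print(attachment_sites("location", ["arrive", "train"]))
```

Under these assumed markings a temporal PP is left with a single attachment site, whereas a locative PP remains ambiguous between the two heads, so the filter narrows the set of analyses without necessarily reducing it to one.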
It turned out that, in our corpus, linguistic restrictions play only a minor role in the reduction of PP-attachment ambiguities. For example, as already noted in section 1.1.1, geographic rather than linguistic knowledge is needed to determine the correct analysis of a sequence of locative PPs. We do not know whether this holds for other subject domains as well.