The results for word accuracy given above provide a measure of the extent to which linguistic processing contributes to speech recognition. However, since the main task of the linguistic component is to analyze utterances semantically, an equally important measure is concept accuracy, i.e. the extent to which the semantic analysis corresponds to the meaning of the utterance that was actually produced by the user.
To determine concept accuracy, we have used a semantically annotated corpus of 10K user responses. Each user response was annotated with an update representing the meaning of the utterance that was actually spoken. The annotations were made by our project partners in Amsterdam, in accordance with the existing guidelines.
Updates take the form described in section 2.5. An update is a logical formula which can be evaluated against an information state and which gives rise to a new, updated information state. The most straightforward method for evaluating concept accuracy in this setting is to compare (the normal form of) the update produced by the grammar with (the normal form of) the annotated update. A major obstacle for this approach, however, is the fact that very fine-grained semantic distinctions can be made in the update language. While these distinctions are relevant semantically (i.e. in certain cases they may lead to slightly different updates of an information state), they can often be ignored by a dialogue manager. For instance, the two updates below are semantically not equivalent, as the ground/focus distinction is drawn slightly differently, even though a dialogue manager may treat them identically.
Since the semantic analysis is the input for the dialogue manager, we have measured concept accuracy in terms of a simplified version of the update language. Inspired by a similar proposal in Boros et al., we translate each update into a set of semantic units, where a unit in our case is a triple ⟨CommunicativeFunction, Slot, Value⟩. For instance, the two examples above both translate as
⟨denial, destination_town, leiden⟩
⟨correction, destination_town, abcoude⟩
Both the updates in the annotated corpus and the updates produced by the system were translated into semantic units.
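The translation step can be sketched as follows. This is a minimal illustration, not the actual OVIS translation procedure: the update representation here is invented (pre-parsed triples), whereas the real update language is far richer, and `SemanticUnit` and `to_units` are our own names.

```python
from typing import NamedTuple

class SemanticUnit(NamedTuple):
    """A semantic unit: ⟨CommunicativeFunction, Slot, Value⟩."""
    function: str   # e.g. "denial", "correction"
    slot: str       # e.g. "destination_town"
    value: str      # e.g. "leiden"

def to_units(update):
    """Flatten a (pre-parsed) update into a set of semantic units.
    Fine-grained distinctions such as ground/focus are discarded,
    so semantically distinct updates may yield the same unit set."""
    return {SemanticUnit(f, s, v) for (f, s, v) in update}

# The two example updates from the text both translate to this unit set:
update = [("denial", "destination_town", "leiden"),
          ("correction", "destination_town", "abcoude")]
units = to_units(update)
```

Because the result is a set, comparing a system update with an annotated update reduces to ordinary set operations.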
Semantic accuracy is given in table 5 according to four different definitions. Firstly, we list the proportion of utterances for which the corresponding semantic units exactly match the semantic units of the annotation (match). Furthermore, we calculate precision (the number of correct semantic units divided by the number of semantic units which were produced) and recall (the number of correct semantic units divided by the number of semantic units in the annotation). Finally, following Boros et al., we also present concept accuracy as

CA = 100 × (1 − (SU_S + SU_I + SU_D) / SU) %

where SU is the total number of semantic units in the translated corpus annotation, and SU_S, SU_I, and SU_D are the number of substitutions, insertions, and deletions that are necessary to make the translated grammar update equivalent to the translation of the corpus update.
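The four measures just described can be sketched over sets of semantic-unit triples. The function names are ours, not from the paper, and the example counts passed to `concept_accuracy` are invented for illustration.

```python
def precision_recall(produced, annotated):
    """Precision and recall over sets of semantic units.
    A unit is correct if it occurs in both sets."""
    correct = len(produced & annotated)
    precision = correct / len(produced) if produced else 1.0
    recall = correct / len(annotated) if annotated else 1.0
    return precision, recall

def concept_accuracy(total_units, subs, ins, dels):
    """CA = 100 * (1 - (SU_S + SU_I + SU_D) / SU), as in Boros et al."""
    return 100.0 * (1.0 - (subs + ins + dels) / total_units)

# Illustrative comparison: the system produced only one of the two
# annotated units for the example utterance.
produced = {("denial", "destination_town", "leiden")}
annotated = {("denial", "destination_town", "leiden"),
             ("correction", "destination_town", "abcoude")}

exact = produced == annotated            # "match" criterion
p, r = precision_recall(produced, annotated)
```

Here `exact` is False, precision is 1.0 (the single produced unit is correct) and recall is 0.5 (one of two annotated units was found).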
For the test set of 1000 word-graphs we achieve the results listed in table 5. String accuracy is presented in terms of word accuracy (WA) and sentence accuracy (SA).
Input | Method | String accuracy | Semantic accuracy