next up previous
Next: FSA Utilities Up: Treatment of -Moves in Previous: Treatment of -Moves in

Introduction

In experimenting with finite-state approximation techniques for context-free and more powerful grammatical formalisms (such as the techniques presented in [12], [11], [4]) we have found that the resulting automata often are extremely large. Moreover, the automata contain many $ \epsilon$-moves (jumps). And finally, if such automata are determinised then the resulting automata are often smaller. It turns out that a straightforward implementation of the subset construction determinisation algorithm performs badly for such inputs.

As a motivating example, consider the definite-clause grammar that has been developed for the OVIS2 Spoken Dialogue System. This grammar is described in detail in [14]. After removing the feature constraints of this grammar, and after the removal of the sub-grammar for temporal expressions, this context-free skeleton grammar was input to an implementation of the technique described in [11]. 1The resulting non-deterministic automaton (labelled zovis2 below) contains 89832 states, 80935 $ \epsilon$-moves, and 80400 transitions. The determinised automaton contains only 6541 states, and 60781 transitions. Finally, the minimal automaton contains only 78 states and 526 transitions! Other grammars give rise to similar numbers. Thus, the approximation techniques yield particularly `verbose' automata for relatively simple languages.

The experiments were performed using the FSA Utilities toolkit [13]. At the time, an old version of the toolkit was used, which ran into memory problems for some of these automata. For this reason, the subset construction algorithm has been re-implemented, paying special attention to the treatment of $ \epsilon$-moves. Three variants of the subset construction algorithm are identified which differ in the way $ \epsilon$-moves are treated:

per graph
The most obvious and straightforward approach is sequential in the following sense. Firstly, an equivalent automaton without $ \epsilon$-moves is constructed for the input. In order to do this, the transitive closure of the graph consisting of all $ \epsilon$-moves is computed. Secondly, the resulting automaton is then treated by a subset construction algorithm for $ \epsilon$-free automata.
per state
For each state which occurs in a subset produced during subset construction, compute the states which are reachable using $ \epsilon$-moves. The results of this computation can be memorised, or computed for each state in a preprocessing step. This is the approach mentioned briefly in [7].2
per subset
For each subset Q of states which arises during subset construction, compute Q' $ \supseteq$ Q which extends Q with all states which are reachable from any member of Q using $ \epsilon$-moves. Such an algorithm is described in [1]. We extend this algorithm by memorising the $ \epsilon$-closure computation.

The motivation for this paper is the experience that the first approach turns out to be impractical for automata with very large numbers of $ \epsilon$-moves. An integration of the subset construction algorithm with the computation of $ \epsilon$-reachable states performs much better in practice. The per subset algorithm almost always performs better than the per state approach. However, for automata with a low number of jumps, the per graph algorithm outperforms the others.

In constructing an $ \epsilon$-free automaton the number of transitions increases. Given the fact that the input automaton already is extremely large (compared to the simplicity of the language it defines), this is an undesirable situation. An equivalent $ \epsilon$-free automaton for the example given above results in an automaton with 2353781 transitions. The implementation of per subset is the only variant which succeeds in determinising the input automaton of this example.

In the following section some background information concerning the FSA Utilities tool-box is provided. Section 3 then presents a short statement of the problem (determinise a given finite-state automaton), and a subset construction algorithm which solves this problem in the absence of $ \epsilon$-moves. Section 4 identifies three variants of the subset construction algorithm which take $ \epsilon$-moves into account. Finally, section 5 discusses some experiments in order to compare the three variants both on randomly generated automata and on automata generated by approximation algorithms.


next up previous
Next: FSA Utilities Up: Treatment of -Moves in Previous: Treatment of -Moves in
Noord G.J.M. van
1998-09-24