contents index next

1. FSA6: Finite State Automata Utilities Version 6

(manual generated with FSA Utilities version fsa6-268)

FSA6 is a collection of utilities to

FSA6 supports a number of different types of automata:

1.1 Functionality

1.1.1 Constructing Finite Automata with Regular Expressions

Many basic regular expression operators are provided, both for acceptors and transducers. Moreover, it is easy to define new regular expression operators. The built-in regular expression operators include:

1.1.2 Manipulating Finite Automata

Tools are provided to manipulate finite automata. Such manipulations include determinization and minimization (both the classical algorithms for recognizers and the recent algorithms for transducers are provided).

1.1.3 Applying Finite Automata

1.1.4 Visualizing Finite Automata

Much attention has been paid to be able to visualize finite state recognizers and finite state transducers. Support includes built- in visualization and interfaces to third party software:

1.2 How to use the toolbox

There are a number of ways that the toolbox is meant to be used:

% fsa -tk
% fsa write=postscript -r '[a,b+,c*,d]' | ghostview -
% sicstus
SICStus ...
Licensed to ...
| ?- use_module(fsa_library).
| ?- fsa_regex_atom_compile('[a*,b^,{d,e}]',L).
L = fa(r(fsa_preds),3,[0],[1],[trans(0,a,0),trans(0,b,2),
    trans(0,d,1),trans(0,e,1),trans(2,d,1),trans(2,e,1)],[]) ?
| ?- fsa_regex_transduces('{a:b,? -a}*',"ababac",L), atom_codes(Atom,L).
L = [98,98,98,98,98,99],
Atom = bbbbbc ?
| ?-

All predicates that are imported have names starting with fsa. All module names start with fsa as well.

1.3 Examples

The package comes with a number of larger examples These examples include both automata and extended regular expression definitions.

Regular expression operators which allows to input an automaton by listing its transitions, start states, and final states. Contributed by Dale Gerdemann.

A collection of regular expression operators including boolean operators and various predicates of automata. Contributed by Dale Gerdemann.

A finite-state automaton of all possible Dutch monosyllabic words. Contributed by Gosse Bouma.

Dutch words, taken somewhere from ftp site (see ftp_info.txt). This list of words can then be used to experiment with the option to create perfect hashes (-dict2ph).

The replace operator as defined in our EACL 99 paper. Also some further examples with longest match and finite-state parsing. Contributed by Gerdemann and van Noord.

Grapheme to Phoneme conversion for Dutch, as described in Bouma's ACL 2000 paper. Uses the Celex format for phonemes. Contributed by Gosse Bouma.

Implementation of the Hopcroft minimization algorithm by Edmund Grimley-Evans, as described in his WIA 97 paper. This code made me re-implement the FSA implementation of Hopcroft's algorithm. Contributed by Grimley-Evans.

HMM's can be seen as a special type of weighted finite automata. This example implements the Baum-Welch training algorithm. Fairly simple-minded implementation.

Code to convert LaTeX hyphenation patterns into a single (large) finite-state transducer that can be used directly to hyphenate a given word. By Gosse Bouma and van Noord, after a suggestion by Lauri Karttunen.

These examples are taken from Kaplan and Kay, Regular Models of Phonological Rule Systems, Computational Linguistics, 20(3), 1994. Simple examples of transducers, and composition.

These examples are taken from Karttunen, Finite-state Constraints, Proceedings International Conference on Current Issues in Computational Linguistics, Universiti Sains Malaysia, Penang, 1991. Simple examples of transducers, and composition.

Lauri Karttunen, The Replace Operator, ACL 1995, MIT Boston. Fairly complex examples of regular expression operator definitions.

Lauri Karttunen, Directed Replacement, ACL 1996. Includes soundex example from MLTT home page.

Lauri Karttunen, The Replace Operator, 1997. In volume edited by Roche and Schabes.

Simple examples of weighted automata.

Mehryar Mohri and Richard Sproat, An Efficient Compiler for Weighted Rewrite Rules. 34th Annual Meeting of the ACL, Santa Cruz 1996, pages 230-238. This only treats the non-weighted case. Nice example of the power of the regular expression language: their algorithm only takes a few paragraphs in FSA6.

Nathan Vailette, Logical Specifications of Transducers for NLP. In: FSMNLP 2001, ESSLLI Helsinki. electronically available from Contributed by Nathan Vailette.

These are examples used by Mark-Jan Nederhof while investigating finite-state approximations of context-free grammars. The larger examples were used in my Computational Linguistics paper, The Treatment of Epsilon-moves in Subset Construction, available from Contributed by Nederhof.

examples from material for a course on computational morphology. Simple examples of transducers.

Implementation of Lauri Karttunen, The Proper Treatment of Optimality in Computational Phonology. FSMNLP 1998, Ankara. Includes definition of lenient composition operator and syllibification algorithm. Also includes Gerdemann/van Noord (even more proper?) alternative implementation. Contributed by Gerdemann and van Noord.

Examples of predicate modules; for example using bitvectors to represent sets of symbols, or using types. The bitvector stuff is only available under SICStus.

Fernando C. N. Pereira and Michael D. Riley, Speech Recognition by Composition of Weighted Finite Automata, 1996 (on cmp-lg). Also appears as chapter 15 of the volume edited by Roche and Schabes. Simple examples of weighted composition. Definition of their version of the composition operator (filtering our spurious paths).

Solving the N-queens problem with regular expressions, by Dale Gerdemann. Another solution by G. van Noord. Interesting examples of definitions of regular expression operators.

Random automata. Used for the experiments documented in my FSMNLP98 paper, on the treatment of epsilon-moves in subset construction.

Small examples.

Emmanuel Roche and Yves Schabes, Deterministic Part-of-speech Tagging with Finite-state Transducers, Computational Linguistics, 21(2), 1995. Small examples of transducers. Also includes a definition of the local extension operator.

Roche and Schabes, Introduction. In: Roche and Schabes (eds), Finite State Language Processing. MIT Press 1997. Includes implementations of is_functional, unambiguous, is_subsequential, build_bimachine, bitransform. Also has simple utils to apply and visualize bimachines.

Implements a simple spell-checker as the combination of a dictionary and strings within Levenshtein distance d of the words in the dictionary (for some fixed d). Interesting application of the priority union operator of Karttunen (1998).

small stuff, including my attempt to translate Dutch temporal expression into a numerical format (that one is quite large, in fact).

Definitions to implement twolevel rules in the style of Kimmo. Contributed by Rob Malouf, Gosse Bouma, Gertjan van Noord.

Small stuff, weighted automata.

Some small acyclic weighted automata.

1.5 References

1.6 Papers using FSA

1.7 Links

1.8 Copyright

Copyright c 1995 - 2001 by Gertjan van Noord. This program is distributed under Gnu General Public License (cf. the file COPYING in distribution).

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; version 2 of the License.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

1.9 Acknowledgements

Helpful suggestions, feedback, etc., from a variety of people, too many to list here. Gosse Bouma and Dale Gerdemann were exceptionally influential.

1.10 Author

Gertjan van Noord,

contents index next