# 1. FSA6: Finite State Automata Utilities Version 6

(manual generated with FSA Utilities version fsa6-268)

FSA6 is a collection of utilities to

• construct finite automata (from regular expressions)

• manipulate finite automata

• visualise finite automata

• apply finite automata

FSA6 supports a number of different types of automata:

• recognizers

• weighted recognizers (aka string-to-weight transducers)

• transducers (aka string-to-string transducers)

• weighted transducers (aka string-to-string-weight transducers)

## 1.1 Functionality

### 1.1.1 Constructing Finite Automata with Regular Expressions

Many basic regular expression operators are provided, both for acceptors and transducers. Moreover, it is easy to define new regular expression operators. The built-in regular expression operators include:

• Concatenation; Kleene star; Kleene plus; Option; Union

• Complement; Difference; Intersection

• Reversal; Containment; Ignore

• Composition; Cross-product; Domain; Range; Identity; Inversion;

• Interval

• `Any' meta-symbol.

• Arbitrary predicates instead of symbols

• operators to construct weighted automata

### 1.1.2 Manipulating Finite Automata

Tools are provided to manipulate finite automata. Such manipulations include determinization and minimization (both the classical algorithms for recognizers and the recent algorithms for transducers are provided).

• Determinization. Currently there are three different implementations of this algorithm, depending on how epsilon transitions (jumps) are treated. There is also an implementation of Mohri's determinization algorithm, both for ordinary (string-to-string) transducers and string-to-weight transducers. The implementation is described in a paper in Computational Linguistics, available from http://www.let.rug.nl/~vannoord/papers/

• Minimization. Three different minimization algorithms are supported. There is also an implementation of Mohri's minimization algorithm, both for ordinary (string-to-string) transducers and string-to-weight transducers.

• Random generation of finite automata, based on the algorithm in Leslie (1995).

• Epsilon-removal.

• Completion and Incompletion: extending a given automaton in order to make the transition table total (typically by adding a sink state and adding transitions to this sink state); and removing transitions leading to sink states.

• Regular manipulations. The various regular expression operators can be applied to automata directly as well.

### 1.1.3 Applying Finite Automata

• Acceptance. Tools to check a given string for acceptance by a recognizer.

• Transduction. Tools to apply a transducer to a given input string.

• Production. Tools to produce strings of a given recognizer, and pairs of strings for a given transducer.

• Code Generation. Tools to compile finite automata into efficient Prolog, C, C++, and JAVA programs which can be used to check whether a given string is in the language defined by the automaton, or to generate the transduction of a given string w.r.t. a given transducer.

### 1.1.4 Visualizing Finite Automata

Much attention has been paid to be able to visualize finite state recognizers and finite state transducers. Support includes built- in visualization and interfaces to third party software:

• DOT. The program is able to produce a representation of a finite automaton compatible with the DOT graph visualisation program. DOT (part of AT&T's graphviz) is a tool that automatically figures out how a graph is best displayed (crossing-edges reduction, etc). It can produce e.g. Postscript output. An example is http://www.let.rug.nl/~vannoord/Fsa/Manual/dot.png.

• VCG. The program is able to produce a representation of a finite state automaton compatible with the VCG graph visualisation program. VCG is a tool that automatically figures outhow a graph is best displayed (crossing-edges reduction, etc). An example is http://www.let.rug.nl/~vannoord/Fsa/Manual/vcg.png. The VCG program is now also known as aiSee.

• daVinci. The program is able to produce a representation of a finite state automaton compatible with the daVinci 1.4 graph visualisation program. This program automatically computes the most optimal way to view the finite-state automaton by minimizing the number of crossing edges. Postscript output can easily be generated from the result. An example is http://www.let.rug.nl/~vannoord/Fsa/Manual/daVinci.png.

• TK Widget. The package contains an interface to a TK Widget to browse finite state automata, providing a graphical user-interface for the toolbox. The TK Widget is explained in much more detail below. An example is http://www.let.rug.nl/~vannoord/Fsa/Manual/dump.png. Note that the GUI is not an integral part of the toolbox; it makes perfect sense to use the program in batch mode. The graphical user interface is only available under SICStus.

• LaTeX (if you want to be able to use the result you need the pstricks package). An example is http://www.let.rug.nl/~vannoord/Fsa/Manual/pstricks.png.

• Postscript (thanks to Peter Kleiweg). An example is http://www.let.rug.nl/~vannoord/Fsa/Manual/pk.png.

## 1.2 How to use the toolbox

There are a number of ways that the toolbox is meant to be used:

• interactively using a command interpreter and/or a graphical user interface. For example, in order to use fsa interactively with the graphical user interface, use the command:

% fsa -tk
• as a UNIX-like filter. In such cases you use the fsa command with a number of options. For instance:

% fsa write=postscript -r '[a,b+,c*,d]' | ghostview -
• as a library in your own Prolog program. You can incorporate the FSA program in your own program, just as you can use other Prolog libraries. In order for this to work, you simply need to load the file fsa_library.pl in the installation directory. For example (if you use SWI Prolog, you need 'consult' instead of 'use_module'):

% sicstus
SICStus ...
| ?- use_module(fsa_library).
...
...
...
yes
| ?- fsa_regex_atom_compile('[a*,b^,{d,e}]',L).
L = fa(r(fsa_preds),3,[0],[1],[trans(0,a,0),trans(0,b,2),
trans(0,d,1),trans(0,e,1),trans(2,d,1),trans(2,e,1)],[]) ?
yes
| ?- fsa_regex_transduces('{a:b,? -a}*',"ababac",L), atom_codes(Atom,L).
L = [98,98,98,98,98,99],
Atom = bbbbbc ?
yes
| ?-

All predicates that are imported have names starting with fsa. All module names start with fsa as well.

## 1.3 Examples

The package comes with a number of larger examples These examples include both automata and extended regular expression definitions.

• Examples/Automata

Regular expression operators which allows to input an automaton by listing its transitions, start states, and final states. Contributed by Dale Gerdemann.

• Examples/Booleans

A collection of regular expression operators including boolean operators and various predicates of automata. Contributed by Dale Gerdemann.

• Examples/Bouma

A finite-state automaton of all possible Dutch monosyllabic words. Contributed by Gosse Bouma.

• Examples/DutchWords

Dutch words, taken somewhere from ftp site (see ftp_info.txt). This list of words can then be used to experiment with the option to create perfect hashes (-dict2ph).

• Examples/GerdemannVannoord99

The replace operator as defined in our EACL 99 paper. Also some further examples with longest match and finite-state parsing. Contributed by Gerdemann and van Noord.

• Examples/Graph2Phon

Grapheme to Phoneme conversion for Dutch, as described in Bouma's ACL 2000 paper. Uses the Celex format for phonemes. Contributed by Gosse Bouma.

• Examples/Grimley-Evans

Implementation of the Hopcroft minimization algorithm by Edmund Grimley-Evans, as described in his WIA 97 paper. This code made me re-implement the FSA implementation of Hopcroft's algorithm. Contributed by Grimley-Evans.

• Examples/HMM

HMM's can be seen as a special type of weighted finite automata. This example implements the Baum-Welch training algorithm. Fairly simple-minded implementation.

• Examples/Hyphenation

Code to convert LaTeX hyphenation patterns into a single (large) finite-state transducer that can be used directly to hyphenate a given word. By Gosse Bouma and van Noord, after a suggestion by Lauri Karttunen.

• Examples/KaplanKay94

These examples are taken from Kaplan and Kay, Regular Models of Phonological Rule Systems, Computational Linguistics, 20(3), 1994. Simple examples of transducers, and composition.

• Examples/Karttunen91

These examples are taken from Karttunen, Finite-state Constraints, Proceedings International Conference on Current Issues in Computational Linguistics, Universiti Sains Malaysia, Penang, 1991. Simple examples of transducers, and composition.

• Examples/Karttunen95

Lauri Karttunen, The Replace Operator, ACL 1995, MIT Boston. Fairly complex examples of regular expression operator definitions.

• Examples/Karttunen96

Lauri Karttunen, Directed Replacement, ACL 1996. Includes soundex example from MLTT home page.

• Examples/Karttunen97

Lauri Karttunen, The Replace Operator, 1997. In volume edited by Roche and Schabes.

• Examples/Mohri97

Simple examples of weighted automata.

• Examples/MohriSproat96

Mehryar Mohri and Richard Sproat, An Efficient Compiler for Weighted Rewrite Rules. 34th Annual Meeting of the ACL, Santa Cruz 1996, pages 230-238. This only treats the non-weighted case. Nice example of the power of the regular expression language: their algorithm only takes a few paragraphs in FSA6.

• Examples/MSOL

Nathan Vailette, Logical Specifications of Transducers for NLP. In: FSMNLP 2001, ESSLLI Helsinki. electronically available from http://www.let.rug.nl/~vannoord/alp/esslli_fsmnlp Contributed by Nathan Vailette.

• Examples/Nederhof

These are examples used by Mark-Jan Nederhof while investigating finite-state approximations of context-free grammars. The larger examples were used in my Computational Linguistics paper, The Treatment of Epsilon-moves in Subset Construction, available from http://www.let.rug.nl/~vannoord/papers/ Contributed by Nederhof.

• Examples/Nerbonne

examples from http://www.let.rug.nl/~nerbonne/teach.html material for a course on computational morphology. Simple examples of transducers.

• Examples/OptimalityTheory

Implementation of Lauri Karttunen, The Proper Treatment of Optimality in Computational Phonology. FSMNLP 1998, Ankara. Includes definition of lenient composition operator and syllibification algorithm. Also includes Gerdemann/van Noord (even more proper?) alternative implementation. Contributed by Gerdemann and van Noord.

• Examples/PredModules

Examples of predicate modules; for example using bitvectors to represent sets of symbols, or using types. The bitvector stuff is only available under SICStus.

• Examples/PereiraRiley96

Fernando C. N. Pereira and Michael D. Riley, Speech Recognition by Composition of Weighted Finite Automata, 1996 (on cmp-lg). Also appears as chapter 15 of the volume edited by Roche and Schabes. Simple examples of weighted composition. Definition of their version of the composition operator (filtering our spurious paths).

• Examples/Queens

Solving the N-queens problem with regular expressions, by Dale Gerdemann. Another solution by G. van Noord. Interesting examples of definitions of regular expression operators.

• Examples/Random

Random automata. Used for the experiments documented in my FSMNLP98 paper, on the treatment of epsilon-moves in subset construction.

• Examples/Recognizers

Small examples.

• Examples/RocheSchabes95

Emmanuel Roche and Yves Schabes, Deterministic Part-of-speech Tagging with Finite-state Transducers, Computational Linguistics, 21(2), 1995. Small examples of transducers. Also includes a definition of the local extension operator.

• Examples/RocheSchabes97

Roche and Schabes, Introduction. In: Roche and Schabes (eds), Finite State Language Processing. MIT Press 1997. Includes implementations of is_functional, unambiguous, is_subsequential, build_bimachine, bitransform. Also has simple utils to apply and visualize bimachines.

• Examples/Spell

Implements a simple spell-checker as the combination of a dictionary and strings within Levenshtein distance d of the words in the dictionary (for some fixed d). Interesting application of the priority union operator of Karttunen (1998).

• Examples/Transducers

small stuff, including my attempt to translate Dutch temporal expression into a numerical format (that one is quite large, in fact).

• Examples/twolevel

Definitions to implement twolevel rules in the style of Kimmo. Contributed by Rob Malouf, Gosse Bouma, Gertjan van Noord.

• Examples/Weights

Small stuff, weighted automata.

• Examples/Wordgraphs

Some small acyclic weighted automata.

## 1.5 References

• Alfred V. Aho and John E. Hopcroft and Jeffrey D. Ullman, The Design and Analysis of Computer Algorithms. Addison Wesley, 1974.

• Meera Blattner and Tom Head, Single-Valued a-Transducers. In: Journal of Computer and System Sciences, 15. Pp. 310--327. 1977.

• J.A. Brzozowski. Canonical Regular Expressions and Minimal State Graphs for Definite Events. In: Mathematical Theory of Automata, Polytechnic Press Brooklyn, 1962.

• Gosse Bouma, A Modern CL course using Dutch. In: EACL 99 Postconference Workshop ``Computer and Internet supported Education in Language and Speech Technology'', June 12 1999, Bergen Norway.

• Jan Daciuk, Incremental Construction of Finite-State Automata and Transducers and their use in the Natural Language Processing. Thesis, Politechnika Gdanska, 1998.

• Jeffrey Friedl. Mastering Regular Expressions. O'Reilly 1997.

• Dale Gerdemann and Gertjan van Noord. Transducers from Rewrite Rules with Backreferences. EACL 1999 Bergen.

• John E. Hopcroft. An n log n algorithm for minimizing the states in a finite automaton. In: Z. Kohavi (editor), The Theory of Machines and Computations. Academic Press 1971.

• John E. Hopcroft and Jeffrey D. Ullman. Introduction to Automata Theory, Languages and Computation. Addison Wesley 1979.

• C. Douglas Johnson. Formal Aspects of Phonological Descriptions. Mouton The Hague, 1972.

• Ronald M. Kaplan and Martin Kay. Regular Models of Phonological Rule Systems. Computational Linguistics 20 (3) 1994.

• Lauri Karttunen. Finite-state Constraints. In: Proceedings International Conference on Current Issues in Computational Linguistics. Unviersiti Sains Malaysia, Penang. 1991.

• Lauri Karttunen. The Replace Operator. 33rd ACL, Boston, 1995.

• Lauri Karttunen and Jean-Pierre Chanod and Gregory Grefenstette and Anne Schiller, Regular Expressions for Language Engineering. Natural Language Engineering 2 (4) 1996.

• Lauri Karttunen, Directed Replacement. ACL 1996 Santa Cruz.

• Lauri Karttunen, The Proper Treatment of Optimality Theory in Computational Phonology. In: FSMNLP 1998, Ankara.

• George Anton Kiraz and Edmund Grimley-Evans, Multi-Tape Automata for Speech and Language Systems: A Prolog Implementation. In D. Wood and S. Yu (eds.), Automata Implementation, Lecture Notes in Computer Science 1436, Springer, 1998.

• Ted Leslie, Efficient Approaches to Subset Construction. University of Waterloo 1995.

• Mehryar Mohri, Compact Representations by Finite-state Transducers. ACL New Mexico 1994.

• Mehryar Mohri, Finite-State Transducers in Language and Speech Processing, Computational Linguistics 23 (2) 1997.

• Mehryar Mohri and Fernando C.N. Pereira and Michael Riley. A rational design for a weighted finite-state transducer library. In: Automata Implementation. WIA '97. Lecture Notes in Computer Science 1436. Spring Verlag 1998.

• Mehryar Mohri and Richard Sproat. An Efficient Compiler for Weighted Rewrite Rules. ACL 1996 Santa Cruz.

• Gertjan van Noord. FSA Utilities: A Toolbox to Manipulate Finite-state Automata. In: Raymond, Woord, Yu (editors), Automata Implementation. Lecture Notes in Computer Science 1260. Spring Verlag 1997.

• Gertjan van Noord. The Treatment of Epsilon Moves in Subset Construction. In: Computational Linguistics 26 (1) 2000.

• Gertjan van Noord and Dale Gerdemann. An Extendible Regular Expression Compiler for Finite-state Approaches in Natural Language Processing. WIA 1999.

• Gertjan van Norod and Dale Gerdemann. Finite State Transducers with Predicates and Identity. Grammars 4 (3) 2001.

• Emmanuel Roche and Yves Schabes. Deterministic Part-of-speech Tagging with Finite-state Transducers. Computational Linguistics 21 (2) 1995.

• Emmanuel Roche and Yves Schabes. Finite-State Language Processing. MIT Press 1997.

• Bruce Watson, Taxonomies and Toolkits of Regular Language Algorithms. Ph.D. thesis TU Eindhoven. 1996.

• Bruce Watson, Implementing and Using Finite Automata Toolkits. In: Proceedings of the ECAI'96 Workshop Extended Finite State Models of Language. 1996.

## 1.6 Papers using FSA

• Gosse Bouma, A Modern CL course using Dutch. In: EACL 99 Postconference Workshop ``Computer and Internet supported Education in Language and Speech Technology'', June 12 1999, Bergen Norway.

• Gosse Bouma, A Finite State and Data-Oriented Method for Grapheme to Phoneme Conversion. In: 1st Meeting of the North American Chapter of the Association for Computational Linguistics, 303-310 Seattle 2000.

• Gosse Bouma, A modern Computational Linguistic Course using Dutch. In: Frank van Eynde and Ineke Schuurman, editors, CLIN 1998, Papers from the ninth CLIN Meeting, Amsterdam, 1999. Rodopi Press.

• Gosse Bouma, Finite State Methods for Hyphenation. In: FSMNLP 2001, ESSLLI Helsinki.

• Dale Gerdemann and Gertjan van Noord. Transducers from Rewrite Rules with Backreferences. EACL 1999 Bergen.

• Dale Gerdemann and Gertjan van Noord. Approximation and Exactness in Finite State Optimality Theory. In: FINITE-STATE PHONOLOGY: SIGPHON 2000, Fifth Meeting of the ACL Special Interest Group in Computational Phonology, COLING 2000.

• George Anton Kiraz, Multi-Tiered Nonlinear Morphology Using Multi-Tape Finite Automata: A case study on Syriac and Arabic. In: Computational Linguistics, 26 (1) 2000.

• Edwin Kuipers, Ill-formed Input and Finite-state Devices. MA Thesis, Rijksuniversiteit Groningen, 1997.

• Gertjan van Noord, FSA Utilities: Manipulation of Finite-state Automata implemented in Prolog, in: Proceedings of the First International Workshop on Implementing Automata. London Ontario Canada, 1996.

• Gertjan van Noord. FSA Utilities: A Toolbox to Manipulate Finite-state Automata. In: Raymond, Woord, Yu (editors), Automata Implementation. Lecture Notes in Computer Science 1260. Spring Verlag 1997.

• Gertjan van Noord. The Treatment of Epsilon Moves in Subset Construction. In: Computational Linguistics 26 (1) 2000.

• Gertjan van Noord and Dale Gerdemann. An Extendible Regular Expression Compiler for Finite-state Approaches in Natural Language Processing. WIA 1999.

• Gertjan van Norod and Dale Gerdemann. Finite State Transducers with Predicates and Identity. Grammars X (X) 2001.

• Nathan Vailette, Logical Specifications of Transducers for NLP. In: FSMNLP 2001, ESSLLI Helsinki.

• Markus Walther, Finite-State Reduplication in One-Level Prosodic Morphology. In: 1st Meeting of the North American Chapter of the Association for Computational Linguistics, 296-302 Seattle 2000.

• Markus Walther, One-Level Prosodic Morphology. Marburger Arbeiten zur Linguistic 1, University of Marburg. 64 pp.

• Markus Walther, Temiar Reduplication in One-Level Prosodic Morphology. In: FINITE-STATE PHONOLOGY: SIGPHON 2000, Fifth Meeting of the ACL Special Interest Group in Computational Phonology, COLING 2000.

Copyright c 1995 - 2001 by Gertjan van Noord. This program is distributed under Gnu General Public License (cf. the file COPYING in distribution).

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; version 2 of the License.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

## 1.9 Acknowledgements

Helpful suggestions, feedback, etc., from a variety of people, too many to list here. Gosse Bouma and Dale Gerdemann were exceptionally influential.

## 1.10 Author

Gertjan van Noord, mailto:vannoord@let.rug.nl