(manual generated with FSA Utilities version fsa6-268)

FSA6 is a collection of utilities to

**construct**finite automata (from regular expressions)**manipulate**finite automata**visualise**finite automata**apply**finite automata

FSA6 supports a number of different types of automata:

recognizers

weighted recognizers (aka string-to-weight transducers)

transducers (aka string-to-string transducers)

weighted transducers (aka string-to-string-weight transducers)

Many basic regular expression operators are provided, both for acceptors and transducers. Moreover, it is easy to define new regular expression operators. The built-in regular expression operators include:

Concatenation; Kleene star; Kleene plus; Option; Union

Complement; Difference; Intersection

Reversal; Containment; Ignore

Composition; Cross-product; Domain; Range; Identity; Inversion;

Interval

`Any' meta-symbol.

Arbitrary predicates instead of symbols

operators to construct weighted automata

Tools are provided to manipulate finite automata. Such manipulations include determinization and minimization (both the classical algorithms for recognizers and the recent algorithms for transducers are provided).

Determinization. Currently there are three different implementations of this algorithm, depending on how epsilon transitions (jumps) are treated. There is also an implementation of Mohri's determinization algorithm, both for ordinary (string-to-string) transducers and string-to-weight transducers. The implementation is described in a paper in Computational Linguistics, available from

`http://www.let.rug.nl/~vannoord/papers/`Minimization. Three different minimization algorithms are supported. There is also an implementation of Mohri's minimization algorithm, both for ordinary (string-to-string) transducers and string-to-weight transducers.

Random generation of finite automata, based on the algorithm in Leslie (1995).

Epsilon-removal.

Completion and Incompletion: extending a given automaton in order to make the transition table total (typically by adding a sink state and adding transitions to this sink state); and removing transitions leading to sink states.

Regular manipulations. The various regular expression operators can be applied to automata directly as well.

Acceptance. Tools to check a given string for acceptance by a recognizer.

Transduction. Tools to apply a transducer to a given input string.

Production. Tools to produce strings of a given recognizer, and pairs of strings for a given transducer.

Code Generation. Tools to compile finite automata into efficient Prolog, C, C++, and JAVA programs which can be used to check whether a given string is in the language defined by the automaton, or to generate the transduction of a given string w.r.t. a given transducer.

Much attention has been paid to be able to visualize finite state recognizers and finite state transducers. Support includes built- in visualization and interfaces to third party software:

DOT. The program is able to produce a representation of a finite automaton compatible with the DOT graph visualisation program. DOT (part of AT&T's graphviz) is a tool that automatically figures out how a graph is best displayed (crossing-edges reduction, etc). It can produce e.g. Postscript output. An example is

`http://www.let.rug.nl/~vannoord/Fsa/Manual/dot.png`.VCG. The program is able to produce a representation of a finite state automaton compatible with the VCG graph visualisation program. VCG is a tool that automatically figures outhow a graph is best displayed (crossing-edges reduction, etc). An example is

`http://www.let.rug.nl/~vannoord/Fsa/Manual/vcg.png`. The VCG program is now also known as aiSee.daVinci. The program is able to produce a representation of a finite state automaton compatible with the daVinci 1.4 graph visualisation program. This program automatically computes the most optimal way to view the finite-state automaton by minimizing the number of crossing edges. Postscript output can easily be generated from the result. An example is

`http://www.let.rug.nl/~vannoord/Fsa/Manual/daVinci.png`.TK Widget. The package contains an interface to a TK Widget to browse finite state automata, providing a graphical user-interface for the toolbox. The TK Widget is explained in much more detail below. An example is

`http://www.let.rug.nl/~vannoord/Fsa/Manual/dump.png`. Note that the GUI is not an integral part of the toolbox; it makes perfect sense to use the program in batch mode. The graphical user interface is only available under SICStus.LaTeX (if you want to be able to use the result you need the pstricks package). An example is

`http://www.let.rug.nl/~vannoord/Fsa/Manual/pstricks.png`.Postscript (thanks to Peter Kleiweg). An example is

`http://www.let.rug.nl/~vannoord/Fsa/Manual/pk.png`.

There are a number of ways that the toolbox is meant to be used:

interactively using a command interpreter and/or a graphical user interface. For example, in order to use fsa interactively with the graphical user interface, use the command:

% fsa -tk

as a UNIX-like filter. In such cases you use the fsa command with a number of options. For instance:

% fsa write=postscript -r '[a,b+,c*,d]' | ghostview -

as a library in your own Prolog program. You can incorporate the FSA program in your own program, just as you can use other Prolog libraries. In order for this to work, you simply need to load the file fsa_library.pl in the installation directory. For example (if you use SWI Prolog, you need 'consult' instead of 'use_module'):

% sicstus

SICStus ...

Licensed to ...

| ?- use_module(fsa_library).

...

...

...

yes

| ?- fsa_regex_atom_compile('[a*,b^,{d,e}]',L).

L = fa(r(fsa_preds),3,[0],[1],[trans(0,a,0),trans(0,b,2),

trans(0,d,1),trans(0,e,1),trans(2,d,1),trans(2,e,1)],[]) ?

yes

| ?- fsa_regex_transduces('{a:b,? -a}*',"ababac",L), atom_codes(Atom,L).

L = [98,98,98,98,98,99],

Atom = bbbbbc ?

yes

| ?-

All predicates that are imported have names starting with **fsa**. All
module names start with **fsa** as well.

The package comes with a number of larger examples These examples include both automata and extended regular expression definitions.

Examples/Automata

Regular expression operators which allows to input an automaton by listing its transitions, start states, and final states. Contributed by Dale Gerdemann.

Examples/Booleans

A collection of regular expression operators including boolean operators and various predicates of automata. Contributed by Dale Gerdemann.

Examples/Bouma

A finite-state automaton of all possible Dutch monosyllabic words. Contributed by Gosse Bouma.

Examples/DutchWords

Dutch words, taken somewhere from ftp site (see ftp_info.txt). This list of words can then be used to experiment with the option to create perfect hashes (-dict2ph).

Examples/GerdemannVannoord99

The replace operator as defined in our EACL 99 paper. Also some further examples with longest match and finite-state parsing. Contributed by Gerdemann and van Noord.

Examples/Graph2Phon

Grapheme to Phoneme conversion for Dutch, as described in Bouma's ACL 2000 paper. Uses the Celex format for phonemes. Contributed by Gosse Bouma.

Examples/Grimley-Evans

Implementation of the Hopcroft minimization algorithm by Edmund Grimley-Evans, as described in his WIA 97 paper. This code made me re-implement the FSA implementation of Hopcroft's algorithm. Contributed by Grimley-Evans.

Examples/HMM

HMM's can be seen as a special type of weighted finite automata. This example implements the Baum-Welch training algorithm. Fairly simple-minded implementation.

Examples/Hyphenation

Code to convert LaTeX hyphenation patterns into a single (large) finite-state transducer that can be used directly to hyphenate a given word. By Gosse Bouma and van Noord, after a suggestion by Lauri Karttunen.

Examples/KaplanKay94

These examples are taken from Kaplan and Kay, Regular Models of Phonological Rule Systems, Computational Linguistics, 20(3), 1994. Simple examples of transducers, and composition.

Examples/Karttunen91

These examples are taken from Karttunen, Finite-state Constraints, Proceedings International Conference on Current Issues in Computational Linguistics, Universiti Sains Malaysia, Penang, 1991. Simple examples of transducers, and composition.

Examples/Karttunen95

Lauri Karttunen, The Replace Operator, ACL 1995, MIT Boston. Fairly complex examples of regular expression operator definitions.

Examples/Karttunen96

Lauri Karttunen, Directed Replacement, ACL 1996. Includes soundex example from MLTT home page.

Examples/Karttunen97

Lauri Karttunen, The Replace Operator, 1997. In volume edited by Roche and Schabes.

Examples/Mohri97

Simple examples of weighted automata.

Examples/MohriSproat96

Mehryar Mohri and Richard Sproat, An Efficient Compiler for Weighted Rewrite Rules. 34th Annual Meeting of the ACL, Santa Cruz 1996, pages 230-238. This only treats the non-weighted case. Nice example of the power of the regular expression language: their algorithm only takes a few paragraphs in FSA6.

Examples/MSOL

Nathan Vailette, Logical Specifications of Transducers for NLP. In:
FSMNLP 2001, ESSLLI Helsinki. electronically available from
`http://www.let.rug.nl/~vannoord/alp/esslli_fsmnlp` Contributed by
Nathan Vailette.

Examples/Nederhof

These are examples used by Mark-Jan Nederhof while
investigating finite-state approximations of context-free
grammars. The larger examples were used in my Computational
Linguistics paper, The Treatment of Epsilon-moves in Subset Construction,
available from `http://www.let.rug.nl/~vannoord/papers/`
Contributed by Nederhof.

Examples/Nerbonne

examples from `http://www.let.rug.nl/~nerbonne/teach.html` material for
a course on computational morphology. Simple examples of transducers.

Examples/OptimalityTheory

Implementation of Lauri Karttunen, The Proper Treatment of Optimality in Computational Phonology. FSMNLP 1998, Ankara. Includes definition of lenient composition operator and syllibification algorithm. Also includes Gerdemann/van Noord (even more proper?) alternative implementation. Contributed by Gerdemann and van Noord.

Examples/PredModules

Examples of predicate modules; for example using bitvectors to represent sets of symbols, or using types. The bitvector stuff is only available under SICStus.

Examples/PereiraRiley96

Fernando C. N. Pereira and Michael D. Riley, Speech Recognition by Composition of Weighted Finite Automata, 1996 (on cmp-lg). Also appears as chapter 15 of the volume edited by Roche and Schabes. Simple examples of weighted composition. Definition of their version of the composition operator (filtering our spurious paths).

Examples/Queens

Solving the N-queens problem with regular expressions, by Dale Gerdemann. Another solution by G. van Noord. Interesting examples of definitions of regular expression operators.

Examples/Random

Random automata. Used for the experiments documented in my FSMNLP98 paper, on the treatment of epsilon-moves in subset construction.

Examples/Recognizers

Small examples.

Examples/RocheSchabes95

Emmanuel Roche and Yves Schabes, Deterministic Part-of-speech Tagging with Finite-state Transducers, Computational Linguistics, 21(2), 1995. Small examples of transducers. Also includes a definition of the local extension operator.

Examples/RocheSchabes97

Roche and Schabes, Introduction. In: Roche and Schabes (eds), Finite State Language Processing. MIT Press 1997. Includes implementations of is_functional, unambiguous, is_subsequential, build_bimachine, bitransform. Also has simple utils to apply and visualize bimachines.

Examples/Spell

Implements a simple spell-checker as the combination of a dictionary and strings within Levenshtein distance d of the words in the dictionary (for some fixed d). Interesting application of the priority union operator of Karttunen (1998).

Examples/Transducers

small stuff, including my attempt to translate Dutch temporal expression into a numerical format (that one is quite large, in fact).

Examples/twolevel

Definitions to implement twolevel rules in the style of Kimmo. Contributed by Rob Malouf, Gosse Bouma, Gertjan van Noord.

Examples/Weights

Small stuff, weighted automata.

Examples/Wordgraphs

Some small acyclic weighted automata.

Alfred V. Aho and John E. Hopcroft and Jeffrey D. Ullman, The Design and Analysis of Computer Algorithms. Addison Wesley, 1974.

Meera Blattner and Tom Head, Single-Valued a-Transducers. In: Journal of Computer and System Sciences, 15. Pp. 310--327. 1977.

J.A. Brzozowski. Canonical Regular Expressions and Minimal State Graphs for Definite Events. In: Mathematical Theory of Automata, Polytechnic Press Brooklyn, 1962.

Gosse Bouma, A Modern CL course using Dutch. In: EACL 99 Postconference Workshop ``Computer and Internet supported Education in Language and Speech Technology'', June 12 1999, Bergen Norway.

Jan Daciuk, Incremental Construction of Finite-State Automata and Transducers and their use in the Natural Language Processing. Thesis, Politechnika Gdanska, 1998.

Jeffrey Friedl. Mastering Regular Expressions. O'Reilly 1997.

Dale Gerdemann and Gertjan van Noord. Transducers from Rewrite Rules with Backreferences. EACL 1999 Bergen.

John E. Hopcroft. An n log n algorithm for minimizing the states in a finite automaton. In: Z. Kohavi (editor), The Theory of Machines and Computations. Academic Press 1971.

John E. Hopcroft and Jeffrey D. Ullman. Introduction to Automata Theory, Languages and Computation. Addison Wesley 1979.

C. Douglas Johnson. Formal Aspects of Phonological Descriptions. Mouton The Hague, 1972.

Ronald M. Kaplan and Martin Kay. Regular Models of Phonological Rule Systems. Computational Linguistics 20 (3) 1994.

Lauri Karttunen. Finite-state Constraints. In: Proceedings International Conference on Current Issues in Computational Linguistics. Unviersiti Sains Malaysia, Penang. 1991.

Lauri Karttunen. The Replace Operator. 33rd ACL, Boston, 1995.

Lauri Karttunen and Jean-Pierre Chanod and Gregory Grefenstette and Anne Schiller, Regular Expressions for Language Engineering. Natural Language Engineering 2 (4) 1996.

Lauri Karttunen, Directed Replacement. ACL 1996 Santa Cruz.

Lauri Karttunen, The Proper Treatment of Optimality Theory in Computational Phonology. In: FSMNLP 1998, Ankara.

George Anton Kiraz and Edmund Grimley-Evans, Multi-Tape Automata for Speech and Language Systems: A Prolog Implementation. In D. Wood and S. Yu (eds.), Automata Implementation, Lecture Notes in Computer Science 1436, Springer, 1998.

Ted Leslie, Efficient Approaches to Subset Construction. University of Waterloo 1995.

Mehryar Mohri, Compact Representations by Finite-state Transducers. ACL New Mexico 1994.

Mehryar Mohri, Finite-State Transducers in Language and Speech Processing, Computational Linguistics 23 (2) 1997.

Mehryar Mohri and Fernando C.N. Pereira and Michael Riley. A rational design for a weighted finite-state transducer library. In: Automata Implementation. WIA '97. Lecture Notes in Computer Science 1436. Spring Verlag 1998.

Mehryar Mohri and Richard Sproat. An Efficient Compiler for Weighted Rewrite Rules. ACL 1996 Santa Cruz.

Gertjan van Noord. FSA Utilities: A Toolbox to Manipulate Finite-state Automata. In: Raymond, Woord, Yu (editors), Automata Implementation. Lecture Notes in Computer Science 1260. Spring Verlag 1997.

Gertjan van Noord. The Treatment of Epsilon Moves in Subset Construction. In: Computational Linguistics 26 (1) 2000.

Gertjan van Noord and Dale Gerdemann. An Extendible Regular Expression Compiler for Finite-state Approaches in Natural Language Processing. WIA 1999.

Gertjan van Norod and Dale Gerdemann. Finite State Transducers with Predicates and Identity. Grammars 4 (3) 2001.

Emmanuel Roche and Yves Schabes. Deterministic Part-of-speech Tagging with Finite-state Transducers. Computational Linguistics 21 (2) 1995.

Emmanuel Roche and Yves Schabes. Finite-State Language Processing. MIT Press 1997.

Bruce Watson, Taxonomies and Toolkits of Regular Language Algorithms. Ph.D. thesis TU Eindhoven. 1996.

Bruce Watson, Implementing and Using Finite Automata Toolkits. In: Proceedings of the ECAI'96 Workshop Extended Finite State Models of Language. 1996.

Gosse Bouma, A Modern CL course using Dutch. In: EACL 99 Postconference Workshop ``Computer and Internet supported Education in Language and Speech Technology'', June 12 1999, Bergen Norway.

Gosse Bouma, A Finite State and Data-Oriented Method for Grapheme to Phoneme Conversion. In: 1st Meeting of the North American Chapter of the Association for Computational Linguistics, 303-310 Seattle 2000.

Gosse Bouma, A modern Computational Linguistic Course using Dutch. In: Frank van Eynde and Ineke Schuurman, editors, CLIN 1998, Papers from the ninth CLIN Meeting, Amsterdam, 1999. Rodopi Press.

Gosse Bouma, Finite State Methods for Hyphenation. In: FSMNLP 2001, ESSLLI Helsinki.

Dale Gerdemann and Gertjan van Noord. Transducers from Rewrite Rules with Backreferences. EACL 1999 Bergen.

Dale Gerdemann and Gertjan van Noord. Approximation and Exactness in Finite State Optimality Theory. In: FINITE-STATE PHONOLOGY: SIGPHON 2000, Fifth Meeting of the ACL Special Interest Group in Computational Phonology, COLING 2000.

George Anton Kiraz, Multi-Tiered Nonlinear Morphology Using Multi-Tape Finite Automata: A case study on Syriac and Arabic. In: Computational Linguistics, 26 (1) 2000.

Edwin Kuipers, Ill-formed Input and Finite-state Devices. MA Thesis, Rijksuniversiteit Groningen, 1997.

Gertjan van Noord, FSA Utilities: Manipulation of Finite-state Automata implemented in Prolog, in: Proceedings of the First International Workshop on Implementing Automata. London Ontario Canada, 1996.

Gertjan van Noord. FSA Utilities: A Toolbox to Manipulate Finite-state Automata. In: Raymond, Woord, Yu (editors), Automata Implementation. Lecture Notes in Computer Science 1260. Spring Verlag 1997.

Gertjan van Noord. The Treatment of Epsilon Moves in Subset Construction. In: Computational Linguistics 26 (1) 2000.

Gertjan van Noord and Dale Gerdemann. An Extendible Regular Expression Compiler for Finite-state Approaches in Natural Language Processing. WIA 1999.

Gertjan van Norod and Dale Gerdemann. Finite State Transducers with Predicates and Identity. Grammars X (X) 2001.

Nathan Vailette, Logical Specifications of Transducers for NLP. In: FSMNLP 2001, ESSLLI Helsinki.

Markus Walther, Finite-State Reduplication in One-Level Prosodic Morphology. In: 1st Meeting of the North American Chapter of the Association for Computational Linguistics, 296-302 Seattle 2000.

Markus Walther, One-Level Prosodic Morphology. Marburger Arbeiten zur Linguistic 1, University of Marburg. 64 pp.

Markus Walther, Temiar Reduplication in One-Level Prosodic Morphology. In: FINITE-STATE PHONOLOGY: SIGPHON 2000, Fifth Meeting of the ACL Special Interest Group in Computational Phonology, COLING 2000.

Up-to-date information on the program can be obtained via

`http://www.let.rug.nl/~vannoord/Fsa/`. The latest version of the program should be available there too.For information on the daVinci program, we refer to its homepage

`http://www.informatik.uni-bremen.de/~inform/forschung/daVinci/`.For information on dot/GraphViz, we refer to

`http://www.research.att.com:80/sw/tools/graphviz/`.For information on the VCG program:

`http://www.cs.uni-sb.de/RW/users/sander/html/gsvcg1.html`A commercial version of VCG (called aiSee) is available from

`http://www.aisee.com/gallery/`There is lots of interesting material at MLTT Xerox Grenoble:

`http://www.xrce.xerox.com/competencies/content-analysis/fst/`. Be sure to read the documentation, including a number of nice examples.AT&T's FSM toolset for weighted finite-state automata is available from

`http://www.research.att.com/sw/tools/fsm`.The corresponding lextools package by Richard Sproat is now available too, the url is

`http://www.research.att.com/sw/tools/lextools/`Grail. Grail is a package of software for symbolic computation with finite machines and regular expressions. It is freely available to students, educators, and anyone who simply wants to use the software for their own amusement or education. The Grail homepage is at

`http://www.csd.uwo.ca/staff/drraymon/.grail/grail.html`(somewhat outdated). A version for Linux is available from`http://http://www.cs.sun.ac.za/~lynette/merlin.html`.For a web interface to FSA, refer to

`http://www.let.rug.nl/~vannoord/fsademo/`.For a web interface to FSA3, c.f.:

`http://i12www.ira.uka.de/Visualisierung.endlicher.Automaten/`.A tutorial for FSA by Gosse Bouma, in Dutch:

`http://www.let.rug.nl/~gosse/tt/fsa.html`An even simpler tutorial for FSA (in Dutch as well) used for highschool kids (!) is available as

`http://www.let.rug.nl/~vannoord/fsademo/fsademo/klas.html`Electronic versions of some of the papers mentioned above are available through the cmp-lg archive at

`http://xxx.lanl.gov:80/cmp-lg/`.A list of related projects at University of Western Ontario by Darrell Raymond is

`http://www.csd.uwo.ca/staff/drraymon/.grail/links.html`. You can also obtain a copy of Ted Leslie's thesis from that site, which includes the algorithm to generate random automata, and which discusses density of automata related to determinization.Finite state Utilities by Jan Daciuk at

`http://www.pg.gda.pl/~jandac/fsa.html`. Useful tools for dictionary construction and spell checking. Also read his dissertation.SICStus Prolog home page:

`http://www.sics.se/isl/sicstus.html`.Collection of links on Prolog and Regular Expressions:

`http://www.let.rug.nl/~vannoord/prolog-rx/PrologAndRegex.html`Wiese's Little Automata Builder at

`http://www-ti.informatik.uni-tuebingen.de/~wiese/Automaton/`Another Java applet for finite automata at

`http://www.cs.duke.edu/~rodger/tools/jflap/index.html`Interesting papers on Gene Myers' Home Page

`http://www.cs.arizona.edu/people/gene`Application for interactive drawing of finite automata

`http://triskam.virtualave.net/finomaton/finomaton.html`

Copyright c 1995 - 2001 by Gertjan van Noord. This program is distributed under Gnu General Public License (cf. the file COPYING in distribution).

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; version 2 of the License.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

Helpful suggestions, feedback, etc., from a variety of people, too many to list here. Gosse Bouma and Dale Gerdemann were exceptionally influential.

Gertjan van Noord,
`mailto:vannoord@let.rug.nl`