In addition to the graphical user interface and the FSA command interpreter, it is also possible to use the toolkit as a library to your program. You can incorporate the FSA program in your own program, just as you can use other Prolog libraries. In order for this to work, you simply need to load the file fsa_library.pl in the installation directory. For example:

% sicstus

SICStus ...

Licensed to ...

| ?- use_module(fsa_library).

...

...

...

yes

| ?- fsa_regex_atom_compile('[a*,b^,{d,e}]',L).

L = fa(r(fsa_preds),3,[0],[1],[trans(0,a,0),trans(0,b,2),

trans(0,d,1),trans(0,e,1),trans(2,d,1),trans(2,e,1)],[]) ?

yes

| ?- fsa_regex_transduces('{a:b,? -a}*',"ababac",L), atom_codes(Atom,L).

L = [98,98,98,98,98,99],

Atom = bbbbbc ?

yes

| ?-

Most predicates that are imported have names starting with **fsa**. All
module names start with **fsa** as well. Below, we list the predicates
exported by the FSA library. The modules which efficiently implement
datastructures such as arrays, hashes, maps and sets are documented
separately; these modules are supposed to be useful independently from
their use in the FSA toolkit.

File is assumed to contain auxiliary regular expression operators. It is loaded in module fsa_regex_aux and will be used for compiling regular expressions.

Note that your file with definitions of regular expression operators is compiled with the special Prolog-syntax operators for regular expression notation loaded. Thus you can use * .. & etc. in your definitions. Drawback is that you cannot use operator notation for e.g. the is/2 predicate.

A typical auxiliary definition will be:

macro(vowel,{a,e,i,o,u}).

A slightly more interesting one:

macro(free(Expr), ~ $ Expr).

You can also explicitly construct an automaton yourself, e.g.:

rx(my_operator(Expr),Fa) :-

fsa_regex:rx(Expr,Fa0),

my_operator_definition(Fa0,Fa).

so you can call fsa_regex:rx/2 for further compilations.

File is assumed to contain auxiliary regular expression operators. It is reconsulted in module fsa_regex_aux and will be used for compiling regular expressions. Normally you want to use fsa_load_aux_file instead. Use this predicate if you need to debug your Prolog definitions in File.

RegexAtom is parsed as a regular expression. This expression is compiled to a finite automaton which is written to File.

RegexAtom is parsed as a regular expression. This expression is compiled to a finite automaton Fa.

A regular expression is read-in from standard input. The expression is compiled and the resulting automaton is saved in file File.

A regular expression is read-in from standard input. The expression is compiled into an automaton Fa.

The regular expression Expr (ground Prolog term) is compiled into an automaton. The automaton is saved into File.

Term is a regular expression. It is compiled into the automaton Fa. The first form is typically used for a new regular expression compilation, whereas the second form is used for embedded compilations (called from user definitions). The only difference is that during debugging the depth of recursion is set to zero for the first form.

The automaton in
File0 is copied to File1. Useful to convert between different
formats, by setting the **read** and **write** global variables.

Fa is read from File. If Format is unspecified the value of the global
variable **read** is taken as the input format.

Fa is written to File. If Format is unspecified the value of the global
variable **write** is taken as the input format.

The number of states in Fa is Integer.

States is an ordered list of integers: all states in Fa.

State is a state in Fa.

StartStates is the ordered list of start states of Fa.

StartState is a start states of Fa.

FinalStates is the ordered list of final states of Fa.

FinalState is a final states of Fa.

Trans is the ordered list of transitions of Fa.

In Fa there is a transition from P to Q with symbol(pair) Sym.

Jumps is the ordered list of jumps of Fa.

In Fa there is a jump from P to Q.

Predicate to construct a finite automaton on the basis of lists of start states, final states, transitions and jumps. These lists need not be ordered. It's somewhat more efficient to specify the number of states, if known. It's even more efficient if you also know the symbols data-structure you want for Fa.

Predicate to construct an automaton on the basis of its components, or to query the components of a given automaton. The difference with fsa_construct/7 is that Starts, Finals, Trans and Jumps must be sorted already.

Predicate to construct a finite automaton on the basis of lists of start states, final states, transitions and jumps. These lists need not be ordered. Moreover, state names can be arbitrary Prolog terms. These state names will be renamed to integers. Symbol list is computed on the basis of Trans. Sigma is determined by the current default predicate module (i.e. by flag *pred_module*).. It's more efficient if you also know the symbols data-structure you want for Fa. Some checking on these symbols is performed nevertheless.

This predicate unifies Fa0 and Fa1 except for the part specified by Key. Part must be one of the atoms symbols, states, start_states, final_states, transitions, jumps. Part0 and Part1 are the corresponding parts in Fa0 and Fa1.

Type is the type of the automaton Fa, where type is one of recognizer, w_recognizer, transducer(Sub), w_transducer(Sub). Sub is one of letter or sequence.

In the first variant, a Prolog program is written to standard output for the automaton Fa. In the second variant, the Prolog program is written to FileOut on the basis of the automaton read in from FileIn.

In the first variant, a C program is written to standard output for the automaton Fa. The C program will read lines from standard input and report for each line whether it is a string accepted by Fa. In the second variant, the C program is written to FileOut on the basis of the automaton read in from FileIn.

A C program is written to FileOut for the automaton Fa. The C program will read lines from standard input and report for each line whether it is a string accepted by Fa.

A C++ program is written to FileOut for the recognizer read from FileIn. The C++ program will read lines from standard input and apply the automaton to each line.

A JAVA program is written to FileOut for the recognizer read from FileIn. The JAVA program will read lines from standard input and apply the automaton to each line.

Predicate to set the global variable with name **Key** to **Val**.

Predicate to query the value of the global variable with name **Key**.
If the value is undefined then **Val** is unified to a default value.
These default values are available as the third argument of the
**fsa_global_decl** predicate.

**Key** is a global variable with default value **Default**. Some typical
values are given in the list **Typical**. **Help** is a string explaining the
variable. Val^Goal can be used to check that Val is an appropriate
value for this flag.

**List** will be unified with a keylist of all the global variables
with their associated values. If no argument is given, then this list
is written to standard output

FSA version information is displayed on standard error. Note that the
version information is available through the **fsa_version** global variable.

Atom is an atom indicating the current Prolog system. At the
moment Atom is one of **sicstus**, **yap**, or **swi**.

A string-to-weight transducer Fa will be constructed implementing the perfect hash for the ListOfStrings; i.e. the transducer maps each string to its rank (in alphabetic order), and does not accept any string not listed in ListOfStrings. The transducer is (w_)minimal.

FileIn is assumed to contain a set of strings: each line is a string. A string-to-weight transducer will be written to FileOut implementing the perfect hash for the set of strings read from FileIn: i.e. the transducer maps each string to its rank (in alphabetic order), and does not accept any string not listed in FileIn. The transducer is (w_)minimal.

A minimal recognizer Fa will be constructed recognizing exactly the strings in ListOfStrings

FileIn is assumed to contain a set of strings: each line is a string. A minimal automaton recognizing exactly those strings is written to FileOut

FileIn is assumed to contain a set of strings: each line is a string. A deterministic automaton recognizing exactly those strings is written to FileOut; the automaton has the form of a trie.

Fa is a deterministic automaton recognizing exactly each of the strings in ListOfStrings; the automaton has the form of a trie.

Succeeds if String is accepted by the regular expression in Atom. For example:

fsa_regex_accepts('[{a,b}*,b,a,b,{a,b}*]',"abbbbabababa").

String is a transduction of String0 according to the regular expression in Atom. Example:

fsa_regex_transduces('a:b',"a",L).

L = [98] ?

String is a transduction of String0 according to the regular expression in Atom. Example:

fsa_regex_transduces_w('[a:3,b:1*]',"abbb",L).

L = 6 ?

This predicate can be used both to recognize a given string or to produce a string according to Fa. This is why we use dif/2 below. We prefer functionality over efficiency here; note that the compiler-to-prolog implements the same functionality.

Making this code faster could be done for instance by indexing on source state and symbol. For deterministic automata, use the compiler-to-c for a fast and compact recognizer.

Since input can be non-deterministic we check for epsilon-cycles (the fifth argument of accepts/6 is a list of states visited after last consumption of input). Again, there are cases where it would make more sense to pre-compute efree automata first. But if that's the case you could do it, right?

StringOut is a transduction of StringIn according to transducer Fa.

This predicate employs a meta-notation in cases where loop-checking encounters a cycle. In that case the notation

|N+|

is written into the output string indicating that the previous N symbols could be repeated here as many times as desired. For example, consider the simple regular expression mapping an a to one or more b's:

a:(b+)

If a transduction is request for input string a, then the following outputs occur:

b

bb|1+|

In the second output string the meta-notation indicates that the second b could be repeated multiple times.

Many of the same remarks wrt fsa_accepts/2 apply here: works for non-instantiated input (lists with variable elements work OK, but variable length lists typically don't). Also takes care of identity constraints in the transducer, including delayed identity constraints and too early identity constraints, by using a non-proper implementation of queues which allow dequeue-ing before enqueue-ing! Examples to try, using the fsa_preds predicate module:

t_minimize([a:b,class(a..f)])

t_minimize({[a:b,?,?,?,?,?,b],[a:c,?,?,?,?,?,c]})

I think this is neat.

Weight is the weight assigned to String by Fa.

Similar to fsa_accepts/2 and fsa_transduces/3 above. However, we assume that there are no identity constraints. Loop-checking for [] input, but no meta-notation in output: we simply produce the minimum in such cases. That seems to be appropriate in most applications (?).

String is a Prolog string, and Regex is an atom that will be parsed as a regular expression. The system will match String approximately to that regular expression, returning each of the matches which require a minimal number of substitutions, insertions, deletions, and transpositions. A match is given by a recipe which is a list of Prolog terms as follows:

P:d deletion at position P

P:i(Pred) insertion of symbol for which Pred is true, at P

P:s(Pred) substitution of symbol for which Pred is true, at P

P:t transposition at position P

where P refers to the position in the sequence of symbols extracted from String where the corresponding edit operation takes place.

String is a Prolog string, and Fa is a finite automaton. The system will match String approximately to this Fa, returning each of the matches which require a minimal number of substitutions, insertions, deletions, and transpositions. A match is given by a recipe which is a list of Prolog terms as follows:

P:d deletion at position P

P:i(Pred) insertion of symbol for which Pred is true, at P

P:s(Pred) substitution of symbol for which Pred is true, at P

P:t transposition at position P

where P refers to the position in the sequence of symbols extracted from String where the corresponding edit operation takes place.

String0 and String is a Prolog string, Regex is an atom that will be parsed as a regular expression denoting a string to string transducer. The system will match String0 approximately to the domain of the regular expression, and return each of the transductions of these approximate matches.

String0 and String is a Prolog string, Fa a string to string transducer. The system will match String0 approximately to the domain of Fa, and return each of the transductions of these approximate matches.

String0 is a Prolog string, Regex is an atom that will be parsed as a regular expression denoting a weighted recognizer. The system will match String0 approximately to the domain of the regular expression, and return each of the weights of these approximate matches.

String0 and String is a Prolog string, Regex is an atom that will be parsed as a regular expression denoting a weighted transducer. The system will match String0 approximately to the domain of the regular expression, and return each of the transductions of these approximate matches (a String and a Weight).

String0 is a Prolog string, Fa a weighted recognizer. The system will match String0 approximately to the domain of Fa, and return each of the weights of these approximate matches.

String0 and String is a Prolog string, Fa a weighted transducer. The system will match String0 approximately to the domain of Fa, and return each of the transductions of these approximate matches (a String and a Weight).

Reports on standard output the minimum weight path in the transducer read from InFile.

This is implemented by a generalization of Dijkstra's algorithm to find the minimum weight path in a given transducer. The algorithm for transducers are implemented in a fully general way, i.e., for various types of transducers (cf. the fsa_semiring module).

The implementation is more general than the predicates provided by
e.g. the SICStus libraries for graphs. And **much** more efficient, even
though the agenda is not maintained as a heap (the latter decision
was caused by the fact that it would be hard to implement such
a heap efficiently for the various types of transducers with their
corresponding `minimum' definitions).

Reports on standard output (if Path is not present) or instantiates Path to the minimum weight path in the transducer Fa.

Array will be instantiated to an UpdatableFsaArray (cf. module fsa_arrays) indicating for each state in the transducer Fa the minimum cost from that state to a final state.

In the first variant, a representation accepted by the daVinci graph visualization program is written to File on the basis of the automaton read from File0. In the second form, the representation for Fa is written to standard output.

In the first variant, a representation accepted by the dot / GraphViz graph visualization program is written to File on the basis of the automaton read from File0. In the second form, the representation for Fa is written to standard output.

In the first variant, a representation accepted by the vcg graph visualization program is written to File on the basis of the automaton read from File0. In the second form, the representation for Fa is written to standard output.

A piece of LaTeX code with pstricks macro's which produces a picture of the automaton read from File0 is written to File. This LaTeX code is supposed to be included in a full LaTeX document. The global variable pstricks_style influences the result.

A standalone LaTeX document with pstricks macro's which produces a picture of the automaton read from File0 is written to File. The global variable pstricks_style influences the result.

Postscript code which produces visualization of automaton read from File0 is written to File. The Postscript macro's are due to Peter Kleiweg. The global variable postscript_res can be set to indicate whether output is meant to be displayed on the screen, or printed.

Starts an external visualization program visualizing Fa. Format indicates what program is to be used and must be one of:

vcg

dot_ghostview (dot -Tps | gv )

pstricks_ghostview (latex ; dvips ; gv)

dotty

davinci