A rule of the form
will be written as `replace(T,Lambda,Rho)`. Rules of the more general
form
will be discussed in §3. The
algorithm consists of nine steps composed as in figure 1.

The names of these steps are mostly derived from [7] and
[12] even though the transductions involved are
not exactly the same. In particular, the steps derived from Mohri &
Sproat (`r`, `f`, `l1` and `l2`) will all be
defined in terms of the finite state calculus as opposed to Mohri &
Sproat's approach of using low-level manipulation of states and
transitions.^{3}

The first step, `non_markers`, was already defined above.
For the second step, we first consider a simple special case. If the
empty string is in the language described by `Right`,
then `r(Right)` should insert an
`rb2` in every string position. The definition of
`r(Right)` is both simpler and more efficient if this is treated
as a special case. To insert a bracket in every possible string
position, we use:

[[[] x rb2,sig]*,[] x rb2]

If the empty string is not in `Right`, then we must use
`intro(rb2)` to introduce the marker `rb2`, followed
by `l_iff_r` to ensure that such markers are immediately
followed by a string in `Right`, or more precisely a string
in `Right` where additional instances of `rb2` are freely
inserted in any position other than the beginning. This expression is
written as:

intro(rb2) o l_iff_r(rb2,xign(non_markers(Right),rb2))

Putting these two pieces together with the conditional yields:

macro(r(R), if([] & R, % If: [] is in R: [[[] x rb2,sig]*,[] x rb2], intro(rb2) % Else: o l_iff_r(rb2,xign(non_markers(R),rb2)))).

The third step, `f(domain(T))` is implemented as:

macro(f(Phi), intro(lb2) o l_iff_r(lb2,[xignx(non_markers(Phi),b2), lb2^,rb2])).

The `lb2` is first introduced and then, using
`l_iff_r`, it is constrained to occur immediately before every
instance of (ignoring complexities) `Phi` followed by an
`rb2`. `Phi` needs to be marked as normal text
using `non_markers` and then `xign_x` is used to allow
freely inserted `lb2` and `rb2` anywhere except at
the beginning and end. The following `lb2``^`

allows an
optional `lb2`, which occurs when the empty string is in
`Phi`.

The fourth step is a guessing component which (ignoring complexities)
looks for sequences of the form `lb2 Phi rb2` and
converts some of these into `lb1 Phi rb1`, where the
`b1` marking indicates that the sequence is a candidate for
replacement. The complication is that `Phi`, as always, must be
converted to `non_markers(Phi)` and instances of `b2`
need to be ignored. Furthermore, between pairs of `lb1`
and `rb1`, instances of `lb2` are deleted. These
`lb2` markers have done their job and are no longer needed.
Putting this all together, the definition is:

macro(left_to_right(Phi), [[xsig*, [lb2 x lb1, (ign(non_markers(Phi),b2) o inverse(intro(lb2)) ), rb2 x rb1] ]*, xsig*]).

The fifth step filters out non-longest matches produced in the
previous step. For example (and simplifying a bit), if `Phi` is
`ab*`, then a string of the form ... rb1 a b lb1 b
... should be ruled out since there is an instance of `Phi`
(ignoring brackets except at the end) where there is an internal
`lb1`. This is implemented as:^{4}

macro(longest_match(Phi), not($$([lb1, (ignx(non_markers(Phi),brack) & $$(rb1) ), % longer match must be rb % followed by an rb ])) % so context is ok o % done with rb2, throw away: inverse(intro(rb2))).

The sixth step performs the transduction described by `T`. This
step is straightforwardly implemented, where the main difficulty is
getting `T` to apply to our specially marked string:

macro(aux_replace(T), {{sig,lb2}, [lb1, inverse(non_markers) o T o non_markers, rb1 x [] ] }*).

The seventh step ensures that `lb1` is preceded by a string
in `Left`:

macro(l1(L), ign(if_s_then_p( ignx([xsig*,non_markers(L)],lb1), [lb1,xsig*]), lb2) o inverse(intro(lb1))).

The eighth step ensures that `lb2` is not preceded by a
string in `Left`. This is implemented similarly to the
previous step:

macro(l2(L), if_s_then_p( ignx(not([xsig*,non_markers(L)]),lb2), [lb2,xsig*]) o inverse(intro(lb2))).

Finally the ninth step, `inverse(non_markers)`, removes the
`0`'s so that the final result in not marked up in any special
way.

1999-04-15