Converting an automaton to a regular expression

Kapitoly: Regular expressions, Converting a regular expression to an automaton, Generalized NKA, Converting an automaton to a regular expression

We will demonstrate how to convert an arbitrary finite automaton into a regular expression.

Algorithm description

On the input, we have an NFA A. The first thing we do is to convert it to a GNFA. In the second phase, we remove one state at a time in a single step, adjust the transition functions appropriately, and repeat the process until only two states remain: the initial and the final states. An edge connecting these states will bear a regular expression as its label, which represents the result of the entire algorithm. Thus, the entire procedure primarily involves removing a state and subsequently modifying the automaton to obtain an equivalent automaton.

Removing one state

We can illustrate how to remove a single state with a simple example. Suppose a part of our automaton looks like this:

where R_i are some regular expressions. What would the automaton look like if we removed the state q_r? We can imagine being in the state q_i and asking, what are all the words we can generate in this section? If we go from the state q_i directly to the state q_j, it will include all the words that match the regular expression R₄.

But if we go to the state q_r, we can generate words of the form R₁. In the state q_r we can cycle for the regular expression R₂, allowing us to generate words of the form $R_1\circ(R_2^\ast)$. Since we can still transition from the state q_r to the state q_j, we can add the regular expression R₃. Overall, this enables us to generate words of the form $R_1\circ(R_2^\ast)\circ R_3$.

Now we know that this part of the automaton can produce words either in the form R₄ or in the form $R_1\circ(R_2^\ast)\circ R_3$. Of course, we can express this as a regular expression like $(R_4)|(R_1\circ(R_2^\ast)\circ R_3)$.

Now we can simply remove the state q_r and leave only the two states q_i and q_j, writing the regular expression we just computed in place of R₄:

We do this with every edge from every state q_i to any state q_j, including loops, i.e., the case where q_i = q_j.

After removing one state, we obtain an equivalent automaton—an automaton that recognizes the same language.

The whole algorithm

As input, we have an NFA $A=\left<Q, \Sigma, \delta, q_0, F\right>$.

We convert the NFA A to a GNFA $Z=\left<Q^\prime, \Sigma, \delta^\prime, q_0^\prime, q_f^\prime\right>$. Next, we will use the letter k to denote the number of states in Z.
If k = 2, the algorithm terminates, and the edge between the initial and final states is the resulting regular expression.
If k>2, we select any state q_r that is different from the initial and final states, i.e., $q_r\ne q_0^\prime$ and $q_r\ne q_f^\prime$. Next, we create a new GNFA $Z^\prime=\left<Q^{\prime\prime}, \Sigma, \delta^{\prime\prime},q_0^{\prime},q_f^{\prime}\right>$, which will be valid: $$ Q^{\prime\prime}=Q^\prime\setminus\left\{q_r\right\} $$ and for all $q_i\in Q^{\prime\prime}\setminus\left\{q_f^\prime\right\}$ and for all $q_j\in Q^{\prime\prime}\setminus \left\{q_0^\prime\right\}$ let $$ \delta^{\prime\prime}(q_i, q_j)=(R_4)|(R_1(R_2^\ast)R_3), $$ where $R_1=\delta^\prime(q_i,q_r)$, $R_2=\delta^\prime(q_r,q_r)$, $R_3=\delta^\prime(q_r, q_j)$, $R_4=\delta^\prime(q_i, q_j)$. Continue with step 2.

Example

Consider the following finite automaton as input:

First, we convert it to GNFA (we will not show unnecessary ∅-transitions):

Now we apply the second part of the algorithm and remove a node. We start with the node q₂. We remove node q₂ and add a transition from state q₁ to state q_f. We will label this transition as $b(a|b)^\ast$ because from node q₁, we reach the state q₂ for words of the form b. Then we can cycle with a|b, leading to $(a|b)^\ast$, and finally, using the epsilon rule, we move to q_f. In doing so, we derive $b(a|b)^\ast\epsilon=b(a|b)^\ast$. We thus obtain an automaton:

We are left with removing the last state, q₁. We add an edge from the state q_s to the state q_f. How do we mark it? Since we can reach the q₁ state via the epsilon rule, we can simply omit that. Then, the cycle for a gives us the regular expression $a^\ast$. Finally, we reach q_f via the edge, so we concatenate the expression with $b(a|b)^\ast$. Thus, we describe the edge with the regular expression $a^\ast b(a|b)^\ast$.

The automaton has only two more states, so the algorithm ends. The edge represents the resulting regular expression.

Resources

The example and description of the algorithm are from M. Sipser: Introduction to the Theory of Computation

« Previous: Generalized NKA

Next: Estimating unique values »