Converting an automaton to a regular expression

Kapitoly: Regular expressions, Converting a regular expression to an automaton, Generalized NKA, Converting an automaton to a regular expression

We will demonstrate how to convert an arbitrary finite automaton into a regular expression.

Algorithm description

On the input, we have an NFA A. The first thing we do is to convert it to a GNFA. In the second phase, we remove one state at a time in a single step, adjust the transition functions appropriately, and repeat the process until only two states remain: the initial and the final states. An edge connecting these states will bear a regular expression as its label, which represents the result of the entire algorithm. Thus, the entire procedure primarily involves removing a state and subsequently modifying the automaton to obtain an equivalent automaton.

Removing one state

We can illustrate how to remove a single state with a simple example. Suppose a part of our automaton looks like this:

where Ri are some regular expressions. What would the automaton look like if we removed the state qr? We can imagine being in the state qi and asking, what are all the words we can generate in this section? If we go from the state qi directly to the state qj, it will include all the words that match the regular expression R4.

But if we go to the state qr, we can generate words of the form R1. In the state qr we can cycle for the regular expression R2, allowing us to generate words of the form $R_1\circ(R_2^\ast)$. Since we can still transition from the state qr to the state qj, we can add the regular expression R3. Overall, this enables us to generate words of the form $R_1\circ(R_2^\ast)\circ R_3$.

Now we know that this part of the automaton can produce words either in the form R4 or in the form $R_1\circ(R_2^\ast)\circ R_3$. Of course, we can express this as a regular expression like $(R_4)|(R_1\circ(R_2^\ast)\circ R_3)$.

Now we can simply remove the state qr and leave only the two states qi and qj, writing the regular expression we just computed in place of R4:

We do this with every edge from every state qi to any state qj, including loops, i.e., the case where qi = qj.

After removing one state, we obtain an equivalent automaton—an automaton that recognizes the same language.

The whole algorithm

As input, we have an NFA $A=\left<Q, \Sigma, \delta, q_0, F\right>$.

  1. We convert the NFA A to a GNFA $Z=\left<Q^\prime, \Sigma, \delta^\prime, q_0^\prime, q_f^\prime\right>$. Next, we will use the letter k to denote the number of states in Z.
  2. If k = 2, the algorithm terminates, and the edge between the initial and final states is the resulting regular expression.
  3. If k>2, we select any state qr that is different from the initial and final states, i.e., $q_r\ne q_0^\prime$ and $q_r\ne q_f^\prime$. Next, we create a new GNFA $Z^\prime=\left<Q^{\prime\prime}, \Sigma, \delta^{\prime\prime},q_0^{\prime},q_f^{\prime}\right>$, which will be valid: $$ Q^{\prime\prime}=Q^\prime\setminus\left\{q_r\right\} $$ and for all $q_i\in Q^{\prime\prime}\setminus\left\{q_f^\prime\right\}$ and for all $q_j\in Q^{\prime\prime}\setminus \left\{q_0^\prime\right\}$ let $$ \delta^{\prime\prime}(q_i, q_j)=(R_4)|(R_1(R_2^\ast)R_3), $$ where $R_1=\delta^\prime(q_i,q_r)$, $R_2=\delta^\prime(q_r,q_r)$, $R_3=\delta^\prime(q_r, q_j)$, $R_4=\delta^\prime(q_i, q_j)$. Continue with step 2.

Example

Consider the following finite automaton as input:

First, we convert it to GNFA (we will not show unnecessary -transitions):

Now we apply the second part of the algorithm and remove a node. We start with the node q2. We remove node q2 and add a transition from state q1 to state qf. We will label this transition as $b(a|b)^\ast$ because from node q1, we reach the state q2 for words of the form b. Then we can cycle with a|b, leading to $(a|b)^\ast$, and finally, using the epsilon rule, we move to qf. In doing so, we derive $b(a|b)^\ast\epsilon=b(a|b)^\ast$. We thus obtain an automaton:

We are left with removing the last state, q1. We add an edge from the state qs to the state qf. How do we mark it? Since we can reach the q1 state via the epsilon rule, we can simply omit that. Then, the cycle for a gives us the regular expression $a^\ast$. Finally, we reach qf via the edge, so we concatenate the expression with $b(a|b)^\ast$. Thus, we describe the edge with the regular expression $a^\ast b(a|b)^\ast$.

The automaton has only two more states, so the algorithm ends. The edge represents the resulting regular expression.

Resources