Exercise 15 | RACSO

Exercise _‹15:

Judging exercises on CFGs

The coordinator of the «Theory of Computation» subject in our university wants to create a website that automatically evaluates exercises on context-free grammars (CFGs). The idea is that, for each exercise, the teacher simply prepares a reference solution CFG

G_1

, and then, any CFG

G_2

submitted by a student is automatically compared against

G_1

to determine whether they are equivalent (i.e., whether

G_1

and

G_2

generate the same language). Unfortunately, it is not possible to create such an evaluator, as equivalence of CFGs is an undecidable problem. Nevertheless, it is possible to implement a less ambitious automatic evaluator as follows: for each length

\ell

(up to a certain maximum), test whether there exists a word

w

with length

\ell

that is generated by one of

G_1,G_2

but not by the other. Clearly, when such word

w

exists, then

w

is a counterexample to the equivalence of

G_1

and

G_2

. When no such

w

exists, either

G_1

and

G_2

are indeed equivalent, or we did not test with an

\ell

big enough to find a counterexample. There are many possible ways to implement an algorithm that, given

G_1,G_2

and

\ell

, looks for such counterexample

w

; here we consider one possible technique.

Solve the following exercise by means of a reduction to SAT:

Given a natural $\ell>0$ and two CFGs $G_1,G_2$ over an alphabet $\Sigma$ , determine whether there exists a word $w\in\Sigma^*$ with length $\ell$ that is generated by one of the grammars but not by the other. We call such $w$ the counterexample to the equivalence of $G_1$ and $G_2$ .
To simplify the setting, we assume that the grammars are normalized in the following sense: all non-terminal symbols are useful (i.e., they generate at least one word, and they can be reached from the start symbol of the grammar), and all the production rules are either of the form $X\to YZ$ or $X\to a$ , where $X,Y,Z$ are non-terminal symbols, and $a$ is a terminal symbol in $\Sigma$ . Note that a CFG normalized like that cannot generate the empty language or the empty word, and that any production rule of the form $X\to YZ$ generates words with size at least $2$ .

The input of the exercise and the output with the solution (when the input is solvable) are as follows:

```
grammars: array [2] of array of array of array of int
length: int
numterminals: int
```
The input contains the two normalized grammars that must be compared (one in grammars[0] and the other one in grammars[1]), the specific length ${}>0$ that the counterexample must have, and the size numterminals ${}>0$ of the alphabet $\Sigma$ of terminal symbols. All symbols are represented by numbers: non-terminal symbols are represented by non-negative numbers $0,1,\ldots$ (where $0$ is always used for the start symbol of a grammar), and terminal symbols are represented by negative numbers $-1,-2,\ldots,-{}$ numterminals. For each g ${}\in\{0,1\}$ , the grammar grammars[g] is an array of arrays of arrays of integers with the following meaning: for each non-terminal nt of the grammar, grammars[g][nt] contains the list of all the right-hand sides of the production rules of the form nt ${}\to YZ$ or nt ${}\to a$ , and thus, for the i’th of such right-hand sides (counting from $0$ ), grammars[g][nt][i] is an array with either $2$ non-terminal symbols (for a case like nt ${}\to YZ$ ) or $1$ terminal symbol (for a case like nt ${}\to a$ ). For example, given a normalized grammar $S\to AB|a,\;A\to a,\,B\to b$ , with $S$ being the start symbol, we could use the encoding $S=0$ , $A=1$ , $B=2$ , $a=-1$ , $b=-2$ and represent the grammar as follows:
```
      Original        Representation
     -----------      --------------
     S -> AB | a      [[[1 2] [-1]]
     A -> a            [[-1]]
     B -> b            [[-2]]]
```
Note that the left-hand sides of the production rules are implicitly represented by the index into the outermost array (in this case, indexes $0$ , $1$ , $2$ , that represent $S$ , $A$ , $B$ , respectively), that each of the items in this outermost array is a list with the right-hand sides that correspond to that implicit left-hand side, and that each of such right-hand sides is either a list of $2$ non-terminal symbols (like [1 2] in the example, that represents $AB$ ), or $1$ terminal symbol (like [-1] and [-2] in the example, that represent $a$ and $b$ , respectively).
```
counterexample: array of int
```
The output is the counterexample that proves that the grammars are not equivalent (for words of the specified length), i.e., a word that is generated by one of the grammars but not by the other. Such word must be represented with an array with size length whose elements are terminal symbols among $-1,-2,\ldots,-{}$ numterminals.

Authors: Carles Creus / Documentation:

Exercise ‹15:

Exercise _‹15: