16

I've heard that it has been proven theoretically possible to express any control flow in a Turing-complete language using only structured programming constructs (conditionals, loops and loop-breaks, and subroutine calls), without any arbitrary GOTO statements. Is there any way to use that theory to automate refactoring of code that contains GOTOs into code that does not?

Let's say I have an arbitrary single subroutine in a simple imperative language, such as C or Pascal. I also have a parser that can verify that this subroutine is valid, and produce an Abstract Syntax Tree from it. But the code contains GOTOs and Labels, which could jump forwards or backwards to any arbitrary point, including into or out of conditional or loop blocks, but not outside of the subroutine itself.

Is there an algorithm that could take this AST and rework it into new code which is semantically identical, but does not contain any Labels or GOTO statements?

Mason Wheeler
  • 1
    Related: http://stackoverflow.com/questions/7671459/algorithm-for-rewriting-modified-goto-semantics?rq=1 – irrelephant Dec 27 '12 at 22:02
  • But the assembler will replace it again by (conditional) jumps! – wildplasser Dec 27 '12 at 22:04
  • 2
    @wildplasser: The assembler's irrelevant. I'm trying to translate code from an old language that supports GOTOs (and in which they were used frequently) to a newer one that does not, but which has many technical advantages over the other language. I can make 99% of it work already, but I don't know how to refactor out the GOTOs. – Mason Wheeler Dec 27 '12 at 22:06
  • @irrelephant: I saw that, but it's specifically limited in scope to only forward jumps. My case is not, so not a duplicate. – Mason Wheeler Dec 27 '12 at 22:08
  • Hint: if you need indicator variables you should stop. – wildplasser Dec 27 '12 at 22:12

3 Answers

9

In principle, it is always possible to do this, though the results might not be pretty.

One way to always eliminate gotos is to transform the program in the following way. Start off by numbering all the instructions in the original program. For example, given this program:

start:
    while (x < 3) {
        if (x < 5) goto start;
        x++;
    }

You could number the statements like this:

0 start:
1     while (x < 3) {
2         if (x < 5) goto start;
3         x++;
      }

To eliminate all gotos, you can simulate the flow of the control through this function by using a while loop, an explicit variable holding the program counter, and a bunch of if statements. For example, you might translate the above code like this:

int PC = 0;
while (PC <= 3) {
    if (PC == 0) {
         PC = 1;             // Label has no effect
    } else if (PC == 1) {
         if (x >= 3) PC = 4; // Loop condition is false: skip the loop, which ends this function.
         else PC = 2;        // Enter loop.
    } else if (PC == 2) {
         if (x < 5) PC = 0;  // Simulate goto
         else PC = 3;        // Simulate if-statement fall-through
    } else if (PC == 3) {
         x++;
         PC = 1;             // Simulate jump back up to the top of the loop.
    }
}

This is a really, really bad way to do the translation, but it shows that in theory it is always possible to do this. Actually implementing this statement by statement would be very messy. You'd probably number the basic blocks of the function instead, then generate code that puts the basic blocks into a loop, tracks which basic block is currently executing, simulates the effect of running that basic block, and then transitions to the appropriate next basic block.
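As a rough illustration of the basic-block idea, here is a minimal C sketch. The routine `sum_goto` and its block names are hypothetical, made up for this example rather than taken from the question; `sum_dispatch` is the same routine with each basic block given an id and dispatched from a single loop.

```c
#include <assert.h>

// Hypothetical goto-based routine: sums 1..n using a backward jump.
int sum_goto(int n) {
    int i = 1, total = 0;
top:
    if (i > n) goto done;
    total += i;
    i++;
    goto top;
done:
    return total;
}

// The same routine with its basic blocks numbered and run from one loop.
// `block` plays the role of the program counter.
int sum_dispatch(int n) {
    enum { BLOCK_TOP, BLOCK_BODY, BLOCK_DONE } block = BLOCK_TOP;
    int i = 1, total = 0;
    while (block != BLOCK_DONE) {
        switch (block) {
        case BLOCK_TOP:
            // Conditional jump: pick the successor block.
            block = (i > n) ? BLOCK_DONE : BLOCK_BODY;
            break;
        case BLOCK_BODY:
            total += i;
            i++;
            block = BLOCK_TOP;  // Simulates "goto top".
            break;
        default:
            assert(0 && "unreachable block id");
        }
    }
    return total;
}
```

The `break` statements here only leave the `switch`; the enclosing `while` exits solely through its condition, so no loop-breaks or gotos remain.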

Hope this helps!

templatetypedef
  • 2
    BTW: this is the Jacopini method. (Google for "structured programming with goto Knuth") – wildplasser Dec 27 '12 at 22:30
  • @wildplasser- I didn't know that this had a name! Thanks for the reference! – templatetypedef Dec 27 '12 at 22:32
  • I just read the Knuth article. It has many nice observations, BTW. Knuth recently (~2004?) published another goto-driven state machine; in fact generated code with computed & meaningful labels (a few thousand ...), but I don't seem to be able to find it anymore. It was beautiful and hilarious at the same time. – wildplasser Dec 27 '12 at 22:43
  • [This paper](http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=288377) discusses the transformation in detail. They have cited Jacopini but they claim that their work is more complete. – Asiri Rathnayake Dec 28 '12 at 00:55
  • @Asiri: Is it available from any sources that are not locked behind a paywall? That makes your link much less useful to a general Q&A site like this... – Mason Wheeler Dec 28 '12 at 21:21
  • @MasonWheeler: Type the title of the paper into Google (or Scholar). The direct link would start a PDF download, this is why I linked to the abstract. – Asiri Rathnayake Dec 29 '12 at 11:12
6

I think you want to read "Taming Control Flow: A Structured Approach to Eliminating Goto Statements" by Erosa and Hendren, 1994. (Earlier link on Google Scholar.)

By the way, loop-breaks are also easy to eliminate. There is a simple mechanical procedure that involves creating a boolean state variable and restructuring nested conditionals to create straight-line control flow. It does not produce pretty code :)
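A minimal C sketch of that break-elimination idea follows. The function names and the search task are hypothetical, chosen only for illustration; the flag `done` is the boolean state variable, and the code after the break point moves into an `else` branch.

```c
#include <assert.h>
#include <stdbool.h>

// Original form: leaves the loop early with break.
int find_with_break(const int *a, int n, int target) {
    assert(n >= 0);
    int found = -1;
    for (int i = 0; i < n; i++) {
        if (a[i] == target) {
            found = i;
            break;
        }
    }
    return found;
}

// Break-free form: a boolean state variable is folded into the loop
// condition, and the rest of the iteration is guarded by else.
int find_with_flag(const int *a, int n, int target) {
    assert(n >= 0);
    int found = -1;
    bool done = false;
    int i = 0;
    while (!done && i < n) {
        if (a[i] == target) {
            found = i;
            done = true;   // Replaces "break".
        } else {
            i++;           // Only advance when the loop keeps going.
        }
    }
    return found;
}
```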

If your target language has tail-call optimization (and, ideally, inlining), you can mechanically remove both break and continue by turning the loop into a tail-recursive function. (If the index variable is modified by the loop body, you need to work harder at this. I'll just show the simplest case.) Here's the transformation of a simple loop:

for (Type Index = Start;        function loop(Index: Type):    
     Condition(Index);              if (!Condition(Index))
     Index = Advance(Index)){           return                      // break
   Body                             Body
}                                   return loop(Advance(Index))     // continue
                                loop(Start)

The return statements labeled "continue" and "break" are precisely the transformation of continue and break. Indeed, the first step in the procedure might have been to rewrite the loop into its equivalent form in the original language:

{
    Type Index = Start;
    while (true) {
        if (!Condition(Index))
            break;
        Body;
        Index = Advance(Index);
        continue;
    }
}
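
To make the tail-recursive rewrite described in this answer concrete, here is a small sketch in C. The summing task and all function names are hypothetical, and the running total is passed as an extra parameter because the loop body updates it. Note that the C standard does not guarantee tail-call elimination, so this only shows the shape of the transformation, not a recommendation for C code.

```c
#include <assert.h>

// Loop form: sums the even values, stopping at the first negative one.
int sum_loop(const int *a, int n) {
    assert(n >= 0);
    int total = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] < 0) break;
        if (a[i] % 2 != 0) continue;
        total += a[i];
    }
    return total;
}

// Tail-recursive form: every exit from an iteration is a return.
static int sum_rec_from(const int *a, int n, int i, int total) {
    if (!(i < n)) return total;                        // Loop condition failed.
    if (a[i] < 0) return total;                        // break
    if (a[i] % 2 != 0)
        return sum_rec_from(a, n, i + 1, total);       // continue
    return sum_rec_from(a, n, i + 1, total + a[i]);    // End of body.
}

int sum_rec(const int *a, int n) {
    assert(n >= 0);
    return sum_rec_from(a, n, 0, 0);
}
```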
rici
  • Is the Taming Control Flow paper available from any sources that are not locked behind a paywall? – Mason Wheeler Dec 28 '12 at 21:21
  • That paper was exactly what I'm looking for. It explains in clear language how to transform a routine with GOTOs into a routine without them, and it explains how to do *exactly* that transformation, as opposed to the brute-force description given in templatetypedef's answer, which does much more, at great cost to both efficiency and readability. – Mason Wheeler Dec 28 '12 at 23:00
1

I use Polyhedron's spag and vast's 77to90, separately or together, to begin the process of refactoring Fortran and then converting it to MATLAB source. However, these tools always leave 1/4 to 1/2 of the gotos in the program.

I wrote a goto remover that accomplishes something similar to what you are describing: it takes Fortran code, removes all the remaining gotos from a program, and replaces them with conditionals and do/cycle/exit constructs, which can then be converted into other languages like MATLAB. You can read more about the process I use here:

http://engineering.dartmouth.edu/~d30574x/consulting/consulting_gotorefactor.html

This program could be adapted to work with other languages, but I have not gotten that far yet.

Ben Barrowes