Structuring is the hardest part of decompiler development (at least for high-level languages). The algorithm you're asking about is fairly simplistic, so it's a good starting point, but you'll likely want to use a better algorithm or make your own if you're working on a real decompiler.
With that out of the way, your actual question, how it can use do-while loops instead of while loops, is already answered on the page you linked to:
Every loop can be described with a "do-while" statement.
The "while" loop (pre-tested loop) is a special case of a "do-while"
loop, where the bottom condition is always true, and the first
statement of the loop is an "if" that jumps out of the loop.
Say you had something like
beforeloop
while(foo) {
    stmt1
    stmt2
}
afterloop
It would be compiled to something along the lines of
beforeloop
LOOPBEGIN:
if !foo goto LOOPEND
stmt1
stmt2
goto LOOPBEGIN
LOOPEND:
afterloop
The decompiler algorithm converts this to
beforeloop
do {
    if (!foo) {break}
    stmt1
    stmt2
} while (true)
afterloop
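If you want to convince yourself that the two forms behave the same, here is a minimal Python sketch. The function names and the trace lists are made up for illustration, and since Python has no do-while, `while True` stands in for `do { ... } while (true)` (with a constant-true bottom test they are the same loop).

```python
def pretested(conds):
    """Original source form: while (foo) { stmt1; stmt2 }"""
    foo = iter(conds)  # each next(foo) is one evaluation of the condition
    trace = ["beforeloop"]
    while next(foo):
        trace += ["stmt1", "stmt2"]
    trace.append("afterloop")
    return trace


def structured(conds):
    """Decompiler-style form: do { if (!foo) break; stmt1; stmt2 } while (true)"""
    foo = iter(conds)
    trace = ["beforeloop"]
    while True:  # stands in for do { ... } while (true)
        if not next(foo):
            break
        trace += ["stmt1", "stmt2"]
    trace.append("afterloop")
    return trace
```

Feeding both functions the same sequence of condition values, for example `[True, True, False]`, should give identical traces, including the case where the condition is false on the first test and the body never runs.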
I hope that clears things up. If not, feel free to ask any follow-up questions.
Edit: Example 2, showing how multiple loops with the same entry point can be collapsed.
for(;;) { while(foo) {} while(bar){} }
First off, for(;;) is equivalent to while(true), so I'll use the following (pseudo)code instead
while(true) { while(foo) {stmt1} while(bar){stmt2} }
Let the outer loop be loop A and the inner loops be loop B and C. This compiles to something like the following pseudo assembly.
LOOP_A_BEGIN:
LOOP_B_BEGIN:
if !foo goto LOOP_B_END
stmt1
goto LOOP_B_BEGIN
LOOP_B_END:
LOOP_C_BEGIN:
if !bar goto LOOP_C_END
stmt2
goto LOOP_C_BEGIN
LOOP_C_END:
goto LOOP_A_BEGIN
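But of course labels don't take up any space, so labels with no instructions between them name the same address. With identical labels collapsed (LOOP_A_BEGIN and LOOP_B_BEGIN become POINT1, LOOP_B_END and LOOP_C_BEGIN become POINT2, LOOP_C_END becomes POINT3), it becomes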
POINT1:
if !foo goto POINT2
stmt1
goto POINT1
POINT2:
if !bar goto POINT3
stmt2
goto POINT2
POINT3:
goto POINT1
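The loop entry points in a listing like this are the labels that some jump reaches by going backwards. Here is a rough Python sketch of finding them; `back_edge_targets` is a hypothetical helper, and it treats every backward jump in a linear listing as a back edge, which is a simplification (a real decompiler would work on the control flow graph, typically using dominators).

```python
LISTING = """
POINT1:
if !foo goto POINT2
stmt1
goto POINT1
POINT2:
if !bar goto POINT3
stmt2
goto POINT2
POINT3:
goto POINT1
"""


def back_edge_targets(listing):
    """Return labels that are the target of at least one backward jump."""
    lines = [ln.strip() for ln in listing.strip().splitlines()]
    # Map each label name to the index of the line that defines it.
    labels = {ln[:-1]: i for i, ln in enumerate(lines) if ln.endswith(":")}
    targets = []
    for i, ln in enumerate(lines):
        words = ln.split()
        if "goto" in words:
            target = words[words.index("goto") + 1]
            # Assumption: a jump to an earlier line counts as a back edge.
            if labels[target] <= i and target not in targets:
                targets.append(target)
    return targets


print(back_edge_targets(LISTING))
```

Both jumps to POINT1 count as a single entry point, which is why the two loops that start there end up merged into one.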
Now, there are two points with back edges: POINT1 and POINT2. We can create one loop for each of them, using labeled breaks for clarity. The transform isn't quite as straightforward since you have to mess with the if statements a bit, but it's still pretty easy.
LOOP1: while(true) {
    IF1: if (!foo) {
        break IF1;
    }
    else {
        stmt1;
        continue LOOP1;
    }
    LOOP2: while(true) {
        if (!bar) {
            break LOOP2;
        }
        else {
            stmt2;
            continue LOOP2;
        }
    }
    continue LOOP1;
}
Now, the same code with unnecessary labels simplified out
while(true) {
    if (!foo) {
    }
    else {
        stmt1;
        continue;
    }
    while(true) {
        if (!bar) {
            break;
        }
        else {
            stmt2;
        }
    }
}
Now with if statements simplified
while(true) {
    if (foo) {
        stmt1;
        continue;
    }
    while(true) {
        if (!bar) {
            break;
        }
        stmt2;
    }
}
And finally, you can apply the while(true) { if (!x) break; ... } to while(x) { ... } transform to the inner loop. The outer loop can't be transformed like this, since it's not a simple while(cond) loop due to being the result of merged loops.
while(true) {
    if (foo) {
        stmt1;
        continue;
    }
    while(bar) {
        stmt2;
    }
}
So hopefully, this demonstrates how you can always handle the case of multiple loops with the same entry point by merging them into a single loop, at the possible expense of having to rearrange some if statements too.
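To check that nothing was lost along the way, here is a small Python sketch that runs the collapsed pseudo assembly through a toy interpreter and compares it with the final merged loop. Everything in it (`run_listing`, `run_structured`, `make_reader`, `OutOfInput`) is made up for this example. Since the outer loop never terminates, the condition values are scripted and execution stops when the script runs out.

```python
LISTING = """
POINT1:
if !foo goto POINT2
stmt1
goto POINT1
POINT2:
if !bar goto POINT3
stmt2
goto POINT2
POINT3:
goto POINT1
"""


class OutOfInput(Exception):
    """Raised when a scripted condition has no values left."""


def make_reader(values):
    it = iter(values)

    def read():
        try:
            return next(it)
        except StopIteration:
            raise OutOfInput from None

    return read


def run_listing(text, foo_values, bar_values):
    """Toy interpreter for the three instruction shapes used above."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    labels = {ln[:-1]: i for i, ln in enumerate(lines) if ln.endswith(":")}
    conds = {"foo": make_reader(foo_values), "bar": make_reader(bar_values)}
    trace, pc = [], 0
    try:
        while pc < len(lines):
            words = lines[pc].split()
            if words[0] == "if":  # shape: if !cond goto LABEL
                if not conds[words[1].lstrip("!")]():
                    pc = labels[words[3]]
                    continue
            elif words[0] == "goto":  # shape: goto LABEL
                pc = labels[words[1]]
                continue
            elif not lines[pc].endswith(":"):  # plain statement
                trace.append(lines[pc])
            pc += 1
    except OutOfInput:
        pass
    return trace


def run_structured(foo_values, bar_values):
    """The final merged form from the answer."""
    foo, bar = make_reader(foo_values), make_reader(bar_values)
    trace = []
    try:
        while True:
            if foo():
                trace.append("stmt1")
                continue
            while bar():
                trace.append("stmt2")
    except OutOfInput:
        pass
    return trace
```

Calling both with the same scripts, for example `foo_values=[True, False, True, False]` and `bar_values=[True, True, False, False]`, should produce the same sequence of statements. That is only a spot check on a few inputs, not a proof of equivalence.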