"Cleaning up" nested if statements

Question

In a console program I am creating, I have a bit of code that parses through a file. After parsing each line, it is checked for syntax errors. If there is a syntax error, the program then stops reading the file and goes to the next part of the program. The problem is, it is very messy as my only solution to it so far is a series of nested if statements or a line of if statements. The problem with nested ifs is it gets very messy very fast, and a series of if statements has the program testing for several things that don't need to be tested. Heres some sudo code of my problem (note I am NOT using a return statement)

Pseudo code shown instead of real code, as it is very large

Nested if:

open file;
read line;
//Each if is testing something different
//Every error is different
if (line is valid)
{
    read line;
    if (line is valid)
    {
        read line;
        if (line is valid)
        {
            do stuff;
        }
        else
            error;
    }
    else
        error;

}
else
    error;
code that must be reached, even if there was an error;

Non-nested ifs:

bool fail = false;
open file;
read line;
//Each if is testing something different
//Every error is different
if (line is valid)
    read line;
else
{
    error;
    fail = true;
}
if (!error && line is valid)
    read line;
else
{
    error;
fail = true;
}
if (!error && line is valid)
    do stuff;
else
    error;
//Note how error is constantly evaluated, even if it has already found to be false
code that must be reached, even if there was an error;

I have looked at many different sites, but their verdicts differed from my problem. This code does work at runtime, but as you can see it is not very elegant. Is there anyone who has a more readable/efficient approach on my problem? Any help is appreciated :)

@Cubi73 Thanks for the reply, but this post is from a couple years ago, I don't remember what I was parsing lol. I've done plenty of file parsing since then, and usually do something similar to what you said (return from a function, throw in a try-catch, etc.) — Andrew Pratt, Apr 21 '21 at 23:31

Cornstalks · Answer 1 · 2017-12-03T19:38:32.510

Two options come to mind:

Option 1: chain reads and validations

This is similar to how std::istream extraction operators work. You could do something like this:

void your_function() {
  std::ifstream file("some_file");
  std::string line1, line2, line3;
  if (std::getline(file, line1) &&
      std::getline(file, line2) &&
      std::getline(file, line3)) {
    // do stuff
  } else {
    // error
  }
  // code that must be reached, even if there was an error;
}

Option 2: split into different functions

This can get a little long, but if you split things out right (and give everything a sane name), it can actually be very readable and debuggable.

bool step3(const std::string& line1,
           const std::string& line2,
           const std::string& line3) {
  // do stuff
  return true;
}

bool step2(std::ifstream& file,
           const std::string& line1,
           const std::string& line2) {
  std::string line3;
  return std::getline(file, line3) && step3(line1, line2, line3);
}

bool step1(std::ifstream& file,
           const std::string& line1) {
  std::string line2;
  return std::getline(file, line2) && step2(file, line1, line2);
}

bool step0(std::ifstream& file) {
  std::string line1;
  return std::getline(file, line1) && step1(file, line1);
}

void your_function() {
  std::ifstream file("some_file");
  if (!step0(file)) {
    // error
  }
  // code that must be reached, even if there was an error;
}

This example code is a little too trivial. If the line validation that occurs in each step is more complicated than std::getline's return value (which is often the case when doing real input validation), then this approach has the benefit of making that more readable. But if the input validation is as simple as checking std::getline, then the first option should be preferred.

score 1 · Answer 2 · answered Dec 03 '17 at 19:34

A common modern practice is an early return with RAII. Basically it means that the code that must happen should be in a destructor of a class, and your function will have a local object of that class. Now when you have error you exit early from the function (either with Exception or just plain return) and the destructor of that local object will handle the code that must happen.

The code will look something like this:

class Guard
{
...
Guard()
~Guard() { /*code that must happen */}
...
}

void someFunction()
{
    Gaurd localGuard;
    ...
    open file;
    read line;
    //Each if is testing something different
    //Every error is different
    if (!line)
    {
        return;
    }

    read line;
    if (!line)
    {
        return;
    }
    ...
}

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

Is there [...] a more readable/efficient approach on my problem

Step 1. Look around for a classical example of text parser

Answer: a compiler, which parses text files and produces different kind of results.

Step 2. Read some theory how does compilers work

There are lots of approaches and techniques. Books, online and open source examples. Simple and complicated.

Sure, you might just skip this step if you are not that interested.

Step 3. Apply theory on you problem

Looking through the theory, you will no miss such therms as "state machine", "automates" etc. Here is a brief explanation on Wikipedia:

https://en.wikipedia.org/wiki/Automata-based_programming

There is basically a ready to use example on the Wiki page:

#include <stdio.h>
enum states { before, inside, after };
void step(enum states *state, int c)
{
    if(c == '\n') {
        putchar('\n');
        *state = before;
    } else
    switch(*state) {
        case before:
            if(c != ' ') {
                putchar(c);
                *state = inside;
            }
            break;
        case inside:
            if(c == ' ') {
                *state = after;
            } else {
                putchar(c);
            }
            break;
        case after:
            break;
    }
} 
int main(void)
{
    int c;
    enum states state = before;
    while((c = getchar()) != EOF) {
        step(&state, c);
    }
    if(state != before)
        putchar('\n');
    return 0;
}

Or a C++ example with state machine:

#include <stdio.h>
class StateMachine {
    enum states { before = 0, inside = 1, after = 2 } state;
    struct branch {
        unsigned char new_state:2;
        unsigned char should_putchar:1;
    };
    static struct branch the_table[3][3];
public:
    StateMachine() : state(before) {}
    void FeedChar(int c) {
        int idx2 = (c == ' ') ? 0 : (c == '\n') ? 1 : 2;
        struct branch *b = & the_table[state][idx2];
        state = (enum states)(b->new_state);
        if(b->should_putchar) putchar(c);
    }
};
struct StateMachine::branch StateMachine::the_table[3][3] = {
                 /* ' '         '\n'        others */
    /* before */ { {before,0}, {before,1}, {inside,1} },
    /* inside */ { {after, 0}, {before,1}, {inside,1} },
    /* after  */ { {after, 0}, {before,1}, {after, 0} }
};
int main(void)
{
    int c;
    StateMachine machine;
    while((c = getchar()) != EOF)
        machine.FeedChar(c);
    return 0;
}

Sure, instead of chars you should feed lines.

This technique scales up to a complicated compilers, proven with tons of implementations. So if you are looking for a "right" approach, here it is.

I ended up using a variation of some of the code I saw online, and ultimately got it to do what I needed. Thanks for the link! — Andrew Pratt, Dec 05 '17 at 05:35