6

I need to parse a source code. I've identified 3 different types of tokens : symbols (operators, keywords), litterals (integers, strings, etc...) and identifiers.

I already have the following design, with a base class that keeps track of the type of the subclass so it can be downcasted using a base class pointer :

class Token
{
    type_e type; // E_SYMBOL, E_LITTERAL, E_TOKEN
};

class Symbol : public Token
{
    const symbol_e symbol;
};

class Litteral : public Token
{
    const Value value;
};

class Identifier : public Token
{
    const std::string name;
};

I need these classes to be stored in a single array of tokens, that's why i need them to have a common base class. Then i use them like this :

if( cur->type == E_SYMBOL && static_cast< const Symbol * >( cur )->symbol == E_LPARENT )
{
    // ...
}

I could create virtual functions isSymbol, isLitteral, isIdentifer that each subclass would override, but i would still have to downcast the base class pointer to a subclass pointer so i can access the subclass's specific data.

People say downcasting means the interface is likely flawed, and it's making the syntax very heavy, so i'd like to find another way, but i can't. Some people suggested the visitor pattern, but i'm afraid this would uselessly complexify the code and i don't even understand how i could use the visitor pattern with this problem.

Can anyone help ? Thank you :)

Virus721
  • 8,061
  • 12
  • 67
  • 123
  • 1
    Make `Token` a [`boost::variant`](http://www.boost.org/doc/libs/1_55_0/doc/html/variant.html) maybe? – Captain Obvlious Jun 21 '14 at 21:39
  • 1
    You don't *need* a common base class to store different types in one array. For example, the variant type mentioned above. – chris Jun 21 '14 at 21:39
  • Though Symbol, Literal and Identifier are 'Token's in real life, inheritance here would not benefit you. I think inheritance is beneficial only if the classes can be processed without knowing the exact type. Like, if a Token needs to do something, you just call doSomething() and Literal, Identifier etc. handles the situation differently. – holgac Jun 21 '14 at 21:43
  • Cont'd: Are you doing the logic outside classes? it seems so. Then, you should try to move it to inside methods. boost::variant is a good programmatical, yet bad architectural solution here. – holgac Jun 21 '14 at 21:46
  • @CaptainObvlious Yeah that's what i'm using in my class Value (int, float, string, etc). I don't wanna use boost though (only STL), i want it to be pure C++. Though the variant thing doesn't seem very natural. Having all the attributes of these different tokens types in a single class looks a bit messy and not very "C++". – Virus721 Jun 21 '14 at 21:46
  • @revani Yeah all the analyzing logic is made outside, using an array of these tokens that gets turned into an abstract representation of the program. It seems strange to me to put the program compiling logic inside the tokens. – Virus721 Jun 21 '14 at 21:47
  • 1
    @Virus721: Boost is the purest of the purest C++. You can't have any purer C++ than boost. Well, seriously, for one thing boost is kind of staging area for the standard library (just like STL used to be many years ago). And for the other, you can easily pick out specific bits from boost; they are mostly independent. – Jan Hudec Jun 21 '14 at 21:48
  • @JanHudec Yeah but i'm doing this parser just to show i can do it, so i want to avoid libraries, except for the STL because revinventing strings and vectors would be more of a waste of time. – Virus721 Jun 21 '14 at 21:49
  • @Cheersandhth.-Alf Yeah but what do you mean by "simple" ? A good old struct with all the trash put in it ? Or a code that is nicely designed and simple to understand ? – Virus721 Jun 21 '14 at 21:51
  • 1
    @Virus721: Reinventing discriminated union would be just as much waste of time. – Jan Hudec Jun 21 '14 at 21:52
  • 1
    @Virus721 When people say "_the interface is likely flawed_", I guess what they really mean is: _your design is likely flawed_. You should probably ask a question about how to design your program rather than how to avoid downcasting. – D Drmmr Jun 21 '14 at 21:53
  • @DDrmmr Avoiding downcasting by using a different design, that's what i had in mind. Any idea appart from what has been suggested ? – Virus721 Jun 21 '14 at 21:54
  • 2
    @Virus721 No, because you have given no useful information about your problem. – D Drmmr Jun 21 '14 at 21:56
  • You can do almost everything in a parser like this with virtual member functions. You can add an function `virtual ... evaluate (...)` or `virtual ... process(...)`. The `...` depends on what makes sense for the `evaluate/process` function in your parser. – R Sahu Jun 21 '14 at 22:27

1 Answers1

7

You have three options. Each solution has it's advantages and disadvantages.

  • Put the logic into the token classes, so the calling code does not need to know which kind of token it is dealing with.

    This would be the "purest object oriented" solution. The disadvantage is that the logic tends to spread between the base class and subclasses, which makes it harder to follow. It may also cause the classes to grow rather large. But compilers/interpreters don't usually have that many actions for this to be a problem.

  • Use the Visitor Pattern.

    That is have an interface TokenVisitor with visit method overloaded for the token subtypes and accept(TokenVisitor&) method on Token which each subclass would override to call the appropriate overload of visit.

    You now need to know the complete set of token types in the interface, but it allows keeping the classes reasonably small and grouped by action the logic is usually easier to follow.

  • Use a discriminated union, for example Boost.Variant.

    This is not object oriented at all. It will lead to switches over the type all over the place and will probably look ugly. But since the logic is all together it is often easier to follow, especially for somebody who does not understand the idea behind the code.

backscattered
  • 185
  • 1
  • 11
Jan Hudec
  • 73,652
  • 13
  • 125
  • 172