I've been facing a design issue for a while:
I am parsing a source code string into a one-dimensional array of token objects.
Depending on its type (literal, symbol, identifier), a token carries some type-specific data: literals have a value, symbols have a symbol type, and identifiers have a name.
I then build an abstract representation of the script defined in that source code string by analyzing this one-dimensional array of tokens. The syntax analysis logic lives outside of the token objects.
My problem is that I need all my tokens, whatever their type, to be stored in a single array, because it seems easier to analyze and because I don't see any other way to do it. This requires a common type for all the different token types, either by creating a class hierarchy:
// struct rather than class so the members are accessible from the analysis code
struct token { token_type type; };
struct identifier : public token { string name; };
struct litteral : public token { value val; };
struct symbol : public token { symbol_type sym; };
... or by creating a variant:
struct token
{
    token_type type;
    string name; // Only used when it is an identifier
    value val; // Only used when it is a literal
    symbol_type sym; // Only used when it is a symbol
};
The class hierarchy would be used as follows:
// Iterate over the token array (assumed here to hold pointers to tokens)
for( auto cur_token : tokens )
{
    // Do token-type-specific things depending on its type
    if( cur_token->type == E_SYMBOL )
    {
        // Down-cast to reach the symbol-specific member
        switch( static_cast<symbol *>( cur_token )->sym )
        {
            // etc
        }
    }
}
But it has several problems:
The base token class has to know about its subclasses (through the token_type enumeration), which seems wrong.
It involves down-casting to access specific data depending on the type of a token, which I was told is wrong too.
The variant solution would be used in a similar way, without down-casting:
for( auto cur_token : tokens )
{
    if( cur_token->type == E_SYMBOL )
    {
        switch( cur_token->sym )
        {
            // etc
        }
    }
}
The problem with that second solution is that it mixes everything into a single class, which doesn't seem very clean to me: some members are unused depending on the type of the token, and a class should represent a single type of "thing".
Would you have another way to suggest for designing this? I was told about the visitor pattern, but I can't picture how I would use it in my case.
I would like to keep the ability to iterate over an array, because I might have to iterate in both directions, from an arbitrary position, and maybe multiple times.
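One variation I have considered, which might avoid the unused members, is std::variant (C++17): each token would hold exactly one of the three alternatives. The following is only a rough sketch under that assumption. The enumerators of symbol_type, the value alias and the count_symbols_before helper are hypothetical stand-ins for illustration, not my real code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the real types.
enum class symbol_type { plus, minus, lparen, rparen };
using value = double;

struct identifier { std::string name; };
struct litteral   { value val; };
struct symbol     { symbol_type sym; };

// One common type for the array; a token holds exactly one alternative,
// so there is no separate token_type tag and no unused member.
using token = std::variant<identifier, litteral, symbol>;

// Hypothetical helper: counts the symbols of a given kind found while
// scanning backwards from position `from` (exclusive) to the start.
std::size_t count_symbols_before(const std::vector<token>& tokens,
                                 std::size_t from, symbol_type wanted)
{
    assert(from <= tokens.size());
    std::size_t count = 0;
    for (std::size_t i = from; i-- > 0; )
    {
        // std::get_if returns nullptr when the token is not a symbol.
        if (const symbol* s = std::get_if<symbol>(&tokens[i]))
        {
            if (s->sym == wanted)
                ++count;
        }
    }
    return count;
}
```

The tokens would still live in a plain std::vector<token>, so indexing and iterating in either direction keep working. The type check does not disappear, though: it moves into std::get_if (or std::visit).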
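For reference, here is how I currently understand the visitor pattern applied to the class hierarchy, with the tokens still stored in one array. It is only a sketch and may not be what was meant. The names token_visitor, accept, describer and describe_backwards are hypothetical, as are the symbol_type enumerators and the value alias.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real types.
enum class symbol_type { plus, minus };
using value = double;

struct identifier;
struct litteral;
struct symbol;

// The visitor lists every concrete token type, so the base class does not have to.
struct token_visitor
{
    virtual ~token_visitor() = default;
    virtual void visit(const identifier&) = 0;
    virtual void visit(const litteral&) = 0;
    virtual void visit(const symbol&) = 0;
};

struct token
{
    virtual ~token() = default;
    virtual void accept(token_visitor& v) const = 0;
};

struct identifier : token
{
    std::string name;
    explicit identifier(std::string n) : name(std::move(n)) {}
    void accept(token_visitor& v) const override { v.visit(*this); }
};

struct litteral : token
{
    value val;
    explicit litteral(value v) : val(v) {}
    void accept(token_visitor& v) const override { v.visit(*this); }
};

struct symbol : token
{
    symbol_type sym;
    explicit symbol(symbol_type s) : sym(s) {}
    void accept(token_visitor& v) const override { v.visit(*this); }
};

// Example visitor: builds a short textual description of the tokens it sees.
struct describer : token_visitor
{
    std::string out;
    void visit(const identifier& t) override { out += "id(" + t.name + ") "; }
    void visit(const litteral&) override { out += "lit "; }
    void visit(const symbol& t) override
    {
        out += (t.sym == symbol_type::plus ? "sym(+) " : "sym(-) ");
    }
};

// The tokens stay in a single array; here it is walked from back to front.
std::string describe_backwards(const std::vector<std::unique_ptr<token>>& tokens)
{
    describer d;
    for (auto it = tokens.rbegin(); it != tokens.rend(); ++it)
        (*it)->accept(d);
    return d.out;
}
```

If I read it correctly, this removes both the token_type tag and the down-cast, since the virtual accept call selects the matching visit overload. What I still don't see is how the syntax analysis, which needs to look at several neighbouring tokens at once, would fit into per-token visit functions.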
Thank you.