2

I have some file format decoder which returns a custom input iterator. The value type of this iterator (when dereferencing it with *iter) can be one of many token types.

Here's a simplified usage example:

File file {"/path/to/file"};

for (const auto& token : file) {
    // do something with token
}

How can this token have multiple possible types? Depending on the type of the token, the type of its payload changes too.

Performance is important here during traversal. I don't want any unnecessary allocation, for example. This is why the iterator's type is an input iterator: as soon as you advance the iterator, the previous token is invalidated as per the requirements of the InputIterator tag.

I have two ideas in mind so far:

  1. Use a single Token class with a private union of all the possible payloads (with their public getters) and a public type ID (enum) getter. The user needs to switch on this type ID to know which payload getter to call:

    for (const auto& token : file) {
        switch (token.type()) {
        case Token::Type::APPLE:
            const auto& apple = token.apple();
            // ...
            break;
    
        case Token::Type::BANANA:
            const auto& banana = token.banana();
            // ...
            break;
    
        // ...
        }
    }
    

    Although this is probably what I would choose in C, I'm not a fan of this solution in C++ because the user can still call the wrong getter and nothing can enforce this (except run-time checks which I want to avoid for performance concerns).

  2. Create an abstract Token base class which has an accept() method to accept a visitor, and multiple concrete classes (one for each payload type) inheriting this base class. In the iterator object, instantiate one of each concrete class at creation time. Also have a Token *token member. When iterating, fill the appropriate pre-allocated payload object, and set this->token = this->specificToken. Make operator*() return this->token (reference to). Ask the user to use a visitor during the iteration (or worse, use dynamic_cast):

    class MyVisitor : public TokenVisitor {
    public:
        void visit(const AppleToken& token) override {
            // ...
        }
    
        void visit(const BananaToken& token) override {
            // ...
        }
    };
    
    TokenVisitor visitor;
    
    for (const auto& token : file) {
        token.accept(visitor);
    }
    

    This introduces additional function calls for each token, at least one of them which is virtual, but this might not be the end of the world; I remain open to this solution.

Is there any other interesting solution? I consider that returning a boost::variant or std::variant is the same as idea #2.

eepp
  • 7,255
  • 1
  • 38
  • 56
  • Reverse the approach. Instead of returning an iterator, get a callable object and iterate internally. This way you can invoke the right overload each time and an user cannot accidentally make a mistake. – skypjack Aug 24 '17 at 05:55
  • @skypjack Can you elaborate, perhaps as an answer with a concrete, simple example? This looks promising. – eepp Aug 24 '17 at 06:02
  • [Here](https://stackoverflow.com/a/45854286/4987285) it is. – skypjack Aug 24 '17 at 06:20
  • 1
    this is crying out for variant. There is no dynamic memory allocation in a variant. – Richard Hodges Aug 24 '17 at 06:34
  • @RichardHodges What if I have 25-30 possible types; could an `std::variant` or `boost::variant` accomodate this? AFAIK `boost::variant` is limited at compile time to something around 10 choices. – eepp Aug 24 '17 at 06:49
  • If you have that many possible types in the design, then I would argue it's a design smell, and that you probably should rethink your design. – Some programmer dude Aug 24 '17 at 06:53
  • @Someprogrammerdude I don't see how, honestly. What if this file format is a sequence of objects of 25-30 unrelated types and what I want my API to offer is the possibility to iterate them in order? I'm not in control of this file format, I just want to extract objects from it. – eepp Aug 24 '17 at 07:06
  • If *any* of the types have any kind of shared commonality, then it cries out for inheritance and polymorphism, and that could lower the number of *base* types considerably. However, without knowing exactly what you're trying to actually do, what the *actual* problem you're trying to solve might be, then all we can do is speculate. Can you please tell us more about the program you're developing? What is the file you're supposed to read and parse? – Some programmer dude Aug 24 '17 at 07:12
  • Yes I agree about this. My problem is not concrete yet, just hypothetical, as are your answers, and I appreciate them. I like @skypjack's solution actually. – eepp Aug 24 '17 at 07:21
  • @eepp My solution simply does in C++11/14 with a tagged union what you can do in C++17 with a `std::variant` (better and with better performance actually, also with a nicer syntax thanks to `std::visit`). Anyway C++17 isn't the first choice for everybody yet and you have just considered `boost::variant` in the question, so it doesn't worth it to add an answer that says - _ehi, you can use something you already know and you've even mentioned_. Does it? :-D – skypjack Aug 24 '17 at 07:32

1 Answers1

2

Although this is probably what I would choose in C, I'm not a fan of this solution in C++ because the user can still call the wrong getter and nothing can enforce this (except run-time checks which I want to avoid for performance concerns).

You can reverse the approach and accept a callable object instead of returning an iterator to the user. Then you can iterate the container internally and dispatch the right type. This way users cannot do mistakes anymore by ignoring the information carried up with your tagged union, for you are in charge of taking it in consideration.

Here is a minimal, working example to show what I mean:

#include <vector>
#include <utility>
#include <iostream>

struct A {};
struct B {};

class C {
    struct S {
        enum { A_TAG, B_TAG } tag;
        union { A a; B b; };
    };

public:
    void add(A a) {
        S s;
        s.a = a;
        s.tag = S::A_TAG;
        vec.push_back(s);
    }

    void add(B b) {
        S s;
        s.b = b;
        s.tag = S::B_TAG;
        vec.push_back(s);
    }

    template<typename F>
    void iterate(F &&f) {
        for(auto &&s: vec) {
            if(s.tag == S::A_TAG) {
                std::forward<F>(f)(s.a);
            } else {
                std::forward<F>(f)(s.b);
            }
        }
    }

private:
    std::vector<S> vec;
};

void f(const A &) {
    std::cout << "A" << std::endl;
}

void f(const B &) {
    std::cout << "B" << std::endl;
}

int main() {
    C c;
    c.add(A{});
    c.add(B{});
    c.add(A{});
    c.iterate([](auto item) { f(item); });

}

See it up and running on Coliru.

skypjack
  • 49,335
  • 19
  • 95
  • 187
  • Why should I reverse the approach? What about having a common base for all the possible alternatives, for example `Item`, and specific item classes, and put the equivalent templated method in `Item` instead with the same switch on its private tag and call of the appropriate passed callable? Then the user can call `item.visit(myCallable);`. I like the iterator concept here because the user can stop at any point, copy the iterator, seek the iterator to different points in the file, etc. Then I guess using this templated method vs. a classic visitor only avoids virtual method calls. – eepp Aug 25 '17 at 05:58
  • @eepp So that you can keep control over the accesses to the types. Otherwise an user could make a mistake. That was your concern in point 1, right? – skypjack Aug 25 '17 at 08:18
  • I'm talking about solution #2, but instead of a classic visitor based on virtual methods (the virtual `accept()` method accepts a visitor and calls back its specific visiting method for this type), use your idea of a static visitor with a templated method accepting a callable and calling it with the specific (casted) type after switching on a tag, just like a variant would do. – eepp Aug 25 '17 at 12:03
  • 1
    @eepp Yeah, that's another way to do that indeed. But your classes still have to inherit from a base that has a virtual method. – skypjack Aug 25 '17 at 12:06
  • Why the virtual method? What about [this](https://wandbox.org/permlink/k2sbxVeX5FXTm0kp)? – eepp Aug 25 '17 at 12:28
  • @eepp Oh, ok, you wanted to mix both of the points up in a single solution. I didn't get it, sorry. – skypjack Aug 25 '17 at 12:58