3

I was following this blog and was trying to replace a dynamic polymorphism code into using std::variant and std::visit. But I can not get std::variant + std::visit to work better than a virtual struct impl. Its about 1.3-1.5x slower! (GCC 10.3 -O3 C++17)

The use case is as follows. Say we are comparing i'th and j'th row of two tables. A table could have heterogeneously typed columns. Assume we can access column buffers. What I am doing is testing,

def IndexEqual(Table:A, Table:B, int:i, int:j):
  for c in range(A.num_cols):
     if not A.column(c)[i] == B.column(c)[j]:
         return False
  return True

for dynamic polymorphism I have the following for int and float

struct Comp{
  virtual bool comp(size_t i, size_t j) const = 0;
};

struct CompI: public Comp {
  CompI(const int *data_1, const int *data_2) : data1(data_1), data2(data_2) {}

  const int *data1, *data2;
  bool comp(size_t i, size_t j) const override {
    return data1[i] == data2[j];
  }
};

struct CompF: public Comp {
  CompF(const float *data_1, const float *data_2) : data1(data_1), data2(data_2) {}

  const float *data1, *data2;
  bool comp(size_t i, size_t j) const override {
    return data1[i] == data2[j];
  }
};

bool IndexEqual1(const std::vector<Comp *> &comps, size_t i, size_t j) {
  for (auto &&a: comps) {
    if (!a->comp(i, j)) {
      return false;
    }
  }
  return true;
}

This was transformed to std::variant + std::visit as follows.

struct EqualToI {
  EqualToI(const int *data_1, const int *data_2) : data1(data_1), data2(data_2) {}
  const int *data1, *data2;
  bool comp(size_t i, size_t j) const {
    return data1[i] == data2[j];
  }
};

struct EqualToF {
  EqualToF(const float *data_1, const float *data_2) : data1(data_1), data2(data_2) {}
  const float *data1, *data2;
  bool comp(size_t i, size_t j) const {
    return data1[i] == data2[j];
  }
};

using var_type = typename std::variant<EqualToI, EqualToF>;

bool IndexEqual(const std::vector<var_type> &comps, size_t i, size_t j) {
  for (auto &&a: comps) {
    if (!std::visit([&](const auto &comp) {
      return comp.comp(i, j);
    }, a)) {
      return false;
    }
  }
  return true;
}

I benchmarked this here https://quick-bench.com/q/u-cBjg4hyQjOs6fKem9XSdW7LMs

Can someone one please explain why this std::variant + std::visit option is slower than the dynamic polimorphism approach? I was expecting otherwise! Is there a problem in my approach and/or benchmark?

n1r44
  • 581
  • 1
  • 4
  • 12
  • 1
    That blog that you are following is not describing static polymorphism. Try this blog instead: https://mropert.github.io/2017/11/30/polymorphic_ducks/ – Eljay Oct 05 '21 at 03:51
  • There are two ways to implement std::variant, with dynamic dispatch and with magic. – n. m. could be an AI Oct 05 '21 at 05:13

2 Answers2

5

Using variant does not constitute "static polymorphism". It is still dynamic polymorphism because the compiler does not know which type will actually inhabit the variant. Therefore, the code in question must attempt to figure out what the variant stores at runtime, and it must dispatch those calls accordingly.

Note that the article you link to also doesn't call it "static polymorphism". It emphasizes that it is merely a different form of "runtime polymorphism" from virtual functions.

Nicol Bolas
  • 449,505
  • 63
  • 781
  • 982
  • I see. Thank you for the clarification (I changed the title accordingly). Is there a better way I could achieve this? – n1r44 Oct 05 '21 at 03:53
  • 1
    @n1r44 So you want a dynamic dispatch implementation that is faster than all known ones, right? – n. m. could be an AI Oct 05 '21 at 05:26
  • @n1r44 sure just hand roll your own dynamic dispatch. Move polymorphism down, removing a level of indirection or two. Then weeks of profiling and tweaking against realistic data. – Yakk - Adam Nevraumont Oct 05 '21 at 09:56
0

In theory, using std::variant and std::visit to implement dynamic polymorphism should not be slower than the implementation based on virtual functions.

In your approach, the lambda function used in the following code

if (!std::visit([&](const auto &comp) {
  return comp.comp(i, j);
}, a)) {
  return false;
}

repeatedly copies the pointers of i and j in the std::visit function, which is the main reason for the performance difference.

oldcb
  • 1
  • 1