0

There are two versions of a function (the code below is a simplified version). Both versions are used in the program. In the actual function, the differences between the two versions can occur at two or three different places.

How to avoid writing both versions in the code without sacrificing performance, through template or other means? This is an attempt to make the code more readable.

Performance is critical because it will get run many many times, and I am writing benchmark for different implementations.

(Also, is this an ok api, if I am writing a library for a few people?)

Example:

int set_intersect(const int* A, const int s_a,
                  const int* B, const int s_b,
                  int* C = 0){

    //if (int* C == 0), we are running version
    //0 of the function.

    //int* C is not known during compilation
    //time for version 1.

    int Count0 = 0;
        //counter for version 0 of the function.

    const int* const C_original(C);
        //counter and pointer for version 1 of
        //the function

    int a = 0;
    int b = 0;
    int A_now;
    int B_now;

    while(a < s_a && b < s_b){
        A_now = A[a];
        B_now = B[b];
        a += (A_now <= B_now);
        b += (B_now <= A_now);
        if (A_now == B_now){    
            if (C == 0){
                Count0++;
            } else {
                C++;
                *(C)=A_now;
            }
        }
    }
    if (C == 0){
        return Count0;
    }else{
        return C - C_original;
    }
}

Thanks.


Updates:

Conditional compile-time inclusion/exclusion of code based on template argument(s)

(some of those templates look so long)

Remove/Insert code at compile time without duplication in C++

(this is more similar to my case. my case is simpler though.)


I guess the following can work, but it adds a new argument.

int set_intersect(const int* A, const int s_a,
                  const int* B, const int s_b,
                  int* C = 0,
                  char flag);

put all code for version 0 into if (flag == '0') { /* version 0 code */ }

put all code for version 1 into if (flag == '1') { /* version 1 code */}

Probably can put the flag variable into template (as Barmar suggested in comments), that way, it doesn't feel like adding another argument for the function. Can also replace the 0 and 1 with enum (like enum class set_intersection_type {find_set, size_only}). Calling the function will be like set_intersect<find_set>(const int* A, const int s_a, const int* B, const int s_b, int* C) or set_intersect<size_only>(const int* A, const int s_a, const int* B, const int s_b) Hopefully this is more readable than before, and the compiler is smart enough to see what is going on.

Another problem is, what if someone uses the findset version (version 1), and then forgets to change the default argument (int C* = 0)? It is possible to call the function this way: set_intersect<find_set>(const int* A, const int s_a, const int* B, const int s_b).

May be I can use dasblinkenlight's idea in the comments. Create two wrapper functions (set_intersection, set_intersection_size). Each wrapper calls the actual function with different arguments. Also list the actual function as a private function so no one can call it directly.

For the different implementations of set intersections, maybe can create a common wrapper with templates. Calling the wrapper would be similar to set_intersection<basic>, set_intersection<binary_search>, or set_intersection_size<simd> etc. This seems to look better.

Community
  • 1
  • 1
rxu
  • 1,369
  • 1
  • 11
  • 29
  • You can only use templates for things that are known at compile time. – Barmar Jul 08 '16 at 22:17
  • I see... the version 0 case is known at compile time, can that work with template? The actual function is long and repeating is quite bad for readability. – rxu Jul 08 '16 at 22:18
  • So it either gets called with C either a literal null pointer, or a pointer variable that will never be null? – Barmar Jul 08 '16 at 22:21
  • Yes. That is the case. The aligned memory allocater for std::vector I am using now avoids the null pointer case. – rxu Jul 08 '16 at 22:22
  • Do `s_a` and `s_b` have some practical upper limit? Is caller the one making sure that ` *(++Count1)=A_now;` is not going to result in a buffer overrun? – Sergey Kalinichenko Jul 08 '16 at 22:25
  • The premise of this question is fatally flawed, so I'll only register this as a comment. If performance is as critical as you claim, you would not worry about avoiding duplication of code.You would bite the bullet and have two distinct versions, and carefully decide which one to use at the call point. And that is essentially what you can do with templates or with function overloading - have one version (specialisation if using a template) that assumes `C` is NULL (so eliminates all code using `C`) and another that assumes `C` is non-null (so does things to `C` without checking if `C` is NULL). – Peter Jul 08 '16 at 22:27
  • @dasblinkenlight. Yes and Yes. – rxu Jul 08 '16 at 22:27
  • 1
    I'm not really a template expert, but I'm thinking something like `template set_intersect(...)`. Then you'd call it as `set_intersect(...)` or `set_intersect(...)`. – Barmar Jul 08 '16 at 22:29
  • 1
    With that, the compiler should be able to perform the `if(C == 0)` at compile time, and generate the appropriate code. – Barmar Jul 08 '16 at 22:30
  • Why not implement a four-argument override `set_intersect(const int*A, size_t sa, const int* B, size_t sb) {int temp[MAX_A_B_LIMIT]; set_intersect(A, sa, B, sb, temp);}` and remove the case when `C==0` from your code completely? – Sergey Kalinichenko Jul 08 '16 at 22:31
  • My point is that there will be a trade-off of readability against performance whereas you seek to maximise both. That won't happen. You can either have a performance version that does not track things to `C`, or a tracking version which loses performance. – Peter Jul 08 '16 at 22:32
  • @Peter Current version of the code write the two version separately. Condensing the two version into a single function is purely for readability. I am not sure, but may be there will be a next person that work on this code later. If there is no good way, I guess I will explain in comments. – rxu Jul 08 '16 at 22:34
  • @dashblinkenlight i heard that memory read/write is slow. Version 0 actually get used more often than version 1. That override doesn't eliminate the memory write that is not needed in version 0. Thanks for the idea though. I might use that in other places. – rxu Jul 09 '16 at 12:21

3 Answers3

1

Generally seems doable, question is whether you want to do that. Would say no. From what I can tell you do two different things:

  1. Version 0 computes the size of the intersection
  2. Version 1 computes the size of the intersection, and writes the intersection to the location past C*, assuming there is enough space to store it.

I would not only for speed, but also for clarity make two distinct functions, set_intersection and set_intersection_size, but if you insist on having one I would benchmark your code against std::set_intersection, and if possible just redirect to the ::std version if C != 0.

In your current version I would not use your library. However I would also be hard-pressed to come up with a situation where I would prefer a custom-made version of set_intersection to the STL version. If I ever needed performance better than the STL, I would expect to have identified the point in code as the bottleneck, and I would not use a library call at all, but write the code myself, possibly in assembly and unrolling the loop etc..

What bugs me a bit is how this is supposed to work:

const int* const Count1(C);
    //counter and pointer for version 1 of
    //the function
...
    Count1++;
    *(Count1)=A_now;
midor
  • 5,487
  • 2
  • 23
  • 52
  • That part of the code is fixed. I guess I will just keep the two version of the function without trying to merge them. And work on other parts of the code. Thanks for the suggestions. – rxu Jul 09 '16 at 04:16
  • Calling the function/return values does waste time. That is a good point. I probably will ask the compiler to inline the whole thing. – rxu Jul 09 '16 at 13:51
1

If it is known at compilation time what version you want you can use Conditional Compilation.

#define Version_0 //assuming you know this compilation is version 0

Then you can go:

int set_intersect(...)
#ifdef Version_0
    //Version 0 of the code
#else
    //Version 1 of the code

This way only one version of the code gets compiled.

If you don't know which version it is for compilation, I suggest having two separate functions so you don't have to check for the version every instance of the function.

Informat
  • 808
  • 2
  • 11
  • 25
  • That can work if the two version works for different types of cpu, and I only want to compile for one version. But in this case, I want to use both versions. Thanks for the idea though. – rxu Jul 09 '16 at 12:38
1

Have a type that you specialize on a bool parameter:

template<bool b>
struct Counter
{
};

template<>
struct Counter<false>
{
    int c;

    Counter(int *)
    : c(0)
    {
    }

    int operator++() { return ++c; }
    void storeA(const int a_now) {}
};

template<>
struct Counter<true>
{
    const int* const c;

    Counter(int * c_orig)
    : c(c_orig)
    {
    }

    int operator++() { return ++C; }
    void storeA(const int a_now) { *C = a_now; }
}

Then specialize your algorithm on Counter as a template argument. Note that this will be exactly the same for both cases, that is, you don't need to specialize:

template<typename Counter>
struct SetIntersectHelper
{
    static int set_intersect(const int* A, const int s_a,
                             const int* B, const int s_b,
                             int* C)
    {
        // your function's body, using Counter
    }
};

Now, you're ready to add the generic method:

int set_intersect(const int* A, const int s_a,
                  const int* B, const int s_b,
                  int* C = 0)
{
    return C ? SetIntersectHelper< Counter< true  > >::set_intersect(A, s_a, B, s_b, C):
               SetIntersectHelper< Counter< false > >::set_intersect(A, s_a, B, s_b, C);
}
lorro
  • 10,687
  • 23
  • 36