
I have the following situation: there's a huge set of templates, like std::vector, that call memmove() to move parts of an array. Sometimes they want to "move" a part of length zero. For example, when the array tail is removed (as in std::vector::erase()), they move the remainder of the array, which happens to have length zero. That zero is known at compile time (I saw the disassembly, the compiler is aware), yet the compiler still emits a memmove() call.

So basically I could have a wrapper:

inline void callMemmove( void* dest, const void* source, size_t count )
{
   if( count > 0 ) {
       memmove( dest, source, count );
   }
}

but this would introduce an extra runtime check in cases where count is not known at compile time, which I don't want.

Is it somehow possible to use the __assume hint to tell the compiler that, if it knows for sure that count is zero, it should eliminate the memmove()?

Suma
sharptooth
  • 1
    what are you hoping to save with this? Seems like a micro-micro-micro optimisation? You are going to save two branches (looking at the basic gnu implementation of `memmove`)? – Nim Oct 05 '11 at 07:58
  • 5
    @Nim: The branching, the call to `memmove()` and also (the most important part) the code around the `memmove()` call, which could then be optimized away: no call means no argument preparation is needed. Yes, it's micro, but it saves microseconds. – sharptooth Oct 05 '11 at 08:02
  • Are you trying to implement your own vector? – David Rodríguez - dribeas Oct 05 '11 at 08:14
  • @David Rodríguez - dribeas: Sort of - for training purposes. – sharptooth Oct 05 '11 at 08:18
  • 1
    Oh, come on, people, sharptooth seems to have enough experience to know that "premature optimization is the root of evil" and that you should not implement your own vector unless you have serious reasons. There are cases where there are reasons for both, let us assume now this is one of them and let us try to solve the problem, not declare it non-problem. – Suma Oct 05 '11 at 08:18
  • 4
    It’s weird that the compiler should emit a call to `memmove` when it has already detected the length 0. In reality, the call should be inlined, the zero-size loop be detected and elided. Why isn’t this happening? Are you linking against a dynamic runtime? If so, write a wrapper for `memmove` that looks like what you’ve written above. – Konrad Rudolph Oct 05 '11 at 08:52
  • 2
    @Konrad Rudolph: AFAIK the reason is that `memmove()` is implemented in assembly in Visual C++ runtime sources and not presented to the compiler. – sharptooth Oct 05 '11 at 08:55
  • @KonradRudolph memcpy is inlined (actually "intrinsiced") and compiler is smart enough to eliminate it, but memmove is not. – Suma Oct 05 '11 at 11:29
  • 1
    @Suma Hm. Any reason why? I realise that it’s probably implemented in assembly but then link-time optimisation should take care of the necessary inlining. – Konrad Rudolph Oct 05 '11 at 12:09
  • @KonradRudolph Link time CG cannot do any optimizations on assembly functions. Even inlining an assembly function is not possible: 1) you cannot change the way arguments are passed to it, 2) the function already ends with ret, or possibly with multiple rets, and there is no reliable way to "trim" this ret out. The memcpy implementation is completely different: it is not only inlined, it is handled as an intrinsic by the compiler, and the compiler can use everything it knows to decide how to compile it. – Suma Oct 05 '11 at 12:43
  • possible duplicate of [C++ compile-time constant detection](http://stackoverflow.com/questions/3299834/c-compile-time-constant-detection) – Suma Oct 05 '11 at 12:54
  • @Suma Link-time optimisation rewrites the code (it needs to, in order to inline!). I don’t see how this differs from C++ to assembly. Both are just object files (with extra information). Unless, of course, VC++ doesn’t provide appropriate library distributions. That would be lame. – Konrad Rudolph Oct 05 '11 at 13:05
  • 1
    @KonradRudolph It is much different, and there is extra information. LTCG works not with a "native code", but with a symbolic representation of the code (this is why it needs to be enabled also when compiling objects, not only when linking). This cannot be done from assembly. See e.g. http://msdn.microsoft.com/en-us/magazine/cc301698.aspx for more information. – Suma Oct 05 '11 at 13:30

4 Answers


The point of __assume is to tell the compiler that it can skip portions of code when optimizing. In the link you provided, the example uses the default clause of a switch: there the hint tells the compiler that the clause will never be reached, even though theoretically it could be. You're basically telling the optimizer, "Hey, I know better, throw this code away".

For default, you can't simply leave the clause out (unless the cases cover the whole range, which is sometimes impractical), because the compiler would still have to handle the values that no case matches. So you need the hint to get the code you know is unneeded optimized out.
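A minimal sketch of that switch case. `UNREACHABLE` and `weight` are hypothetical names used only for illustration; the macro maps to `__assume(0)` on MSVC and, as an assumption for other toolchains, to `__builtin_unreachable()` on GCC/Clang.

```cpp
#include <cassert>

// Hypothetical portability macro: __assume(0) on MSVC,
// __builtin_unreachable() on GCC/Clang, a no-op elsewhere.
#if defined(_MSC_VER)
#  define UNREACHABLE() __assume(0)
#elif defined(__GNUC__)
#  define UNREACHABLE() __builtin_unreachable()
#else
#  define UNREACHABLE() ((void)0)
#endif

// The caller promises that kind is always 0, 1 or 2.
inline int weight(int kind) {
    switch (kind) {
    case 0: return 10;
    case 1: return 20;
    case 2: return 30;
    default:
        UNREACHABLE();  // lets the optimizer drop the out-of-range handling
    }
    return 0;  // only reached where UNREACHABLE() expands to nothing
}
```

Calling `weight` with a value outside 0..2 breaks the promise and is undefined behaviour wherever the hint is active, which is why the hint only suits code that truly can never run.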

In your case the code can be reached, just not always, so the __assume hint won't help you much. You have to check whether count is really 0. And if you're sure it can never be anything but 0, just don't write the call at all.

littleadv

This solution uses a trick described in "C++ compile-time constant detection". The trick relies on the fact that a compile-time integer zero can be converted to a pointer; combined with overloading, this lets you check for the "compile time known" property.

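#include <cstring>  // memmove

// Overload resolution picks chk2(void*) only when the argument is a
// null pointer constant, i.e. a compile-time zero; otherwise chk2(Temp) wins.
struct chkconst {
  struct Small {char a;};
  struct Big: Small {char b;};
  struct Temp { Temp( int x ) {} };
  static Small chk2( void* ) { return Small(); }
  static Big chk2( Temp  ) { return Big(); }
};

// True when X is a compile-time zero.
#define is_const_0(X) (sizeof(chkconst::chk2(X))<sizeof(chkconst::Big))
// True when X is any compile-time constant: X-X is then a compile-time zero.
#define is_const(X) is_const_0( int(X)-int(X) )

// Arguments are parenthesized so that expressions can be passed safely.
#define memmove_smart(dst,src,n) do { \
    if (is_const(n)) {if ((n)>0) memmove((dst),(src),(n));} \
    else memmove((dst),(src),(n)); \
  } while (false)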

Or, in your case, as you want to check for zero only anyway, one could use is_const_0 directly for maximum simplicity and portability:

#define memmove_smart(dst,src,n) if (is_const_0(n)) {} else memmove(dst,src,n)

Note: the code here uses a simpler version of is_const than the one in the linked question. This is because Visual Studio is more standard-conformant than GCC in this case. If targeting GCC, you could use the following is_const variant (adapted to handle all possible integral values, including negative ones and INT_MAX):

#include <climits>  // INT_MAX

#define is_const_0(X) (sizeof(chkconst::chk2(X))<sizeof(chkconst::Big))
#define is_const_pos(X) is_const_0( int(X)^(int(X)&INT_MAX) )
#define is_const(X) (is_const_pos(X)|is_const_pos(-int(X))|is_const_pos(-(int(X)+1)))
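For comparison, GCC and Clang offer a builtin, `__builtin_constant_p`, that reports whether the compiler can treat an expression as a constant. The sketch below uses that builtin instead of the overload trick above, so it is a different technique, not the one from this answer. The names `memmove_if_nonzero` and `drop_first` are hypothetical, and compilers without the builtin simply fall back to a plain memmove.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical macro name. __builtin_constant_p is a GCC/Clang builtin;
// other compilers fall back to an unconditional memmove.
#if defined(__GNUC__)
#  define memmove_if_nonzero(dst, src, n)                          \
     do {                                                          \
         if (__builtin_constant_p(n) && (n) == 0) { /* skip */ }   \
         else std::memmove((dst), (src), (n));                     \
     } while (false)
#else
#  define memmove_if_nonzero(dst, src, n) std::memmove((dst), (src), (n))
#endif

// Hypothetical helper: shifts the tail of buf left by one element.
inline void drop_first(int* buf, std::size_t len) {
    memmove_if_nonzero(buf, buf + 1, (len - 1) * sizeof(int));
}
```

When the count is not a known constant, the builtin folds to false and only the unconditional memmove remains, so no runtime test is added; when the count is a known zero, the call disappears.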
Community
Suma
  • In all operations that take external iterators, the implementation cannot possibly know that the iterators handed in are not from the same container, and thus you cannot use `memcpy`, and the same goes with `erase` (if you erase an element in the middle and there are more than two elements beyond that, the ranges are guaranteed to overlap). You could use `memcpy` on the other hand when growing the buffer, as that guarantees that the source and destinations are not overlapped. – David Rodríguez - dribeas Oct 05 '11 at 08:30
  • 1
    No, this is ordinary function resolution, there is even no template. x(void *a) is used when value is a zero constant, x(Temp a) otherwise (Temp can be constructed from an int, but it is not a preferred overload for zero). I find it supercool as well. The original source of the idea seems to be http://encode.ru/threads/396-C-compile-time-constant-detection – Suma Oct 06 '11 at 09:51
  • @sharptooth That is not SFINAE but could potentially be used in SFINAE. About the answer, I think it is cool, but I don't really see how this helps with the problem. The premise (as I understood it) is that the compiler did not remove the `if (count > 0)` in a case where it was *known* at compile time to be 0, how does changing `compile_time_constant_0 > 0` to `sizeof(X) > sizeof(Y)` affect how the compiler generates the code? (Assuming that 0 is known at compile time, both should be equally easy to optimize) Next question would be how did sharptooth come to that conclusion... – David Rodríguez - dribeas Oct 06 '11 at 10:35
  • 1
    @DavidRodríguez-dribeas No, the issue was different - when condition was used, if (count > 0) and memmove was elided correctly for known zero, but the if was still left there when the value was not compile time known, presenting unnecessary overhead. With this solution there is no overhead, yet zero moves are eliminated. – Suma Oct 06 '11 at 10:53
  • Why is the gcc version more complex? – Adrian May 18 '16 at 14:16

I think that you misunderstood the meaning of __assume. It does not tell the compiler to change its behavior when it knows what the values are; rather, it tells the compiler what the values will be when it cannot infer them by itself.

In your case, if you told it to __assume that count > 0, it would skip the test: since you have already told it that the result will always be true, it will remove the condition and always call memmove, which is exactly what you want to avoid.
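To make that concrete, here is a sketch with a hypothetical `ASSUME` macro. It maps to `__assume` on MSVC; the GCC/Clang branch is an assumption that approximates the same promise with `__builtin_unreachable()`. The function name `callMemmoveAssumed` is also made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical macro: a promise to the optimizer that cond holds.
#if defined(_MSC_VER)
#  define ASSUME(cond) __assume(cond)
#elif defined(__GNUC__)
#  define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (false)
#else
#  define ASSUME(cond) ((void)0)
#endif

inline void callMemmoveAssumed(void* dest, const void* source, std::size_t count) {
    ASSUME(count > 0);   // promise: count is never zero
    if (count > 0) {     // the optimizer may now treat this as always true
        std::memmove(dest, source, count);
    }
}
```

With the promise in place the optimizer is free to drop the test and keep only the call; passing zero would then break the promise and be undefined behaviour.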

I don't know the intrinsics of VS, but in GCC there is a likely/unlikely intrinsic (__builtin_expect((x),1)) that can be used to hint to the compiler which outcome of the test is the most probable. That will not remove the test, but it will lay out the code so that the branch you declared most probable is more efficient (it will not branch).
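A small sketch of that hint applied to the wrapper from the question. `LIKELY`, `UNLIKELY` and `callMemmoveLikely` are hypothetical names; `__builtin_expect` is the GCC/Clang builtin, and on other compilers the macros are assumed to degrade to the bare condition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical macros wrapping the GCC/Clang builtin.
#if defined(__GNUC__)
#  define LIKELY(x)   __builtin_expect(!!(x), 1)
#  define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define LIKELY(x)   (x)
#  define UNLIKELY(x) (x)
#endif

inline void callMemmoveLikely(void* dest, const void* source, std::size_t count) {
    if (LIKELY(count > 0)) {   // the test stays; only the code layout is hinted
        std::memmove(dest, source, count);
    }
}
```

The runtime check is still there, so this only makes the common path cheaper; it does not eliminate a zero-length call at compile time.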

David Rodríguez - dribeas
  • 2
    VS doesn't have anything like likely/unlikely and that's rather sad. – sharptooth Oct 05 '11 at 09:51
  • I seem to recall that by default it assumes that the first branch (the if) is taken, which means that if that is the least expected branch you might be able to affect the generated code by reverting the condition and the if/else clauses – David Rodríguez - dribeas Oct 05 '11 at 10:04
  • @ David Rodríguez - dribeas: That trick doesn't work when you have an `if` without an `else`. – sharptooth Oct 05 '11 at 12:09
  • `if (not condition) {} else { body }`? Or you mean that the compiler will generate the same code as in `if (condition) { body }`? From the point of view of code location they will generate the same thing, the code will be right where the if is, and there will be a jump to the end, but I am not sure whether the actual test/jump will be the same and/or whether the cpu will process it differently – David Rodríguez - dribeas Oct 05 '11 at 12:30
  • 1
    @ David Rodríguez - dribeas: The compiler will indeed generate the same code for an `if` without `else` and for an `if-else` with empty `if` branch. – sharptooth Oct 05 '11 at 12:31

If it's possible to rename memmove, I think something like this would do - http://codepad.org/s974Fp9k

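#include <cstdio>   // printf
#include <cstring>  // memmove

// Wraps an int so that a run-time count selects the Temp overload below.
struct Temp {
  int x;
  Temp( int y ) { x=y; }
  operator int() { return x; }
};

// Meant to be chosen when count is a compile-time zero: nothing to move.
void memmove1( void* dest, const void* source, void* count ) {
  printf( "void\n" );
}

// Meant to be chosen for every other count: forwards to the real memmove.
void memmove1( void* dest, const void* source, Temp count ) {
  memmove( dest, source, count );
  printf( "temp\n" );
}

int main( void ) {
  int a = 0, b = 1;
  memmove1( &a,&b, sizeof(a) );    // non-zero constant: Temp overload
  memmove1( &a,&b, sizeof(a)-4 );  // zero if sizeof(int) is 4: void* overload intended
}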

I think the same is probably possible without the class - have to look at conversion rules to confirm it.

Also, it should be possible to overload the original memmove(), e.g. by passing an object (like Temp(sizeof(a))) as the 3rd argument.

Not sure which way would be more convenient.

Shelwien