
I am a hobby programmer working on a poker abstraction. I have cards encoded in 8 bits: the least significant half (4 bits) encodes the rank. The next two bits encode a suit-id (not the suit per se, since that would needlessly add complexity; it starts at 00 for the first card and is only incremented when a new card's suit doesn't match a suit already in play). The two most significant bits encode the card's location: 00 for hand, 01 for flop, and so on.
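
For context, here is a minimal sketch of how such a layout could be packed into a byte; the Pack helper and the constant names are illustrative, not my actual code:

const (
    suitShift = 4 // suit-id occupies bits 4-5
    locShift  = 6 // location occupies bits 6-7
)

// Pack combines a rank (0-15), a suit-id (0-3) and a location (0-3) into one byte.
func Pack(rank, suitID, location byte) byte {
    return rank&0x0F | suitID<<suitShift | location<<locShift
}

For example, Pack(7, 2, 1) yields 01100111 in binary: location 01 (flop), suit-id 10, rank 0111.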

To extract the data, I am unsure what is better: bit-masking using predefined masks, or bit-shifting operations? (Or something completely different?) Below are two versions, written in Go, of a function I wrote and rewrote to extract the rank of a card. Is one superior to the other?

Example 1:

func GetRank(card byte) byte { // extracts the rank from the card
    var rankmask byte
    rankmask = 240 // represents 11110000 in binary
    return card &^ rankmask // knocks out the location and suit-id bits, leaving the rank
}

vs. Example 2:

func GetRank(card byte) byte { // extracts the rank from the card
    card <<= 4       // shift away the location and suit-id bits
    return card >> 4 // shift the rank back into place
}

I have read that bit-shifting is generally supposed to be faster, but here I would be doing two shift operations compared to a single masking operation.
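
As a side note, an equivalent sketch of the mask version with a named hexadecimal constant (the name rankMask is illustrative): clearing the top four bits with `&^ 0xF0` keeps the same low four bits as masking with `& 0x0F`.

// rankMask keeps only the low four rank bits; 0x0F is 00001111 in binary.
const rankMask byte = 0x0F

func GetRank(card byte) byte {
    return card & rankMask // a single AND; same result as card &^ 0xF0
}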

  • Integer literals in Go can be hexadecimal and even binary; that will be more readable than some decimal constant like 240. – harold Jul 26 '20 at 10:57
  • Well, you have 1) a single binary AND versus 2) two shifts. The first one definitely seems faster. It would probably depend on which assembly operations Go decides to use and the machine code you end up with. Enable optimizations, compile, and then look at the disassembly of the function. – Marco Bonelli Jul 26 '20 at 11:12
  • @harold Many thanks for the info, I will invest some time to learn hexadecimal notation. – Some guy Jul 26 '20 at 11:15
  • @MarcoBonelli So the translation to machine code is what decides what is faster? I suppose I will then try to figure that stuff out. – Some guy Jul 26 '20 at 11:19
  • @Someguy yeah I would say so, yes. It's such a simple operation that you would need to check the generated machine code. I would assume the Go compiler is smart enough to produce a single AND for 1 and just two shifts for 2 (it could even optimize 2 into a single AND). On Intel Skylake, for example, AND is 1 uop and SHL/SHR are 1 uop, so that would put case 1 at 1 total uop and case 2 at 2 total uops. – Marco Bonelli Jul 26 '20 at 11:23
  • @MarcoBonelli So if I understand correctly, each bit-wise operation tends to have a cost of 1 uop, and the fewer uops the better? Case 1 is actually an AND NOT operation though. Would this not put the cost at 2 uops also? – Some guy Jul 26 '20 at 11:34
  • @Someguy if we are specifically talking about x86-64, yes, 1 uop for shift with immediate and 1 uop for and with immediate. The AND NOT is optimized at compile time for sure. `&^ 240` is `& 15`, so that's not a problem really. – Marco Bonelli Jul 26 '20 at 11:36
  • @MarcoBonelli thanks for the little introduction to bit-wise operators. I think I learned a bit more! ;) – Some guy Jul 26 '20 at 11:39
  • Bonus: [here](https://godbolt.org/z/WeorM6) you can see that at least GCCGO 10.2 with `-O3` optimizes *both functions* down to just `and eax, 15`. – Marco Bonelli Jul 26 '20 at 11:43
  • Either of the two operations could be faster; it even depends on the machine this code runs on. You can try [Benchmark](https://golang.org/pkg/testing/#hdr-Benchmarks) tests in Go and test the performance of the worst, best and average case scenarios for the two cases, though. – Vaibhav Mishra Jul 26 '20 at 11:46
  • @VaibhavMishra would this still be the case if the machine code optimisation boils down to the same instructions, as Marco Bonelli implied? – Some guy Jul 26 '20 at 11:50
  • If the assembly generated by the compiler were the same for both functions, then of course you needn't concern yourself with benchmarking them. But the point is, it's not necessarily true that both of these operations will give you the same assembly. Different versions of the compiler _can_ give you different outputs for different machines, and running a benchmark is a more convenient way to test this than checking the assembly output, imo. It's just a test: you run it, get your answer, and be done with it. – Vaibhav Mishra Jul 26 '20 at 12:22
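
Following up on the benchmark suggestion in the last comment, here is a minimal sketch of how the two variants could be compared with Go's testing package, assuming they are renamed GetRankMask and GetRankShift so both can live in the same package (those names, the package name, the sample card value and the sink variable are illustrative). It goes in a _test.go file:

package cards

import "testing"

// sampleCard: location 01 (flop), suit-id 10, rank 0111.
const sampleCard byte = 0b01100111

// sink keeps the results alive so the calls are not optimized away entirely.
var sink byte

func BenchmarkGetRankMask(b *testing.B) {
    for i := 0; i < b.N; i++ {
        sink = GetRankMask(sampleCard)
    }
}

func BenchmarkGetRankShift(b *testing.B) {
    for i := 0; i < b.N; i++ {
        sink = GetRankShift(sampleCard)
    }
}

Run it with `go test -bench=.`. One caveat: a function this small will almost certainly be inlined, so the numbers mostly confirm that the two forms end up costing the same rather than revealing a meaningful difference; the compiler's generated assembly can also be dumped locally with `go tool compile -S`, in the spirit of the disassembly suggestion above.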

0 Answers