
I have a binary number, for example 11110000, and a mask with exactly 4 bits set, for example 10101010. I'm looking for a fast operation that returns the 4 bits of the input at the positions where the mask is set:

input:  11110000
mask:   10101010
output: 1100

Here I go from u8 to u4, but imagine the same thing for u64 to u32. Is there a fast way to do this (in Rust or C) without looping?

I tried looping over the bit indexes, and a linear decomposition. Both are very slow: one requires a loop, the other a matrix decomposition.
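
For reference, the looping baseline might look like the sketch below. The function name `extract_bits_loop` is hypothetical and the code is an assumption about what "looping the indexes" means, not the asker's actual code. It walks the set bits of the mask from lowest to highest and packs the matching input bits into the low end of the result.

```rust
/// Naive baseline (hypothetical helper): gather the bits of `x` selected by
/// `mask` into the low bits of the result, one selected bit per iteration.
pub fn extract_bits_loop(x: u64, mask: u64) -> u64 {
    let mut out = 0u64;
    let mut out_pos = 0u32; // next free position in the output
    let mut m = mask;
    while m != 0 {
        let bit = m.trailing_zeros(); // index of the lowest remaining mask bit
        out |= ((x >> bit) & 1) << out_pos;
        out_pos += 1;
        m &= m - 1; // clear that mask bit
    }
    out
}
```

With the example from the question, `extract_bits_loop(0b1111_0000, 0b1010_1010)` returns `0b1100`.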

cafce25
  • Don't spam tags. If you want Rust or C, then remove C++ and Javascript tags. – kiner_shah Mar 10 '23 at 11:00
  • I do not know what is *"mask with exactly 4 bits set"* and how is related to the expected output. I do not see any relationship between them – 0___________ Mar 10 '23 at 11:11
  • Why do you need that result? It's not clear what the purpose is (this sort of question usually ends with a popcount but that doesn't seem to be the case here) – Masklinn Mar 10 '23 at 11:12
  • @0___________ they have a bitmask with n bits set, they want to know which of the set bits of the mask matched, as an n-bits result (n=4). So basically the bitmask "compacted" to drop all the bits set to 0 in the mask. – Masklinn Mar 10 '23 at 11:13
  • If there are always exactly 4 bits set you can iterate through an array of 4 bit positions or masks. – Weather Vane Mar 10 '23 at 11:13
  • When your mask is known, can't you split your mask into 4 masks, each with one bit and then shift the result of each mask to the bit where you want to have it stored in your output? [Here](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=3ccda103ac2c5a6f061e81d4bb7db464) is what I mean with your concrete example. I'd think this'd be a pretty fast operation – Jonas Fassbender Mar 10 '23 at 11:15
  • @0___________ "that would return the 4 bits of the input corresponding to the positions where the mask is set" not really sure how to make that more clear, everyone else seemed to understand the relationship between the expected output and the mask – António Leitão Mar 10 '23 at 11:16
  • FWIW this request looks to be the PEXT instruction from BMI2 (however that is extremely slow on AMD so it should be considered intel-only, not sure whether it even exists in ARM or RISCV). https://stackoverflow.com/q/66091979/8182118 seems to essentially be this question in C. – Masklinn Mar 10 '23 at 11:17
  • @JonasFassbender yes, that's a good idea (it would be the equivalent of a linear decomposition where the basis is the 4 different vectors). It works well but scales poorly for u32 and above. I wondered if there was a faster way – António Leitão Mar 10 '23 at 11:18
  • @Masklinn I wanted this to better control the conversion from integers with many bits to integers with fewer bits – António Leitão Mar 10 '23 at 11:20
  • @JonasFassbender FWIW Godbolt supports Rust and is better / more convenient than the playground for this sort of demos, [here is your snippet there](https://godbolt.org/z/1qe95nnnq). – Masklinn Mar 10 '23 at 11:21
  • @Haris it's a general bit manipulation question, the time complexity is of the same order for both languages. Besides, I'm not really sure how to go about it here, as I think opening the same question for two different languages could be considered spam? – António Leitão Mar 10 '23 at 11:22
  • There are C intrinsics _pext_u32 and _pext_u64 as part of the BMI2 instruction set extension, which should be able to do this pretty fast if you don't mind using non-standard x86_64. – Simon Goater Mar 10 '23 at 11:48
  • Welcome to SO. You should add the `language-agnostic` tag to such a question. And no, it wouldn't be considered spam, I doubt a question about `nullptr`, tagged C23, would be closed as a duplicate for a C++ question about `nullptr`. – Harith Mar 10 '23 at 11:54
  • @SimonGoater thats a good start and thanks for the suggestion – António Leitão Mar 10 '23 at 11:58
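
The PEXT suggestion from the comments can be sketched in Rust roughly as follows. This assumes an x86_64 target for the fast path and uses `core::arch::x86_64::_pext_u64` behind a runtime BMI2 check; the wrapper names `extract_bits`, `pext_bmi2` and `extract_bits_portable` are hypothetical, and the portable fallback is just a simple loop, not a tuned implementation.

```rust
/// Fast path: the BMI2 PEXT instruction (only compiled on x86_64).
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "bmi2")]
unsafe fn pext_bmi2(x: u64, mask: u64) -> u64 {
    core::arch::x86_64::_pext_u64(x, mask)
}

/// Portable fallback (hypothetical helper): one iteration per set mask bit.
fn extract_bits_portable(x: u64, mask: u64) -> u64 {
    let (mut out, mut out_pos, mut m) = (0u64, 0u32, mask);
    while m != 0 {
        out |= ((x >> m.trailing_zeros()) & 1) << out_pos;
        out_pos += 1;
        m &= m - 1; // clear the lowest set mask bit
    }
    out
}

/// Gather the bits of `x` selected by `mask` into the low bits of the result.
pub fn extract_bits(x: u64, mask: u64) -> u64 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("bmi2") {
            // Safe to call: BMI2 support was just checked at runtime.
            return unsafe { pext_bmi2(x, mask) };
        }
    }
    extract_bits_portable(x, mask)
}
```

Whether the intrinsic actually beats a software routine depends on the CPU, as noted in the comments about AMD, so it is worth benchmarking on the target hardware.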

1 Answer


If you can preprocess your selection mask, then you can do this in log2(width) mask-and-shift operations, i.e. 6 steps for a 64-bit word.

If your selection mask has any odd-sized gaps in it, then you can first use a mask to pick the bits that need to move one position to the right and shift only those, with the result that there will only be even-sized gaps between the bits you want.

Starting with a selection mask of 10101010, for example, the bits to shift to the right 1 place are 00100010. After shifting those bits, the new selection mask to apply is 10011001, which has only even-sized gaps.

Then there is a mask for shifting 2 positions to the right that will make all your gap sizes divisible by 4: 10011001 -> 10000111

If you continue this way until the gap size is divisible by the word size, then there won't be any gaps: 10000111 -> 00001111
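
For the 8-bit example, the three stages could be hard-coded as in the sketch below. The function name `compress_10101010` is hypothetical. Only the first move mask (00100010) is given in the text above; the other two (00011000 and 10000000) are derived here from the intermediate selection masks, so treat them as an assumption to check rather than part of the original answer.

```rust
/// Gather the bits of `x` at positions 7, 5, 3, 1 into the low 4 bits,
/// using three fixed mask-and-shift stages (shift by 1, then 2, then 4).
pub fn compress_10101010(x: u8) -> u8 {
    let mut x = x & 0b1010_1010; // keep only the selected bits

    // Stage 1: move bits 5 and 1 right by 1 (selection becomes 10011001).
    let t = x & 0b0010_0010;
    x = (x ^ t) | (t >> 1);

    // Stage 2: move bits 4 and 3 right by 2 (selection becomes 10000111).
    let t = x & 0b0001_1000;
    x = (x ^ t) | (t >> 2);

    // Stage 3: move bit 7 right by 4 (selection becomes 00001111).
    let t = x & 0b1000_0000;
    x = (x ^ t) | (t >> 4);

    x
}
```

For the input from the question, `compress_10101010(0b1111_0000)` yields `0b1100`.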

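To calculate these masks, first calculate the total shift required for each selected bit position (its position minus the number of selected bits below it). The first mask moves every bit whose total shift is odd, which clears the 1 bit of its remaining shift. The second mask moves every bit whose remaining shift has the 2 bit set, and so on.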
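
One way to turn this into code is sketched below for 64-bit words. The `Compressor` type and its method names are hypothetical, and the preprocessing uses a plain loop over bit positions, which is assumed to be acceptable because it runs once per mask. Applying the precomputed masks then takes a fixed 6 mask-and-shift stages per input.

```rust
/// Precomputed move masks for one selection mask (hypothetical helper type).
pub struct Compressor {
    mask: u64,
    /// moves[k] holds the bits that must shift right by 1 << k at stage k.
    moves: [u64; 6],
}

impl Compressor {
    /// Preprocess `mask`: done once, so a simple loop over positions is used.
    pub fn new(mask: u64) -> Self {
        let mut moves = [0u64; 6];
        let mut rank = 0u32; // number of selected bits below `pos`
        for pos in 0..64u32 {
            if (mask >> pos) & 1 == 0 {
                continue;
            }
            let total = pos - rank; // total right shift this bit needs
            for k in 0..6 {
                if (total >> k) & 1 == 1 {
                    // Part of the shift already applied by stages below k.
                    let done = total & ((1u32 << k) - 1);
                    moves[k] |= 1u64 << (pos - done);
                }
            }
            rank += 1;
        }
        Compressor { mask, moves }
    }

    /// Gather the selected bits of `x` into the low bits of the result.
    pub fn apply(&self, x: u64) -> u64 {
        let mut x = x & self.mask;
        for (k, &mv) in self.moves.iter().enumerate() {
            let t = x & mv; // bits that move at this stage
            x = (x ^ t) | (t >> (1u32 << k));
        }
        x
    }
}
```

For the example mask, `Compressor::new(0b1010_1010).apply(0b1111_0000)` gives `0b1100`, and the first precomputed move mask is `0b0010_0010`, matching the walkthrough above.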

Matt Timmermans