1

I'm writing performance-intensive Fortran code, which has at its core a matrix-free matrix multiplication subroutine mfmult(x,y), which takes an input vector x, and returns an output vector y such that if i = i_{n-1}i_{n-2}...i_2i_1i_0 is an n-bit binary number (possibly with leading zeros), then

y(i) = sum{x(j) : j is any 1-bit negation of i}

i.e.

y(000) = x(001)+x(010)+x(100)
y(1101) = x(0101)+x(1001)+x(1111) + x(1100) etc.

What would be an efficient way to implement this? A related question is: does there exist a fast intrinsic in Fortran for single-bit negation? I've had a look at the intrinsics and https://rosettacode.org/wiki/Bitwise_operations#Fortran, but there is no single-bit negation operation, and I worry any manual coding/branching statements will make things too slow.

Paradox
  • 1,935
  • 1
  • 18
  • 31
  • Well, the CPU always accesses the whole byte anyway. You can use IBSET(), but I don't expect that to be particularly efficient. You could also use IEOR() with some particular constant. – Vladimir F Героям слава Jan 27 '17 at 07:38
  • The problem with IBSET and IBCLR is that I first need a BTEST if-statement to check if the bit is a 0 and 1 so I can pick the opposite, which I expect runs pretty slow in a huge loop. I'm not sure how I'd use IEOR(). Flipping a single-bit seems like such a fundamental operation that these methods seem like overkill. Is this just a limitation of Fortran as a language that it doesn't have this intrinsic? – Paradox Jan 27 '17 at 09:23
  • If you want to call it a limitation... What is the equivalent in C or x86 assembly you think is missing in Fortran? – Vladimir F Героям слава Jan 27 '17 at 09:46
  • Thanks, the IEOR approach works really fast. In fact I was quite surprised to find that it worked even faster (about twice as fast) than using a sparse matrix to explicitly carry out the vector transformation. After doing more research I now see other languages just use this approach for bit negation as well. – Paradox Jan 28 '17 at 13:05

2 Answers2

5

I suggest you to use the IEOR() intrinsic. You must prepare the right mask first. It may seem to be indirect, but I don't think there can be anything more direct in fact, see C / Assembly: how to change a single bit in a CPU register?

So if you have integer i, and you want to flip the third bit, I would really do:

i = ieor(i, int(b'00000100'))

What this will do in the x86_64 assembly is

xorl    $4, the_register_holding_i

If you want to flip multiple bits in that integer it is better to prepare the mask accordingly and call IEOR() just once.

Note that the CPU always works with the whole byte (at least) or more efficiently with a whole word, there are no instructions which would only take an address of a bit and manipulate only that one. You don't have to worry that your are accessing the whole 32-bit integer, because you have a 32-bit or 64-bit processor. The instruction set is prepared to work with 32-bit and 64-bit integers efficiently.

Community
  • 1
  • 1
2

This should be efficient:

program test
  implicit none
  integer :: filter(0:31)
  data filter / &
    b'00000000000000000000000000000001', &
    b'00000000000000000000000000000010', &
    b'00000000000000000000000000000100', &
    b'00000000000000000000000000001000', &
    b'00000000000000000000000000010000', &
    b'00000000000000000000000000100000', &
    b'00000000000000000000000001000000', &
    b'00000000000000000000000010000000', &
    b'00000000000000000000000100000000', &
    b'00000000000000000000001000000000', &
    b'00000000000000000000010000000000', &
    b'00000000000000000000100000000000', &
    b'00000000000000000001000000000000', &
    b'00000000000000000010000000000000', &
    b'00000000000000000100000000000000', &
    b'00000000000000001000000000000000', &
    b'00000000000000010000000000000000', &
    b'00000000000000100000000000000000', &
    b'00000000000001000000000000000000', &
    b'00000000000010000000000000000000', &
    b'00000000000100000000000000000000', &
    b'00000000001000000000000000000000', &
    b'00000000010000000000000000000000', &
    b'00000000100000000000000000000000', &
    b'00000001000000000000000000000000', &
    b'00000010000000000000000000000000', &
    b'00000100000000000000000000000000', &
    b'00001000000000000000000000000000', &
    b'00010000000000000000000000000000', &
    b'00100000000000000000000000000000', &
    b'01000000000000000000000000000000', &
    b'10000000000000000000000000000000'/
  integer :: a
  integer :: k

  ! Number whose single-bit negations should be found
  a = b'1101'

  do k=0,31
    print *, ieor(a,filter(k))
  enddo

end
Anthony Scemama
  • 1,563
  • 12
  • 19
  • Just a style comment - data is a little old fashioned, I'd initialise filter in the declaration, and make it a parameter as well – Ian Bush Jan 28 '17 at 21:03