I'm trying to compare one 64-bit value against an array of 64-bit values, say

R_UINT64 FP; R_UINT64 INPUT[20000];

The search should return true if any element in the array matches the value of FP.

I have to loop through this array to find a match, and I'm trying to improve efficiency by looking at two elements at a time instead of one.
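For reference, the plain scalar loop described above might look like the following minimal sketch. The name contains_u64 is hypothetical, and uint64_t stands in for R_UINT64 on the assumption that it is an unsigned 64-bit integer type.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if any of the first len elements of input equals fp, else 0. */
int contains_u64(uint64_t fp, const uint64_t *input, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (input[i] == fp) {   /* one 64-bit compare per element */
            return 1;
        }
    }
    return 0;
}
```

This is the baseline that any two-at-a-time version would need to beat.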

In AltiVec, the vector length is 128 bits, so I will put two copies of FP in one vector and two array elements in another. (I'm splitting both into 8-bit vector elements.)

So far so good, but now I'm encountering a problem. I couldn't find a VMX operation that looks at only half of the vector to see if there's a match. With the comparisons I found, both values have to match for the result to be true, which is not what I'm looking for.

So I'm wondering if there is any way to tell the compiler that I'm only looking at half of the vector each time?

Thanks in advance!

Paul R
Tal_

1 Answer

Probably the best thing is to compare the two elements and then use vec_mergeh/vec_mergel to test each half of the result, e.g.

#include <altivec.h>
#include <stddef.h>
#include <stdint.h>

// assumes big-endian word order and a 16-byte aligned array
// (vec_ld ignores the low 4 bits of the address)
size_t vec_search_u64(const uint64_t key, const uint64_t array[], const size_t len)
{
    const vector signed int vkey = { (int)(key >> 32), (int)(key & 0xffffffff),
                                     (int)(key >> 32), (int)(key & 0xffffffff) };
    const vector bool int vk1 = { -1, -1, -1, -1 };
    size_t i;

    for (i = 0; i + 1 < len; i += 2)      // iterate two elements at a time
    {                                     // (i + 1 < len avoids underflow when len == 0)
        vector signed int v = vec_ld(0, (const int *)&array[i]);
                                          // load 2 elements
        vector bool int vcmp = vec_cmpeq(v, vkey);
                                          // compare 2 elements with key
        if (vec_all_eq(vec_mergeh(vcmp, vcmp), vk1))
        {                                 // if high element matches
            return i;                     // return match found at element i
        }
        if (vec_all_eq(vec_mergel(vcmp, vcmp), vk1))
        {                                 // if low element matches
            return i + 1;                 // return match found at element i + 1
        }
    }
    if (i < len)                          // if array size is odd
    {
        if (array[i] == key)              // test last element
        {
            return i;
        }
    }
    return (size_t)(-1);                  // match not found - return suitable value
}

(Note: untested code - for general guidance only - may need casts and/or actual bug fixes !)
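To see why merging the comparison result with itself isolates each 64-bit half, here is a portable scalar model of the three operations. It is only a sketch for reasoning about the logic, not AltiVec code: the lanes4 type and the model_* and match_pair names are hypothetical, and the lane order assumes big-endian layout (high 32-bit word first).

```c
#include <stdint.h>

/* Scalar stand-in for one 128-bit vector of four 32-bit lanes. */
typedef struct { uint32_t lane[4]; } lanes4;

/* Models vec_cmpeq: a lane becomes all ones where equal, zero otherwise. */
static lanes4 model_cmpeq(lanes4 a, lanes4 b)
{
    lanes4 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = (a.lane[i] == b.lane[i]) ? 0xffffffffu : 0u;
    return r;
}

/* Models vec_mergeh(v, v): lanes 0 and 1, each duplicated. */
static lanes4 model_mergeh(lanes4 v)
{
    lanes4 r = { { v.lane[0], v.lane[0], v.lane[1], v.lane[1] } };
    return r;
}

/* Models vec_mergel(v, v): lanes 2 and 3, each duplicated. */
static lanes4 model_mergel(lanes4 v)
{
    lanes4 r = { { v.lane[2], v.lane[2], v.lane[3], v.lane[3] } };
    return r;
}

/* Models vec_all_eq against an all-ones vector. */
static int model_all_ones(lanes4 v)
{
    for (int i = 0; i < 4; i++)
        if (v.lane[i] != 0xffffffffu)
            return 0;
    return 1;
}

/* Packs two 64-bit values into four lanes, high word first. */
static lanes4 pack2(uint64_t x, uint64_t y)
{
    lanes4 r = { { (uint32_t)(x >> 32), (uint32_t)x,
                   (uint32_t)(y >> 32), (uint32_t)y } };
    return r;
}

/* Returns 0 if e0 matches key, 1 if e1 matches key, -1 if neither does. */
int match_pair(uint64_t key, uint64_t e0, uint64_t e1)
{
    lanes4 cmp = model_cmpeq(pack2(e0, e1), pack2(key, key));
    if (model_all_ones(model_mergeh(cmp))) return 0;   /* both words of e0 matched */
    if (model_all_ones(model_mergel(cmp))) return 1;   /* both words of e1 matched */
    return -1;
}
```

The useful case is a partial match: if only the high word of one element and the low word of the other equal the key, two of the four lanes compare equal, yet neither merged result is all ones, so no match is reported.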

Paul R
  • Note that if you're running 64 bit POWER or PowerPC this may not be much faster than just doing a straightforward scalar compare with a 64 bit register, but it's worth a try if you need a modest performance improvement. – Paul R Sep 16 '13 at 22:30
  • Hi Paul, the original problem does not allow me to implement this in a massively parallel way. I'm wondering, is there any way to load values into the vector one at a time? Say, with 32-bit values, can I load one 32-bit int at a time instead of loading all 4 at once? Thanks! – Tal_ Sep 20 '13 at 21:14
  • Sorry - I don't really understand what you mean - what would be the point of loading just one value into a vector? Why wouldn't you just do a normal scalar comparison if you just want to compare one element at a time? – Paul R Sep 20 '13 at 21:18
  • No, the reason I want to load one at a time is that the calculation of values is sequential. The situation is like an accumulator (say sum): sum += (a value), and the calculation stops when sum equals a reference value (say ref). I'm thinking of calculating 4 sums into a vector in one iteration of a for loop and comparing them with a vector of 4 copies of ref. I'm not sure if that's going to give me any benefit though, since I have to check which one matches afterwards... but I guess it doesn't hurt to try – Tal_ Sep 20 '13 at 21:34
  • I suggest just implementing this in simple scalar code for now and only consider using SIMD if and when you have profiled the performance and are certain that you need to optimise this. – Paul R Sep 20 '13 at 22:19
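As a starting point for the simple scalar version suggested in the last comment, the accumulator loop from the comments might be sketched like this. The name find_sum_match and the choice of uint64_t are assumptions, since the comments only describe sum and ref loosely.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the index at which the running sum first equals ref,
   or (size_t)-1 if it never does. */
size_t find_sum_match(const uint64_t *values, size_t len, uint64_t ref)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += values[i];       /* each step depends on the previous sum */
        if (sum == ref) {
            return i;           /* stop as soon as the reference is hit */
        }
    }
    return (size_t)-1;
}
```

Because every sum depends on the one before it, this version is the natural one to profile first before deciding whether a SIMD rewrite is worth the effort.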