
I have to do keyed reductions of arrays with many different keys that repeat only once in a while:

```
keys   = {1,2,3,3,4,5,6,7,7, 8, 9, 9,10,11,...}
array  = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,...}

// after reduction
result = {1,2,7,5,6,7,17,10,23,13,14}
```

Using thrust::reduce_by_key (or any other segmented reduction method) is not the fastest option here, since most of the operations are in fact just copies from one array to another.
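
For reference, here is a trimmed-down sketch of the straightforward `reduce_by_key` version I am comparing against (vector names are purely illustrative):

```cpp
#include <thrust/device_vector.h>
#include <thrust/reduce.h>

int main()
{
    int h_keys[]   = {1, 2, 3, 3, 4, 5, 6, 7, 7, 8, 9, 9, 10, 11};
    int h_values[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14};
    const int n = sizeof(h_keys) / sizeof(int);

    thrust::device_vector<int> keys(h_keys, h_keys + n);
    thrust::device_vector<int> values(h_values, h_values + n);
    thrust::device_vector<int> out_keys(n);
    thrust::device_vector<int> out_sums(n);

    // Sum consecutive values that share the same key; one output per key run.
    thrust::reduce_by_key(keys.begin(), keys.end(), values.begin(),
                          out_keys.begin(), out_sums.begin());

    // out_sums now begins with {1, 2, 7, 5, 6, 7, 17, 10, 23, 13, 14}
    return 0;
}
```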

What would be a better approach to this problem?

bbtrb
  • Is it possible to know a priori which segments have length > 1? If the answer is no, then it seems like a hard problem, because the cost of the data movement of just detecting the segments will be comparable to the cost of `reduce_by_key`. – Jared Hoberock Feb 23 '12 at 19:06
  • @JaredHoberock: Yes, in fact I have the length of the segments stored in another array. – bbtrb Feb 23 '12 at 19:08

1 Answer


Actually, reduce_by_key is the appropriate algorithm to use here. It's just that the current implementation in Thrust is not as fast as it could be. To elaborate, there's nothing to prevent reduce_by_key from executing at memcpy speed, and I believe other implementations already achieve that rate. Our tentative plan for Thrust v1.7 is to improve the performance of reduce_by_key and other scan-based algorithms using the code from the related back40computing project.

Note that when the segments are either (1) long or (2) of uniform length, it is possible to do better than reduce_by_key. For example, at some point it becomes more economical to use an offset-based segment descriptor than keys or head flags. However, when the segments are short (as in your case) or of highly variable length, an optimal reduce_by_key implementation is really the best tool for the job.
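
For illustration only, here is a rough sketch of what such an offset-based approach could look like when the segment lengths are known (as they are in the comments above); the kernel name and the naive one-thread-per-segment mapping are placeholders rather than an optimized implementation:

```cpp
#include <thrust/device_vector.h>
#include <thrust/scan.h>

// One thread per segment: each thread walks its segment and writes one sum.
// This simple mapping is only reasonable when the segments are long or of
// fairly uniform length; otherwise the threads are badly load-balanced.
__global__ void segmented_sum(const int* values, const int* offsets,
                              const int* lengths, int* result, int num_segments)
{
    int seg = blockIdx.x * blockDim.x + threadIdx.x;
    if (seg >= num_segments) return;

    int sum = 0;
    for (int i = offsets[seg]; i < offsets[seg] + lengths[seg]; ++i)
        sum += values[i];
    result[seg] = sum;
}

int main()
{
    int h_values[]  = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14};
    int h_lengths[] = {1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 1};  // segment lengths from the question
    const int n = sizeof(h_values)  / sizeof(int);
    const int m = sizeof(h_lengths) / sizeof(int);

    thrust::device_vector<int> values(h_values, h_values + n);
    thrust::device_vector<int> lengths(h_lengths, h_lengths + m);
    thrust::device_vector<int> offsets(m);
    thrust::device_vector<int> result(m);

    // Turn the lengths into an offset-based segment descriptor.
    thrust::exclusive_scan(lengths.begin(), lengths.end(), offsets.begin());

    segmented_sum<<<(m + 255) / 256, 256>>>(
        thrust::raw_pointer_cast(values.data()),
        thrust::raw_pointer_cast(offsets.data()),
        thrust::raw_pointer_cast(lengths.data()),
        thrust::raw_pointer_cast(result.data()), m);
    cudaDeviceSynchronize();

    // result now holds {1, 2, 7, 5, 6, 7, 17, 10, 23, 13, 14}
    return 0;
}
```

With segments as short and irregular as the ones in the question, each thread reads only one or two elements, so this sketch gains nothing over a well-tuned reduce_by_key; it only pays off when the segments are long or uniform, as described above.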

wnbell