
I have to process a large number of small integers, up to 17 million of them (values are always between 0 and 255), and store them in some kind of array. I'm currently using a plain int array and the performance is not great (as expected). On every execution the program accesses all the values of that array about 260 times, so what matters most is reducing execution time by reducing the time it takes to access all values of the array (without using threads).

  • `int[]` should be the fastest you can get since there is 0 conversion to/from `int`. Are you sure it's the datastructure that's the bottleneck? – zapl May 10 '16 at 21:21
  • `access all the values of that array by about 260 times` change the algorithm to reduce this number; also, for values [0-255] you can use byte, but be careful with the sign – Iłya Bursov May 10 '16 at 21:24
  • Perhaps there's a way to improve your program's locality of reference with respect to this array. http://stackoverflow.com/questions/7638932/what-is-locality-of-reference – David K May 10 '16 at 21:31
  • @zapl Well, I don't have a theoretical target time, so I was assuming that was the problem, but if int[] is the best, my code is as short as possible except for some summations; I'll try to improve that. Thanks for the answer. –  May 10 '16 at 21:33
  • @Lashane That's something I'm trying to do, but since I learned the problem was not the int[], I'll dig deeper into that. –  May 10 '16 at 21:37
  • @DavidK Thanks for the share, I'll check that. –  May 10 '16 at 21:40
  • What I've achieved so far is reducing the number of accesses to the lowest the code permits, keeping only the minimum necessary to run the program; that improved the time by about 20%. –  May 10 '16 at 22:36
  • @zapl You may be wrong, see [my answer](http://stackoverflow.com/a/37218987/581205). The conversion is way cheaper than a cache miss and using less memory might help to eliminate cache misses. – maaartinus May 13 '16 at 20:40
  • @maaartinus yes, maybe, or it could introduce subtle bugs, or maybe make it slower http://stackoverflow.com/a/14532302/995891 (the revisit bit, the tests look unfortunately quite bad, e.g. no warmup as far as I can see..). – zapl May 14 '16 at 13:05

2 Answers

1

You could use the short data type, but that would probably not impact performance much. With 17 million values accessed 260 times each, you're doing roughly 4.4 billion array accesses, and that's going to take time.

You haven't said what "not the greatest" performance means or what you think it should be, but I believe you are constrained by the size of the problem.

Since this question feels like an XY problem I suggest you ask a new question and explain in much more detail the real nature of your goals. You may be missing optimizations that we can only guess at based on this question.

Jim Garrison
  • 85,615
  • 20
  • 155
  • 190
  • Thanks for the answer, but I won't, since it's an assignment and it's important that I try my best and search for an answer without much deeper help. Thanks for the short data type tip, though. –  May 10 '16 at 21:39
  • Can you edit your post to indicate what performance you are seeing? If you really _need_ 4.4 billion array accesses you can gain some improvement if you reduce CPU cache misses by arranging things so you don't jump around in the array. I.e. if you have to access an element hundreds of times, if possible do all the accesses as close together as possible time-wise without accessing elements "far away" in the array at the same time. – Jim Garrison May 10 '16 at 21:43
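Jim Garrison's locality suggestion above can be sketched as loop blocking. This only works when the passes are independent of each other; the method name and parameters are illustrative, not from the question:

```java
// Instead of 260 full passes over a 17M-element array, process one
// cache-sized block at a time and run all passes on it while it is hot.
static long blockedPasses(int[] a, int passes, int blockSize) {
    long total = 0;
    for (int start = 0; start < a.length; start += blockSize) {
        int end = Math.min(start + blockSize, a.length);
        for (int p = 0; p < passes; p++) {     // all passes on this block
            for (int i = start; i < end; i++) {
                total += a[i];                 // stand-in for the real per-element work
            }
        }
    }
    return total;
}
```

The result is identical to doing `passes` full sweeps; only the access order changes, so each block is read from main memory once instead of `passes` times.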
1

As your values fit into a byte (eight bits), you can use a byte[]. The conversion back to int then looks like

int x = a[i] & 255;

so you convert values in the signed range -128..127 back into the unsigned range 0..255. Don't be scared by the additional operation; it's no slower than

int x = a[i];

as both result in a memory load and a widening instruction (with either zero or sign extension).
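As a concrete sketch of the storage and read-back (the array name is illustrative):

```java
byte[] a = new byte[17_000_000];   // 17 MB instead of 68 MB as int[]
a[0] = (byte) 200;                 // stored internally as the signed value -56
int x = a[0] & 255;                // masks back to the unsigned value 200
int y = Byte.toUnsignedInt(a[0]);  // equivalent, available since Java 8
```

`Byte.toUnsignedInt` performs the same mask, so use whichever reads better.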

By using a byte[], you may gain speed, assuming that it's the memory access that slows you down. Four times as much data fit in the cache, which may eliminate cache misses. You may gain a big factor or nothing at all, depending on the exact access pattern:

  • for sequential access, you'll probably gain nothing, as the memory should be fast enough, unless your computation is totally trivial
  • for purely random access, you'll probably gain nothing, as your L3 cache is probably smaller than 17 MB
  • for an access pattern where nearby data get processed together, you may gain a lot

Given that you gave no details, that's all I can say.

maaartinus
  • 44,714
  • 32
  • 161
  • 320