Possible improvements to Java's ArrayList/Scala's ArrayBuffer?

Question

Currently, the "growing" algorithm detects that the array backing the ArrayList/ArrayBuffer is too small for the requested operation and copies the contents to the beginning of a larger array.

jsuereth explains it very well in the comments of this thread:

ArrayBuffer is great for append but not as good for prepend. Java's ArrayList will actually try to amortize costs to prepend as well, making it slightly better in my opinion. Yes, ArrayBuffer is probably good enough if you're just appending on to a list and indexing elements.

Wouldn't it be a good enhancement to make the location of the old contents depend on the last operation, based on the assumption that this operation might be called more often in the future?

I.e.:

If append needs a larger array, copy the existing contents to the front of the new array:
```
[x|x|x|x|x|x] 
       |
       v
[x|x|x|x|x|x| | | | | ]
```
If prepend needs a larger array, copy the existing contents to the back of the new array:
```
[x|x|x|x|x|x] 
       |
       v
[ | | | | |x|x|x|x|x|x]
```
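A minimal sketch of the proposed strategy, assuming a single backing array plus a `start` offset. The class `DirectionalBuffer`, its method names, and the initial capacity of 4 are hypothetical; this is not how ArrayList or ArrayBuffer are actually implemented. A resize triggered by `append` copies the old contents to the front of the new array, and a resize triggered by `prepend` copies them to the back:

```java
// Hypothetical sketch of the proposal: the side on which free space is left
// after a resize depends on which operation triggered the resize.
public class DirectionalBuffer {
    private Object[] data = new Object[4]; // assumed initial capacity
    private int start = 0;   // index of the first element in data
    private int size = 0;    // number of stored elements
    private int resizes = 0; // how many times the backing array was replaced

    public int size() { return size; }
    public int resizes() { return resizes; }

    public Object get(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException("index " + i);
        return data[start + i];
    }

    public void append(Object x) {
        if (start + size == data.length) {
            grow(false); // no room at the back: old contents go to the FRONT of the new array
        }
        data[start + size] = x;
        size++;
    }

    public void prepend(Object x) {
        if (start == 0) {
            grow(true); // no room at the front: old contents go to the BACK of the new array
        }
        start--;
        data[start] = x;
        size++;
    }

    // Doubles the capacity and places the old contents at the front or back.
    private void grow(boolean forPrepend) {
        Object[] bigger = new Object[data.length * 2];
        int newStart = forPrepend ? bigger.length - size : 0;
        System.arraycopy(data, start, bigger, newStart, size);
        data = bigger;
        start = newStart;
        resizes++;
    }

    public static void main(String[] args) {
        DirectionalBuffer buf = new DirectionalBuffer();
        for (int i = 0; i < 1000; i++) {
            buf.prepend(i);
        }
        // The last value prepended is now first; prints "999 ... 0, resizes=8"
        System.out.println(buf.get(0) + " ... " + buf.get(999) + ", resizes=" + buf.resizes());
    }
}
```

With this rule, a prepend-only workload needs the same logarithmic number of resizes as an append-only one. Switching direction still forces a resize whenever all the free space sits on the other side, which is the alternating worst case the question mentions.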

Will this solve the performance problems for prepend, while generally making the algorithm a bit more adaptive to usage patterns? (Worst case would be alternately appending/prepending large stuff ...)

Are there any other data structures which already take the last operation into account when growing the underlying structure?

If you need such a data structure, please be my guest. Or maybe you should use a LinkedList instead. — solendil, May 31 '11 at 12:02
Maybe the "Wouldn't it be a good enhancement [...]" or the "Will this solve [...]" parts? — soc, May 31 '11 at 12:05
The first is very subjective and the second very localized. "Wouldn't it be a good enhancement?" is a discussion-question, for which SO isn't well-suited. "Will this solve [...]?" is a very-specific question to which the answer is "try it". — Joachim Sauer, May 31 '11 at 12:07
It's not about enforcing some arbitrary standards (not that WP admins ever do that!), but about choosing the appropriate venue. This is probably better suited as a bug report (ideally with a patch!) to the OpenJDK. — Joachim Sauer, May 31 '11 at 12:12

score 4 · Answer 1 · answered May 31 '11 at 12:06


Perhaps what you need is ArrayDeque. It has O(1) operations for append and prepend (unless the capacity has to change).

It is backed by an array with one index to the head and another to the tail, which allows it to write at the start or end position without having to shuffle all the entries down/up.
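The head/tail idea can be illustrated with a simplified circular buffer. `RingDeque` is a hypothetical class, not the real `java.util.ArrayDeque` source, and the initial capacity of 8 is an assumption. Both ends move with wrap-around index arithmetic, so neither `addFirst` nor `addLast` shifts existing elements; only a full array forces a copy, which in this sketch unwraps the contents so that `head` becomes 0:

```java
// Hypothetical, simplified circular-buffer deque illustrating the head/tail
// indices behind java.util.ArrayDeque (not its actual implementation).
public class RingDeque {
    private Object[] elements = new Object[8]; // assumed initial capacity
    private int head = 0; // index of the first element
    private int tail = 0; // index one past the last element (wraps around)
    private int size = 0;

    public int size() { return size; }

    public void addFirst(Object e) {
        if (size == elements.length) grow();
        head = (head - 1 + elements.length) % elements.length; // step back, wrapping to the end
        elements[head] = e;
        size++;
    }

    public void addLast(Object e) {
        if (size == elements.length) grow();
        elements[tail] = e;
        tail = (tail + 1) % elements.length; // step forward, wrapping to index 0
        size++;
    }

    public Object get(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException("index " + i);
        return elements[(head + i) % elements.length];
    }

    // Only called when the array is full: copy into a twice-as-large array,
    // unwrapping the ring so that the first element lands at index 0.
    private void grow() {
        Object[] bigger = new Object[elements.length * 2];
        for (int i = 0; i < size; i++) {
            bigger[i] = elements[(head + i) % elements.length];
        }
        elements = bigger;
        head = 0;
        tail = size;
    }

    public static void main(String[] args) {
        RingDeque d = new RingDeque();
        d.addLast("b");
        d.addFirst("a");
        d.addLast("c");
        System.out.println("" + d.get(0) + d.get(1) + d.get(2)); // prints "abc"
    }
}
```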

answered May 31 '11 at 12:06

Peter Lawrey


This sounds like pretty much what I imagined. Does the growing also take the requested operation into account? Or will the old array just be copied to the middle of the new one? – soc May 31 '11 at 12:10
The expense of the operation doesn't depend on the position of the data, so there is no advantage in arranging the data from a particular point. It happens to copy the data so that `head = 0`, but this is not a requirement. – Peter Lawrey May 31 '11 at 12:13
Yes, it might be asymptotically the same, but if I always prepend, it requires far more growing operations than an approach that takes the last operation into account. – soc May 31 '11 at 12:16
Where do you get "if I always prepend, it requires far more growing operations"? I repeat, "This has an O(1) operation for append and prepend" They cost the same regardless of position. This is the whole point of suggesting this class. – Peter Lawrey May 31 '11 at 12:19
My question is specifically focused on the performance problems when growing the underlying array. I know that operations on both head and tail are "normally" O(1). I thought that was perfectly clear... – soc May 31 '11 at 12:46
@soc, You have proposed a problem and I proposed a collection which does not have that problem. IMHO: The solution is either to switch collections or change the collection so it behaves the same. – Peter Lawrey May 31 '11 at 12:56
@Peter Lawrey - ArrayDeque is O(1) AMORTIZED. I.e., O(1) on average. It's still O(n) for those individual instances where the underlying array needs to be resized. If I understand soc correctly, he's hunting for a constant, non-amortized O(1). LinkedList is your best bet if you are ONLY interested in optimizing append/prepend; it has true O(1) performance. The drawback is decreased performance in random access. – pap May 31 '11 at 13:01
@pap, The other drawback is that every operation which doesn't increase the capacity (which cannot grow without limit) will be much slower with LinkedList. If you just set the capacity to be large enough from the start, it will never resize. The longer an application runs, the less likely any collection is to need to grow. – Peter Lawrey May 31 '11 at 13:04
@pap: Thanks pap. My desire is to reduce the worst case, where prepend _always_ requires growing the array, and I was wondering if there are existing implementations of ArrayList/ArrayBuffer doing exactly that. – soc May 31 '11 at 13:05
@soc - Gotcha. Then I defer to Peter Lawrey - ArrayDeque is what you want. – pap May 31 '11 at 16:00
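Peter Lawrey's suggestion of setting the capacity large enough from the start maps onto `ArrayDeque`'s `ArrayDeque(int numElements)` constructor, which is documented to create a deque with an initial capacity sufficient to hold that many elements. As long as the element count stays within that bound, `addFirst` and `addLast` should not need to resize, so the occasional O(n) copy pap mentions does not happen. The class name `PresizedDeque`, the helper `fillAlternating`, and the bound of 100,000 are assumptions for illustration:

```java
import java.util.ArrayDeque;

public class PresizedDeque {
    // Fills a deque, presized for n elements, alternating prepends and appends.
    static ArrayDeque<Integer> fillAlternating(int n) {
        // Room for n elements is requested once, up front (n is an assumed upper bound).
        ArrayDeque<Integer> deque = new ArrayDeque<>(n);
        for (int i = 0; i < n; i++) {
            if (i % 2 == 0) {
                deque.addFirst(i); // even values go to the front
            } else {
                deque.addLast(i);  // odd values go to the back
            }
        }
        return deque;
    }

    public static void main(String[] args) {
        ArrayDeque<Integer> deque = fillAlternating(100_000);
        // prints "100000 99998 99999"
        System.out.println(deque.size() + " " + deque.peekFirst() + " " + deque.peekLast());
    }
}
```

The trade-off is memory: the full capacity is allocated immediately, even if far fewer elements end up being stored.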

score 3 · Answer 2 · answered May 31 '11 at 12:10

In the general sense, MOST pre-defined structures and algorithms are based on some assumption of the most common use-cases. As it's impossible to create a "general" implementation that covers every possible scenario, there will always be some outside cases where the approach is less than optimal.

If your particular use-case is performance-critical enough that this is a problem, my suggestion is to create your own Array implementation, tailored to your usage. Beware the sub-optimization devil, though. It is rarely worth the increased cost of maintenance to save those few odd CPU cycles or bytes of memory.

score 2 · Answer 3 · answered May 31 '11 at 12:07


You can use ArrayDeque for an array-backed data structure with fast prepend.
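For example, code that builds a list with repeated `ArrayList.add(0, x)`, which shifts every existing element on each call, can prepend into an `ArrayDeque` instead. One caveat: `ArrayDeque` does not implement `List` and has no `get(int)`, so this sketch copies the result into an `ArrayList` when indexed access is needed afterwards. The class `PrependExample` and helper `prependAll` are hypothetical names:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

public class PrependExample {
    // Builds the list [n-1, ..., 1, 0] by prepending each element.
    static List<Integer> prependAll(int n) {
        ArrayDeque<Integer> deque = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            deque.addFirst(i); // amortized O(1); ArrayList.add(0, i) would shift every element
        }
        // ArrayDeque has no get(int), so copy into an ArrayList for indexed access.
        return new ArrayList<>(deque);
    }

    public static void main(String[] args) {
        System.out.println(prependAll(5)); // prints [4, 3, 2, 1, 0]
    }
}
```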

answered May 31 '11 at 12:07

gustafc


score 2 · Answer 4 · answered Jun 01 '11 at 21:39


If you're not going to do a lot of lookups, and will just be doing linear traversals, then go for the UnrolledBuffer, an unrolled linked list implementation. Otherwise, a Vector and its +: should be an efficient choice.
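To show what "unrolled linked list" means, here is a simplified Java sketch of the idea: a chain of small fixed-size arrays, so appends and prepends only touch the first or last chunk and growth never copies existing elements. The class `UnrolledList`, its method names, and the chunk size of 16 are assumptions; this is not the source of Scala's `UnrolledBuffer`, and indexed lookup (which would have to walk the chain) is deliberately left out, in line with the advice to use it for linear traversals:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical unrolled linked list: a chain of fixed-size array chunks.
// Illustrates the idea only; it is not Scala's UnrolledBuffer implementation.
public class UnrolledList<T> {
    private static final int CHUNK = 16; // assumed number of slots per node

    private static final class Node {
        final Object[] items = new Object[CHUNK];
        int from; // first occupied slot
        int to;   // one past the last occupied slot
        Node next;

        Node(int pos) { from = pos; to = pos; }
    }

    private Node first, last;
    private int size;

    public int size() { return size; }

    public void append(T x) {
        if (last == null || last.to == CHUNK) {
            Node n = new Node(0); // new chunk, filled front to back
            if (last == null) first = n; else last.next = n;
            last = n;
        }
        last.items[last.to++] = x;
        size++;
    }

    public void prepend(T x) {
        if (first == null || first.from == 0) {
            Node n = new Node(CHUNK); // new chunk, filled back to front
            n.next = first;
            first = n;
            if (last == null) last = n;
        }
        first.items[--first.from] = x;
        size++;
    }

    // Linear traversal in order: chunk by chunk, slot by slot.
    @SuppressWarnings("unchecked")
    public void traverse(Consumer<? super T> action) {
        for (Node n = first; n != null; n = n.next) {
            for (int i = n.from; i < n.to; i++) {
                action.accept((T) n.items[i]);
            }
        }
    }

    public List<T> toList() {
        List<T> out = new ArrayList<>(size);
        traverse(out::add);
        return out;
    }

    public static void main(String[] args) {
        UnrolledList<Integer> list = new UnrolledList<>();
        for (int i = 0; i < 3; i++) list.append(i);
        for (int i = 1; i <= 3; i++) list.prepend(-i);
        System.out.println(list.toList()); // prints [-3, -2, -1, 0, 1, 2]
    }
}
```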

answered Jun 01 '11 at 21:39

axel22


Interesting. And thanks for letting me know that this class exists... I have never seen it before! – soc Jun 02 '11 at 16:13

