6

I'm implementing a sliding window over a stream of events, in Java. So I want a data structure which allows me to do the following:

  1. add to the end of the data structure when new events occur;

  2. remove from the start of the data structure when old events are processed;

  3. get standard random access (size(), get(i)) to the elements of the data structure; in general, typical List "read" operations;

  4. is efficient for all of the above operations;

  5. is unbounded.

No other access is required. And no thread safety is required.

I'm currently doing this with an ArrayList, to get things up and running. But I want something more efficient; the remove(0) method (2. above) is inefficient with an ArrayList.

Numbers 1. and 2. are standard Queue-style operations. However, the implementations of Queue in the JDK (such as ArrayDeque) don't allow for get(i) in 3.

So, I'm wondering if there are any libraries out there which have such an implementation, and are suitable for commercial use.

If not, I guess I'll resort to writing my own...

Calum
  • 83
  • 1
  • 7
  • How relatively-frequent are these operations? You may have to trade some serious space for time -- e.g. you could use a cicularly linked list with an index for rapid retrieval (index: I forget the name for this structure; a binary tree of pointers to 0 and n/2, each of which stores pointers to the midpoint of their half, etc.). – Alex Feinman Nov 05 '09 at 18:55

8 Answers8

4

Seems like a task for a Circular Buffer - as long as it's okay if the queue has a fixed capacity. I don't know of any standard implementation though. But here is a nice recipe to roll your own.

sfussenegger
  • 35,575
  • 15
  • 95
  • 119
  • Forgot to say that it should be unbounded - have now updated the question. – Calum Nov 05 '09 at 18:44
  • You couod increde the capacity if needed - like arraylist does for instance. It's expensive though – sfussenegger Nov 05 '09 at 19:02
  • Doubling an array in size when you need more capacity still makes insertion amortized constant time, so it's only "expensive" if you need hard latency guarantees. The question was about Java, where GC is usually a bigger deal for unpredictable latency than array doubling. I'd say a circular buffer is the right answer for the requirements. – Jamey Sharp Nov 06 '12 at 19:31
4

I got this problem and tried solving it by copying the source code of ArrayDeque and adding something like:

E get(index){ return elements[(head + index) % size];}
eeerahul
  • 1,629
  • 4
  • 27
  • 38
Jacky Li
  • 41
  • 1
  • Thanks, I also just ported ArrayDeque so I could use it on Android API Level 8. I just had to add the following method: `public E get(int index) {return elements[(head + index) % size()];}` – gsgx Nov 10 '12 at 21:56
  • 1
    Actually, the code I posted resulted in a null pointer at times. This code works, but I don't know why: `public E get(int index) {return elements[(head + index) % elements.length];}` – gsgx Nov 10 '12 at 22:27
  • @gsingh2011: Internally, the queue is backed by an array with preallocated size, which is `elements`. The `elements.length` thus returns the size of the internal array instead of the `size()` of the queue itself. – FuzzY Jun 03 '15 at 02:36
3

If you have a large enough array you can implement a queue with a basic array, and just use an index as to the head of the list, and use the mod operator so you can wrap around if needed.

So, you basically have a circular array that supports the insert and remove functions.

UPDATE:

It is a quick operation to copy an array to a larger array, so just double in size, perhaps, when getting close to the end, and just copy the array, as a step in doing insert. Overall you will still have very fast access, as the norm should not be to increase and copy.

James Black
  • 41,583
  • 10
  • 86
  • 166
  • More specifically, doubling an array in size when you need more capacity still makes insertion amortized constant time. This is a sound approach. – Jamey Sharp Nov 06 '12 at 19:32
2

How fast are the events coming in and out of this queue?

On the one hand, you can have a "sufficiently large" circular buffer.

While it is technically "bounded", you can make it "unbounded" by growing it as necessary.

By the same token, you can "shrink" it down in terms of overall capacity when it's "quiet".

But, for many applications a 100, 1000, or even 10000 item capacity circular buffer is effectively unbounded.

Will Hartung
  • 115,893
  • 19
  • 128
  • 203
1

The only library I can think of that would implement such an interface would be a LinkedList but frankly I'm not sure what the performance characteristics are.

Jasoon
  • 432
  • 2
  • 9
  • This won't allow for efficient random access - calling get(i). – Calum Nov 05 '09 at 18:34
  • Very true. I'd be interested to see what comes up here, since I'm not sure how you could effectively roll both efficient random access with the kind of speed you get from a bog standard FIFO queue... – Jasoon Nov 05 '09 at 18:37
  • ArrayDeque in the JDK (a Queue) actually has an array of elements (which can be resized) and head and tail fields. So it *is* a suitable implementation, but it doesn't expose elements in a random access way - it only exposes the head and tail. s the head and tail of the – Calum Nov 05 '09 at 18:46
  • A possible approach with LinkedList -- it is a Deque, but as a List it can be shuffled with Collections.shuffle() -- so depending on the needs for random access, one could perform a single shuffle operation and then pop / remove / iterate as many random items as needed. https://stackoverflow.com/a/19259094/7207622 – JeremyDouglass Mar 19 '20 at 18:27
1

Just throwing this out there as an alternative to rolling your own, so please take with a grain of salt: Depending on how frequently you need the random access get(i) and what performance you need from it (and how big your queue size will generally be), you could always use ArrayDeque.toArray()[i] when you need to access an element. The toArray() uses System.arraycopy() under the covers which should be pretty fast for small queue sizes and occasional usage. Would help to understand why you need random access to a queue and how often it is needed -- possibly there's a different way to implement your algorithm without it.

Chris B.
  • 2,532
  • 1
  • 15
  • 9
1

If it truly must be unbounded, then something like ConcurrentSkipListMap may be useful, if you assign an incrementing sequence to each event to use as the key in the map. It provided methods such as pollFirst/LastEntry. If you can sacrifice the unbounded nature of it, then a ring buffer maybe what you need.

Derek Lewis
  • 1,067
  • 1
  • 10
  • 16
0

A binomial heap can have O(1) amortized insert and O(log n) amortized delete min; I believe that it can also have O(log**2 n) amortized random access. Queue-push would insert the element into the heap, with successive integers as keys.

With a rbtree, you can do queue-push with O(log n) pessimistic for all of insert, delete min and random access. That is because the tree will have a contiguos set of integers as the keys, and the k-th element of the queue will be the element in the tree with the k-th key.

Adrian Panasiuk
  • 7,249
  • 5
  • 33
  • 54