
I have a LinkedList that stores a custom Node class. The Nodes are currently removed in order and evaluated, which generally adds more Nodes back into the LinkedList, so it is effectively treated as a Queue.

But in reality I don't care about maintaining the order of the Nodes, because the order in which they are added or removed doesn't matter. You can remove the 1st, 54th, or 1032nd Node from the List; it makes no difference. All that matters is that the Nodes are processed quickly: one is removed (at random), mutated, then added back along with several variations of it (once again, the order doesn't matter).

Since I haven't been able to find a Java Bag implementation, what is the most efficient way to maintain this type of collection?

PS: Out of laziness I have avoided using arrays, because the collection of nodes could theoretically range from 1 Node to 3^64 Nodes in size, though it's more likely to stay under a million.

nquincampoix
Jeremy
  • I would think that `LinkedList` will start behaving weirdly if you store a number of items that is greater than `Integer.MAX_VALUE`. – assylias May 08 '13 at 02:58
  • @assylias Luckily, the number of nodes generally is less than 1 million, but technically if enough memory this program "should" be able to store more. – Jeremy May 08 '13 at 02:59
  • How many gigs is `3^64`? – Miserable Variable May 08 '13 at 02:59
  • Three million million million million million (3 x 10^30) items is going to be a stretch :-) – paxdiablo May 08 '13 at 03:00
  • Dude, if you have to worry about keeping track of up to 3^64 nodes, you have bigger fish to fry than which collection type to use. At that size, you're going to have to worry about how to map and unmap memory from your address space, because you won't be able to fit everything at once. – dlev May 08 '13 at 03:00
  • Indeed, and at about 32+ bytes per Node instance, plus at least 32 bytes for the actual data, you are looking at a system with at least .... > 190000000000000 ZETTABYTES of memory – rolfl May 08 '13 at 03:01
  • `3^64` at 1 byte per node is (roughly) `2.9 x 10^12` exabytes. You aren't going to be storing that many nodes, ever. – Yuushi May 08 '13 at 03:03
  • Oh, and as a bonus, if you used an array instead of a Linked List, you could save yourself about 3 trillion dollars in memory costs .... by reducing the Node memory footprint to just 8 bytes instead of 32 – rolfl May 08 '13 at 03:03
  • @Yuushi, that smacks of comments like "640K should be enough for anyone" (ascribed to BillyG but denied) and "the worldwide market for computers will be about 5" (ascribed to Tom J Watson Junior, IBM). "Never" is an awfully long time :-) – paxdiablo May 08 '13 at 03:04
  • How do you find your Nodes? If you don't need to find them somehow then any of the List objects should work just fine. If you don't do insertions and removals, but just work on the ends, probably ArrayList is the most efficient. – Hot Licks May 08 '13 at 03:05
  • @HotLicks Currently the Nodes are removed from a LinkedList using pop(), BUT the order in which they are added or removed doesn't matter: you can remove the 54th element, then the very next time remove the 1000th element, and the same goes for adding. All that matters is that the Nodes are processed quickly (removed, evaluated, then new nodes, mutations of the original, added). – Jeremy May 08 '13 at 03:25
  • Yeah, most efficient, in terms of storage and time, is probably ArrayList, if you always add/remove from the ends. – Hot Licks May 08 '13 at 03:31
  • The most efficient way to do something is often to do nothing at all. You get faster remove-and-add if you just leave the node in the collection and update it. Instead of changing the order, you could cycle the entries by iterating through them. While 3^64 would require an unrealistic amount of memory, it would also take a million trillion years to create even once at a rate of one hundred million per second. – Peter Lawrey May 08 '13 at 05:40

1 Answer


The Java HashSet or TreeSet types might be good here, since neither preserves insertion order and both support quick insertion and deletion of elements (expected O(1) for HashSet, O(log n) for TreeSet). That said, you can't possibly hold 3^64 values in memory, since that's approximately 3.4336838 × 10^30, a number vastly bigger than any amount of RAM that I know of can hold.
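As a rough illustration of the set-based option, here is a minimal sketch of pulling an arbitrary element out of a HashSet through its iterator. The class name `SetBagSketch`, the helper `removeAny`, and the string stand-ins for Nodes are all hypothetical, not part of the question's code. It also assumes the elements have sensible `equals`/`hashCode` implementations and that collapsing duplicates is acceptable, which a true bag would not do.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Set;

final class SetBagSketch {
    /** Removes and returns whichever element the set's iterator yields first. */
    static <T> T removeAny(Set<T> set) {
        Iterator<T> it = set.iterator();
        if (!it.hasNext()) {
            throw new NoSuchElementException("set is empty");
        }
        T item = it.next();
        it.remove(); // removes the element just returned by next()
        return item;
    }

    public static void main(String[] args) {
        // Strings stand in for the question's Node objects (an assumption).
        Set<String> pending = new HashSet<>();
        pending.add("n1");
        pending.add("n2");
        pending.add("n3");
        while (!pending.isEmpty()) {
            String node = removeAny(pending);
            System.out.println("processing " + node);
        }
    }
}
```

The element returned is arbitrary rather than uniformly random, and the iterator may have to skip empty buckets to find it, so this is probably best seen as a convenience rather than the fastest choice.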

EDIT: Based on the described use case (support efficient insertion and removal of random elements), you might want to adopt the approach described in this older question (http://stackoverflow.com/questions/4565006/data-structure-for-choosing-random-elements) for building a data structure that does just that. Intuitively, you would use an ArrayList, then remove an element by swapping it with the last element of the ArrayList and removing that last slot, so nothing has to be shifted. This gives amortized O(1) insertion and O(1) removal with extremely low overhead.
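The swap-to-the-end idea can be sketched roughly as follows. This is only an illustration under assumptions: the class name `RandomBag` and its method names are made up for this example, and a `java.util.Random` is passed in so the choice of element can be controlled by the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Random;

/** Hypothetical unordered bag backed by an ArrayList. */
final class RandomBag<T> {
    private final List<T> items = new ArrayList<>();
    private final Random random;

    RandomBag(Random random) {
        this.random = random;
    }

    void add(T item) {
        items.add(item); // append at the end: amortized O(1)
    }

    T removeRandom() {
        if (items.isEmpty()) {
            throw new NoSuchElementException("bag is empty");
        }
        int last = items.size() - 1;
        int pick = random.nextInt(items.size());
        T picked = items.get(pick);
        // Overwrite the picked slot with the last element, then drop the last slot.
        items.set(pick, items.get(last));
        items.remove(last); // removing the final index shifts nothing: O(1)
        return picked;
    }

    int size() {
        return items.size();
    }

    boolean isEmpty() {
        return items.isEmpty();
    }

    public static void main(String[] args) {
        // Strings stand in for the question's Node objects (an assumption).
        RandomBag<String> bag = new RandomBag<>(new Random());
        bag.add("a");
        bag.add("b");
        bag.add("c");
        while (!bag.isEmpty()) {
            System.out.println(bag.removeRandom());
        }
    }
}
```

Because an ArrayList is indexed by `int`, a sketch like this tops out at `Integer.MAX_VALUE` elements, which is well above the "under a million" case the question expects in practice.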

Hope this helps!

Community
templatetypedef
  • The Set classes have overhead to make them searchable. The OP has not specified how/if he finds the objects he's interested in. Does he need the search capability? – Hot Licks May 08 '13 at 03:08
  • (I would think he couldn't have more than `2^64` elements, much less `3^64`.) – Hot Licks May 08 '13 at 03:11
  • @HotLicks- Any collection supporting insertion and deletion has to have some support for searching. If the OP is inserting and deleting things a lot, they will either have to pay in time (linear search in a `LinkedList`) or space (a slight overhead in a `HashSet` or `TreeSet`). That said, I don't believe there's much overhead in these structures, since they're very easy to traverse. Are you sure that (a) the overhead exists and (b) it is significant enough to warrant not using them? – templatetypedef May 08 '13 at 03:11
  • @templatetypedef - He implies he just adds/removes at the ends. But he's pretty vague. – Hot Licks May 08 '13 at 03:12
  • (I know the overhead exists. "Significant" or not would imply that the OP had actually told us anything about what he's doing.) – Hot Licks May 08 '13 at 03:16
  • @templatetypedef All of the nodes in the collection are added/removed at random, and the order by which they are added/removed has no impact as long as it is not ALWAYS the last element. – Jeremy May 08 '13 at 03:20
  • @JeremyQuick- Ah, I see. If that's the case, you might find the data structure suggested in this older question useful (http://stackoverflow.com/questions/4565006/data-structure-for-choosing-random-elements). – templatetypedef May 08 '13 at 03:23
  • amd64/x64 processors can only handle 48-bits of virtual memory or 256 TB. – Peter Lawrey May 08 '13 at 05:35