I hope that this question is specific enough to be deemed fit for StackOverflow. I checked the FAQ and I think this qualifies, since it is specific and related to programming.
I'm implementing a complex data mining algorithm (FP-growth) in Java. Some of the initial phases of the algorithm require me to scan a large database and keep a running count of each item type found. This seems perfectly suited to a Hashbag
interface. I found one in Apache Commons which seems to work for me.
So now, my HashBag is filled with [itemType, count] entries (pairs). Later on in the algorithm, I'm required to do a lot of list-like operations on these pairs. In some cases, I must sort the collection by itemType. In others, I must sort by count. This seems perfectly suited to a List
interface.
I'm left with the conclusion that I must convert my Hasbag to a List. Yet it feels dirty somehow, like a waste of space and time. Is there a smarter way to do this, or is it a common situation to have a programming problem where you must treat your collection differently at different times, and conversions are a necessary evil?
One alternative is to make my own interface which is truly a list, but allows "bag-style" adds. I'd have to keep the list sorted and perform binary searches with a custom comparator every time I wanted to add something. Building that collection would probably take longer than building a Hashbag, but I'd save on the conversion step at the end. Any thoughts as to which is preferable?
Thanks!