2

I am working on a problem where I'm required to store elements with two requirements: no duplicates, and insertion order is maintained. I chose LinkedHashSet since it fulfills both requirements.

Let's say I have this code:

    LinkedHashSet<String> hs = new LinkedHashSet<>();
    hs.add("B");
    hs.add("A");
    hs.add("D");
    hs.add("E");
    hs.add("C");
    hs.add("F");
    if (hs.contains("D")) {
        // do something to remove the elements added after "D", i.e. remove "E", "C" and "F"
        // maybe hs.removeAll(Collection<?> c)?
    }

Can anyone please guide me with the logic to remove these elements?

Am I using the wrong data structure? If so, what would be a better alternative?

M. Justin
Jazib

5 Answers

3

I think you will need to use an iterator to do the removal if you are using a LinkedHashSet. That is, find the element, then keep removing until you reach the tail. This is O(n). Even if you wrote your own LinkedHashSet (a doubly linked list plus a hash set), so that you had access to the raw links and could cut the linked list in O(1), you would still need to remove every element you just cut off from the hash set, which is where the O(n) cost arises again.

So in summary: find the element, keep an iterator positioned at it, and continue walking forward, removing each element until you reach the end. I'm not sure if LinkedHashSet exposes the required calls, but you can probably figure that out.
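
The walk described above can be sketched with the standard `Iterator` API; the iterator returned by `LinkedHashSet.iterator()` visits elements in insertion order and supports `remove()`. The class and method names here (`RemoveAfterDemo`, `removeAllAfter`) are made up for illustration, and this is only one possible way to write it:

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

class RemoveAfterDemo {
    /** Removes every element that follows marker in iteration order; no-op if marker is absent. */
    static <E> void removeAllAfter(Set<E> set, E marker) {
        if (!set.contains(marker)) {
            return;
        }
        boolean pastMarker = false;
        for (Iterator<E> it = set.iterator(); it.hasNext(); ) {
            E current = it.next();
            if (pastMarker) {
                it.remove();               // safe removal while iterating
            } else if (Objects.equals(current, marker)) {
                pastMarker = true;         // everything from here on gets removed
            }
        }
    }

    public static void main(String[] args) {
        Set<String> hs = new LinkedHashSet<>(List.of("B", "A", "D", "E", "C", "F"));
        removeAllAfter(hs, "D");
        System.out.println(hs); // prints [B, A, D]
    }
}
```

Note that the loop still visits the elements before the marker, so the cost is O(n) in the size of the whole set, as discussed above.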

Anil Vaitla
  • +1 - The analysis is spot on. The elements have to be removed from the hash table individually, and that makes this O(N) ... no matter how you deal with the "linking" or "ordering" requirement. – Stephen C Apr 08 '13 at 22:55
  • Actually I have no issue with the O(n), as my data is not large enough to worry about it. The real problem I'm facing is that LinkedHashSet does not implement a get(index) function. Neither does it tell me which is the last element. So I can't really traverse the list as you described. – Jazib Apr 09 '13 at 16:08
  • It may just be easiest to write your own class which wraps the hashset and linkedlist data structure, so that you can expose exactly what methods you want to use on the underlying data structures. – Anil Vaitla Apr 10 '13 at 14:24
0

The basic problem here is that you have to maintain two data structures: a "map" representing the key / value mapping, and a "list" representing the insertion order.

There are "map" and "list" organizations that offer fast removal of elements after a given point; e.g. ordered trees of various kinds, and both array-based and chain-based lists (modulo the cost of locating the point).

However, it seems impossible to remove N elements from the two data structures in better than O(N). You have to visit all of the elements being removed to remove them from the 2nd data structure. (In fact, I suspect one could prove this mathematically ...)

In short, there is no data structure that has better complexity than what you are currently using.

The area where it is possible to improve performance (with a custom collection class!) is in avoiding an explicit use of an iterator. Using an iterator and the standard iterator API, the cost is O(N) on the total number of elements in the data structure. You could make this O(N) on the number of elements removed ... if the hash entry nodes also had next/prev links for the sequence.
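
As a rough illustration of that idea, here is a minimal sketch of a custom collection whose hash entries carry prev/next links. Everything in it (`LinkedSetSketch`, `Node`, `removeAllAfter`, `toList`) is a hypothetical name invented for this example, not part of any standard library, and it omits most of what a real `Set` implementation would need:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical insertion-ordered set whose hash entries also hold sequence links. */
class LinkedSetSketch<E> {
    private static final class Node<E> {
        final E value;
        Node<E> prev, next;
        Node(E value) { this.value = value; }
    }

    private final Map<E, Node<E>> index = new HashMap<>();
    private Node<E> head, tail;

    /** Appends value if absent; returns false for a duplicate. */
    boolean add(E value) {
        if (index.containsKey(value)) return false;
        Node<E> node = new Node<>(value);
        node.prev = tail;
        if (tail == null) head = node; else tail.next = node;
        tail = node;
        index.put(value, node);
        return true;
    }

    /** Removes everything inserted after marker; returns the number of elements removed. */
    int removeAllAfter(E marker) {
        Node<E> start = index.get(marker);   // expected O(1): no walk from the head
        if (start == null) return 0;
        int removed = 0;
        for (Node<E> n = start.next; n != null; n = n.next) {
            index.remove(n.value);           // each removed element still costs one hash removal
            removed++;
        }
        start.next = null;                   // cut the chain in O(1)
        tail = start;
        return removed;
    }

    /** Returns the elements in insertion order. */
    List<E> toList() {
        List<E> out = new ArrayList<>();
        for (Node<E> n = head; n != null; n = n.next) out.add(n.value);
        return out;
    }
}
```

Here `removeAllAfter` only touches the marker and the elements after it, so its cost is proportional to the number of elements removed rather than to the size of the whole collection.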

Stephen C
  • In this example there is no "map", just values. That said, I agree with your analysis that this is likely to be O(N) no matter what... – user949300 Apr 08 '13 at 23:41
  • @user949300 - True ... sort of. However, there is conceptually a second data structure. And in the standard HashSet implementation, the set is implemented using HashMap. See http://www.docjar.com/html/api/java/util/HashSet.java.html ... line 102. – Stephen C Apr 09 '13 at 03:51
0

You could write your own version of an ArrayList that doesn't allow for duplicates, by overriding add() and addAll(). To my knowledge, there is no "common" 3rd party version of such, which has always surprised me. Anybody know of one?

Then the remove code is pretty simple (no need to use a ListIterator):

int idx = this.indexOf("D");
if (idx >= 0) {
  for (int goInReverse = this.size()-1; goInReverse > idx; goInReverse--)
    this.remove(goInReverse);
}

However, this is still O(N), because indexOf() scans the list and the loop then visits every element being removed.
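
For completeness, a minimal sketch of such a list might look like the following. `UniqueArrayList` and `removeAllAfter` are hypothetical names, and only `add(E)` and `addAll(Collection)` are guarded; the positional `add(int, E)`, `addAll(int, Collection)`, `set()` and the copy constructor would need the same treatment in a real implementation. Here `subList(...).clear()` is used as an alternative to the reverse loop above:

```java
import java.util.ArrayList;
import java.util.Collection;

/** Hypothetical ArrayList subclass that silently rejects duplicates. */
class UniqueArrayList<E> extends ArrayList<E> {
    @Override
    public boolean add(E e) {
        // contains() is a linear scan, so each add is O(N)
        return !contains(e) && super.add(e);
    }

    @Override
    public boolean addAll(Collection<? extends E> c) {
        boolean changed = false;
        for (E e : c) {
            changed |= add(e);   // |= does not short-circuit, so every element is tried
        }
        return changed;
    }

    /** Removes every element after the first occurrence of marker; no-op if absent. */
    public void removeAllAfter(E marker) {
        int idx = indexOf(marker);
        if (idx >= 0) {
            subList(idx + 1, size()).clear();   // drops the tail in one call
        }
    }
}
```

The trade-off compared with LinkedHashSet is that duplicate checks become linear scans, which is usually acceptable only for small collections.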

user949300
  • You mean to say I should write my own logic to prevent duplicates? I can do that, but from a learning perspective, can you please elaborate on which other scenarios one would use LinkedHashSet in, if not this? – Jazib Apr 09 '13 at 16:11
  • Not necessarily should, but could - an ArrayList can be more efficient depending on what is happening. LinkedHashSet is useful if you need to maintain the order in which items were added. I usually use HashSet (don't care about order) or TreeSet (use the natural order, e.g. alphabetical of the elements). – user949300 Apr 09 '13 at 19:01
0

So, after trying a couple of the things mentioned above, I chose to implement a different data structure, since I did not have any issue with the O(n) cost for this problem (my data is very small).

I used graphs; this library came in really handy: http://jgrapht.org/

What I am doing is adding all elements as vertices to a DirectedGraph and creating edges between them (the edges helped me solve another, unrelated problem as well). When it's time to remove the elements, I use a recursive function with the following pseudocode:

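removeElements(element) {
    // follow the single outgoing edge to the element added next
    tempEdge = graph.getOutgoingEdgeFrom(element)
    if (tempEdge == null)
        return;   // no outgoing edge: element is the last one, nothing left to remove
    tempVertex = graph.getTargetVertex(tempEdge)
    removeElements(tempVertex)    // first remove everything after tempVertex
    graph.remove(tempVertex)      // then remove tempVertex itself
}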

I agree that a graph data structure is not ideal for this kind of problem, but under my conditions it works perfectly... Cheers!

Jazib
0

The last element can be retrieved or removed using the getLast() and removeLast() methods, which were added to LinkedHashSet in Java 21 as part of the sequenced collections enhancement. These can be combined with a while loop to remove elements from the end of the set until the desired element is encountered.

if (hs.contains("D")) {
    while (!"D".equals(hs.getLast())) {
        hs.removeLast();
    }
}
M. Justin