1

I would like to implement a data structure that supports fast insertion and keeps the data sorted, without duplicates, after every insert.

I thought about a binomial heap, but as far as I understand that structure, it can't tell during insertion whether a particular element is already in the heap. On the other hand, there is the AVL tree, which fits my case perfectly, but honestly it is too hard for me to implement at the moment.

So my question is: is there any way to modify the binomial heap insertion algorithm so that it skips duplicates? Or could anyone suggest another structure?

Greetings :)

Diptendu
Michocio
  • How many elements? If the number of elements is not massively huge, I would first try an array. Its excellent cache locality properties make it a good default choice. – The Paramagnetic Croissant Jun 27 '15 at 12:24
  • The number of elements could be huge, so an array isn't a good choice. – Michocio Jun 27 '15 at 12:26
  • 2
    While `insertion` and `without duplicates` do restrict the choices that suggest themselves, I recommend explicitly compiling a list of operations to be supported, including requirements on resource consumption. (What about `remove, min, n`th (according to order), `average, count` …?) – greybeard Jun 27 '15 at 16:44

5 Answers

2

In C++, there is std::set. It is typically implemented internally as a red-black tree, so the data stays sorted as you insert it. You can look at it for reference.

Steephen
  • Thanks for your interest. I am obliged to use only C libraries and not to use external data structures. – Michocio Jun 27 '15 at 12:29
  • @Michocio I was just pointing out a reference for you, not asking you to use std::set. – Steephen Jun 27 '15 at 12:30
  • @Steephen Even if a ready-made set implementation is forbidden, you may implement it yourself. A set includes only unique items by definition, making a binary tree an excellent choice for the underlying data structure. It should be self-balancing (e.g. red-black or AVL) for speed, but the balancing could be added after you have the basic implementation completed. The only difference between your insertion algorithm and that of an ordinary binary tree would then be that you must check that the item does not already exist before you insert it, because each item must be unique. –  Jun 27 '15 at 14:05
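
The duplicate check described in the comment above can be sketched in plain C as follows. This is a minimal, unbalanced binary search tree, not a red-black or AVL tree, so the worst case is still O(n) per insert on already-sorted input. The names `struct node`, `bst_insert`, and `bst_to_array` are hypothetical and chosen only for illustration.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical node type for an unbalanced binary search tree of ints. */
struct node {
    int key;
    struct node *left, *right;
};

/* Insert key into the tree rooted at *root.
 * Returns true if a new node was added, false if the key was already
 * present (duplicate skipped) or if allocation failed. */
bool bst_insert(struct node **root, int key)
{
    struct node **link = root;

    /* Walk down until we find the key or reach an empty link. */
    while (*link != NULL) {
        if (key == (*link)->key)
            return false;                       /* duplicate: do nothing */
        link = (key < (*link)->key) ? &(*link)->left : &(*link)->right;
    }

    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return false;
    n->key = key;
    n->left = n->right = NULL;
    *link = n;                                  /* hook the node into place */
    return true;
}

/* In-order traversal: writes the keys in ascending order into out,
 * starting at index pos, and returns the index after the last key. */
size_t bst_to_array(const struct node *root, int *out, size_t pos)
{
    if (root == NULL)
        return pos;
    pos = bst_to_array(root->left, out, pos);
    out[pos++] = root->key;
    return bst_to_array(root->right, out, pos);
}
```

Because the search for the insertion point and the search for an existing key follow the same path, skipping duplicates costs nothing extra. Rebalancing could be layered on top of this later, as the comment suggests.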
1

A good data structure for this is the red-black tree, which is O(log(n)) for insertion. You said you would like to implement a data structure that does this. A good explanation of how to implement that is given here, along with a usable open-source library.

abligh
0

Skip lists are also a possibility if you are concerned with thread safety. Balanced binary search trees will perform more poorly than a skip list in such a case as skip lists require no rebalancing, and skip lists are also inherently sorted like a BST. There is a disadvantage in the amount of memory required (since multiple linked lists are technically used), but theoretically speaking it's a good fit.

You can read more about skip lists in this tutorial.
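
As a rough illustration of the idea, here is a minimal single-threaded skip list insert that keeps keys sorted and rejects duplicates. It is only a sketch under stated assumptions: `MAX_LEVEL`, `struct skiplist`, `sl_init`, and `sl_insert` are hypothetical names, every node carries a full-size `next` array for simplicity (which wastes memory), levels are chosen with `rand()`, and nothing here is thread-safe as written.

```c
#include <stdbool.h>
#include <stdlib.h>

#define MAX_LEVEL 16            /* assumed cap on tower height */

struct sl_node {
    int key;
    struct sl_node *next[MAX_LEVEL];   /* next[i] = successor on level i */
};

struct skiplist {
    struct sl_node head;        /* sentinel node; its key is never read */
    int level;                  /* number of levels currently in use */
};

void sl_init(struct skiplist *sl)
{
    for (int i = 0; i < MAX_LEVEL; i++)
        sl->head.next[i] = NULL;
    sl->level = 1;
}

/* Coin flips: each extra level is taken with probability 1/2. */
static int random_level(void)
{
    int lvl = 1;
    while (lvl < MAX_LEVEL && (rand() & 1))
        lvl++;
    return lvl;
}

/* Returns true if key was inserted, false if it was already present
 * or if allocation failed. */
bool sl_insert(struct skiplist *sl, int key)
{
    struct sl_node *update[MAX_LEVEL];
    struct sl_node *x = &sl->head;

    /* From the top level down, remember the last node before key. */
    for (int i = sl->level - 1; i >= 0; i--) {
        while (x->next[i] != NULL && x->next[i]->key < key)
            x = x->next[i];
        update[i] = x;
    }

    x = x->next[0];
    if (x != NULL && x->key == key)
        return false;                           /* duplicate: skip */

    struct sl_node *n = malloc(sizeof *n);
    if (n == NULL)
        return false;
    n->key = key;

    int lvl = random_level();
    for (int i = sl->level; i < lvl; i++)       /* newly used levels */
        update[i] = &sl->head;
    if (lvl > sl->level)
        sl->level = lvl;

    for (int i = 0; i < lvl; i++) {             /* splice into each level */
        n->next[i] = update[i]->next[i];
        update[i]->next[i] = n;
    }
    for (int i = lvl; i < MAX_LEVEL; i++)
        n->next[i] = NULL;
    return true;
}
```

Walking `head.next[0]` visits the keys in ascending order. Expected insertion cost is O(log n), with no rebalancing step.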


If you have a truly large number of elements, you might also consider just using a doubly-linked list and sorting the list after all items are inserted. This has the benefit of ease of implementation and insertion time.

You would then need to implement a sorting algorithm. A selection sort or insertion sort would be slower but easier to implement than a mergesort, heapsort, or quicksort algorithm. On the other hand, the latter three are not terribly difficult to implement either. The only thing to be careful about is that you don't overflow the stack since those algorithms are typically implemented using recursion. You could create your own stack implementation (not difficult) and implement them iteratively, pushing and popping values onto your stack as necessary. See Iterative quicksort for an example of what I'm referring to.
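
To make the "own stack instead of recursion" idea concrete, here is a small sketch of an iterative quicksort followed by a pass that drops duplicates. For brevity it works on an `int` array rather than the doubly-linked list discussed above, which is a deliberate simplification. The function names are hypothetical, and the Lomuto partition used here degrades to O(n²) time when the input has many equal keys.

```c
#include <limits.h>
#include <stddef.h>

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Lomuto partition of a[lo..hi] (inclusive); returns the pivot's index. */
static size_t partition(int *a, size_t lo, size_t hi)
{
    int pivot = a[hi];
    size_t i = lo;
    for (size_t j = lo; j < hi; j++)
        if (a[j] < pivot)
            swap_int(&a[i++], &a[j]);
    swap_int(&a[i], &a[hi]);
    return i;
}

/* Quicksort without recursion: pending ranges live on an explicit stack.
 * The larger part is always pushed and the smaller part is processed
 * next, so the stack never needs more entries than size_t has bits. */
void quicksort_iterative(int *a, size_t n)
{
    struct range { size_t lo, hi; } stack[sizeof(size_t) * CHAR_BIT];
    size_t top = 0;

    if (n < 2)
        return;
    stack[top++] = (struct range){0, n - 1};

    while (top > 0) {
        struct range r = stack[--top];
        while (r.lo < r.hi) {
            size_t p = partition(a, r.lo, r.hi);
            if (p - r.lo < r.hi - p) {
                /* Right part is larger: defer it, continue on the left. */
                stack[top++] = (struct range){p + 1, r.hi};
                if (p == r.lo)
                    break;                      /* left part is empty */
                r.hi = p - 1;
            } else {
                /* Left part is larger (or equal): defer it if non-empty. */
                if (p > r.lo)
                    stack[top++] = (struct range){r.lo, p - 1};
                r.lo = p + 1;
            }
        }
    }
}

/* Compacts a sorted array in place so each value appears once;
 * returns the new length. */
size_t dedupe_sorted(int *a, size_t n)
{
    if (n == 0)
        return 0;
    size_t w = 1;
    for (size_t r = 1; r < n; r++)
        if (a[r] != a[w - 1])
            a[w++] = a[r];
    return w;
}
```

Calling `quicksort_iterative` once after all inserts and then `dedupe_sorted` gives a sorted, duplicate-free sequence. The trade-off is that the data is not sorted between inserts, which the question asks for.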

0

If you're okay with using a library, you may take a look at libavl here. The library implements some other varieties of binary trees as well.

DebD
-1

If you are looking for fast insertion and easy implementation, why not a linked list (single or double)? Insertion: push head / push tail, O(1). Removal: pop head / pop tail, O(1). The only BUT is that "find" will be O(n).

Sheanan
  • You're right about the speed and ease of implementation, generally speaking. However, the list must be kept sorted, so inserting `3` into the list `1,2,4` would make the list `1,2,3,4` (`3,1,2,4` and `1,2,4,3` would not be acceptable.) –  Jun 27 '15 at 13:26
  • So... how about a stack? **You have to know the capacity from the start**, but you can do a sorted insertion every time, so in fact it's kept sorted. It is also easy to implement. – Sheanan Jun 27 '15 at 13:36
  • Thanks a lot for the answer! Just after reading your post, an idea came to my mind. Is there any possibility of doing a binary search on a linked list? I know that in general it is done on arrays, but maybe there is some hack in C that makes it possible to implement binary search on a linked (even doubly-linked) list? – Michocio Jun 27 '15 at 13:46
  • Maybe only if using the same algorithm as in a heap based upon an array: if the head is at place 0, then the left child of `i` is at `2*i+1` and the right child at `2*i+2`. You will have to calculate how many "next-jumps" to make. – Sheanan Jun 27 '15 at 14:06
  • @Sheanan A stack can only insert and remove from one side (unless you make it double-ended like you mentioned in your answer). *Sorted insertion is a requirement here.* Like I said, `1,2,3,4` would be the only acceptable sequence after inserting `3` into the sequence `1,2,4`. –  Jun 27 '15 at 14:11
  • @ChronoKitsune I meant that if he knows the capacity, then a sorted insert into a stack is relatively easy to implement and you will always get 1,2,3,4 (or 4,3,2,1, depending on the sort order). Obviously it also has disadvantages. – Sheanan Jun 27 '15 at 14:20
  • @Sheanan I'm unable to understand what you mean here. You can't insert an item `X` in the set `A , B , C` with a stack if `A < X < B` because a stack can only insert at the head and/or the tail. A linked list with a known capacity would work better, but then there's the disadvantage of the insertion time, which slows down as the list grows when the list must remain sorted. –  Jun 27 '15 at 14:28
  • @ChronoKitsune As I said about the one-sided stack: if we have only push, pop and peek (at the last element), then insert can be: `void r_insert(stack *st, int n) { if (stack_is_empty(st)) push(st, n); else if (n < peek(st)) push(st, n); else { int temp = pop(st); r_insert(st, n); push(st, temp); } }` That way you will always have a sorted stack, again with the obvious disadvantage. – Sheanan Jun 27 '15 at 14:55
  • @Sheanan Oh, I see what you're thinking now. So you're suggesting popping possibly 4000 items, calling the function recursively with the new stack and the same item to insert that item, then pushing the popped items back onto the stack as they originally existed. Insertion will take forever on a large number of items, but it's definitely easy to implement! :-P –  Jun 27 '15 at 15:06
  • 1
    as said ... with the very obvious disadvantage ;-P – Sheanan Jun 27 '15 at 15:12
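
For reference, the recursive stack insert from the comments above can be made compilable roughly as follows. This is only a sketch: the array-backed `struct stack`, the `STACK_CAP` value, and the helper names are assumptions, and an extra equality check has been added so that duplicates are skipped, which the code in the comment does not do. The cost is O(n) pops and pushes per insert, with recursion as deep as the stack is tall.

```c
#include <stdbool.h>
#include <stddef.h>

#define STACK_CAP 1024          /* assumed fixed capacity, known up front */

struct stack {
    int data[STACK_CAP];
    size_t size;                /* number of items; data[size-1] is the top */
};

static void push(struct stack *st, int v) { st->data[st->size++] = v; }
static int  pop(struct stack *st)         { return st->data[--st->size]; }
static int  peek(const struct stack *st)  { return st->data[st->size - 1]; }

/* Keeps the smallest value on top. Returns true if n was inserted,
 * false if it was a duplicate or the stack is full. */
bool r_insert(struct stack *st, int n)
{
    if (st->size == STACK_CAP)
        return false;           /* full; only possible on the outermost call */

    if (st->size == 0 || n < peek(st)) {
        push(st, n);
        return true;
    }
    if (n == peek(st))
        return false;           /* duplicate: leave the stack unchanged */

    /* n belongs somewhere below the top: set the top aside, insert
     * into the rest, then put the old top back. */
    int temp = pop(st);
    bool inserted = r_insert(st, n);
    push(st, temp);
    return inserted;
}
```

Popping everything afterwards yields the values in ascending order. Inserting `3, 1, 4, 1, 2` leaves `4, 3, 2, 1` in the array from bottom to top, with the second `1` rejected.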