
I am building an application that will require a collection to hold about 10k strings.

The collection will be used as a queue.

So I was looking through the different collection types in C#, but could not figure out which one has the best performance with regard to the speed of Put and Get operations on a queue. It should also not allow duplicates in the queue/collection.

EDIT based on the comments:

Any existing collection would be helpful, or a custom collection that could outperform the existing ones would be great.

Thanks

Kamil Dhuleshia
  • How about using an array as a FIFO? – eat_a_lemon Mar 26 '11 at 06:44
  • Thought about ArrayList, but it performs very badly at searching compared with Dictionary, which is very good at searching but requires a lot more resources and time for put and get... – Kamil Dhuleshia Mar 26 '11 at 06:49
  • If there were one fastest collection, all others would be useless :) Please tell us if you need one that is fast to insert new items to, or one that is fast to read from (if you only build it once and only read from it, that makes a huge difference). Also, is memory usage a problem? How long are the strings? – Michael Stum Mar 26 '11 at 07:01
  • It is a nonsensical question. It suggests that there is something wrong with Queue<> but never says what. If there was a better way to implement a queue then of course the .NET framework programmers would have used it. You can't do a better job, only worse. – Hans Passant Mar 26 '11 at 07:24



If you are looking for high-performance Put & Get while checking for uniqueness (duplicate checking), but order doesn't matter (not a queue), then use HashSet<T>.

If the queue behavior is more important, then use a Queue<T>.

I don't think there is any built-in collection that offers both.
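To make the trade-off concrete, here is a minimal sketch of what each of the two suggested collections gives you on its own. The class `BuildingBlocks` and its method names are hypothetical, for illustration only: `HashSet<T>.Add` rejects a duplicate by returning `false`, while `Queue<T>` keeps FIFO order but accepts the duplicate.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class: what each built-in collection provides by itself.
public static class BuildingBlocks
{
    // HashSet<T>.Add returns false when the item is already present,
    // so duplicates are rejected in O(1) average time. No ordering guarantee.
    public static int CountUnique(IEnumerable<string> items)
    {
        var set = new HashSet<string>();
        int added = 0;
        foreach (var item in items)
        {
            if (set.Add(item))
                added++;
        }
        return added;
    }

    // Queue<T> preserves FIFO order (amortized O(1) enqueue, O(1) dequeue),
    // but it happily stores duplicates.
    public static List<string> DrainInOrder(IEnumerable<string> items)
    {
        var queue = new Queue<string>(items);
        var result = new List<string>();
        while (queue.Count > 0)
            result.Add(queue.Dequeue());
        return result;
    }
}
```

With the input `{ "a", "b", "a" }`, `CountUnique` counts two distinct strings, while `DrainInOrder` returns all three in their original order.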

Sanjeevakumar Hiremath
  • He was looking for a fast solution to having a queue system with unique entries. You provided solution to one of those conditions but not both at once. – Kasper Holdum Mar 26 '11 at 07:00
  • I said it is not possible with any single data structure. Isn't that the case? – Sanjeevakumar Hiremath Mar 26 '11 at 07:07
  • +1 Based on the way the question is stated, this is a correct answer. The question appears to be looking for an existing collection type. It does not solve the _intent_ behind the original question, but we can't read his mind. – Joel Lee Mar 26 '11 at 07:07
  • Note the edit: "Any custom data type which outperform existing solutions". The solution I posted solves all requirements while providing great performance. – Kasper Holdum Mar 26 '11 at 07:17

Do you mind storing every item twice (2n entries instead of n)? You could use a Queue<> in combination with a Dictionary<,>. The queue would handle the enqueue and dequeue operations, and the dictionary would ensure unique entries. A simple wrapper class could combine the two, and it would give you O(1) average enqueue and dequeue times.

Example:

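using System;
using System.Collections.Generic;

// A FIFO queue that rejects duplicates: the Queue<T> keeps insertion order,
// the Dictionary<T, bool> provides fast membership checks.
public class SetQueue<T>
{
    private readonly Dictionary<T, bool> duplicates = new Dictionary<T, bool>();
    private readonly Queue<T> queue = new Queue<T>();

    // Returns false (and changes nothing) if the item is already queued.
    public bool Enqueue(T item)
    {
        if (duplicates.ContainsKey(item))
            return false;

        duplicates[item] = true;
        queue.Enqueue(item);
        return true;
    }

    // Removes and returns the oldest item; throws if the queue is empty.
    // Once dequeued, the same item may be enqueued again.
    public T Dequeue()
    {
        if (queue.Count == 0)
            throw new InvalidOperationException("Can't dequeue on an empty queue.");

        var item = queue.Dequeue();

        // Keep the two collections in sync: the item must be in the dictionary.
        if (!duplicates.Remove(item))
            throw new InvalidOperationException("The dictionary should have contained the item.");

        return item;
    }
}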

An insert into this custom data structure first checks whether the dictionary already contains the item. ContainsKey is a hash lookup, which is close to O(1) on average. If the item is already in the data structure, the method returns false. If it isn't, the item is added to the queue, which is O(1) (amortized, because the queue's internal array occasionally grows), and to the dictionary. As long as the dictionary's count is below its capacity, that insertion approaches O(1) as well. The total enqueue time is therefore O(1) on average.

The same goes for the dequeue method: Queue<T>.Dequeue is O(1), and Dictionary<TKey, TValue>.Remove is O(1) on average.

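This solution is basically the same as the built-in OrderedDictionary, which also keeps insertion order while allowing lookup by key. However, since this solution uses generics, it avoids the casts (and, for value types, the boxing/unboxing) that the non-generic OrderedDictionary needs in its operations, which can make it noticeably faster.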
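Commenters below point out that the `bool` value in the `Dictionary<T, bool>` carries no information, so a `HashSet<T>` can do the membership tracking instead. Here is a minimal sketch of that variant. It assumes the same semantics as `SetQueue<T>` above (an item can be enqueued again after it has been dequeued), and the class name `HashSetQueue<T>` is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant of SetQueue<T>: a HashSet<T> replaces Dictionary<T, bool>,
// because only membership matters, not a value.
public class HashSetQueue<T>
{
    private readonly HashSet<T> members = new HashSet<T>();
    private readonly Queue<T> queue = new Queue<T>();

    public int Count => queue.Count;

    // HashSet<T>.Add returns false for a duplicate, so the check and the
    // insert are a single hash operation (O(1) on average).
    public bool Enqueue(T item)
    {
        if (!members.Add(item))
            return false;
        queue.Enqueue(item);
        return true;
    }

    // Removes the oldest item. Afterwards the same item may be enqueued again.
    public T Dequeue()
    {
        if (queue.Count == 0)
            throw new InvalidOperationException("Can't dequeue on an empty queue.");
        T item = queue.Dequeue();
        members.Remove(item);
        return item;
    }

    public bool Contains(T item) => members.Contains(item);
}
```

Using `Add`'s return value avoids a separate `Contains` lookup. Otherwise the behavior matches the dictionary version.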

Kasper Holdum
  • That is a possible solution... Or I could keep all the collection data in the Dictionary and use the Queue as a buffer holding a subset of the data from the dictionary. Any comments? – Kamil Dhuleshia Mar 26 '11 at 06:54
  • You could use a HashSet instead of a Dictionary<,>, not sure what your value would be in a dictionary if the string is the key. – BrandonAGr Mar 26 '11 at 06:55
  • I'm not sure what you mean by using the queue as a buffer to sub set data from the dictionary? – Kasper Holdum Mar 26 '11 at 06:59
  • @Qua Sorry to be unclear. I mean: use the dictionary as the collection, then extract a group of data from the dictionary and store it in the queue to perform get operations per item, as queues are fast at that. The same when inserting values into the dictionary: collect all inputs into a queue and then insert them as a batch into the dictionary. But I am not sure if a dictionary can do group inserts and extracts like ArrayList. – Kamil Dhuleshia Mar 26 '11 at 07:07
  • Iterating over all data stored in the dictionary is a slow operation. Is there any reason why you wouldn't keep the two synchronized at all times, like I did in the example code? I think you need to be more specific about what you are really looking to do. Should the entries be unique, or should identical entries be grouped up? – Kasper Holdum Mar 26 '11 at 07:16
  • "Fastest & efficient" is not reflected in this answer; using two data structures is **not efficient** by any measure. **@kdhuleshia** you've got to remove at least the word `efficient` from your question. – Sanjeevakumar Hiremath Mar 26 '11 at 07:34
  • Please tell me why this solution would not be efficient? Notice how this solution solves all of the requirements of the question while still providing great performance. O(log n) inserts, O(log n) removal. Doesn't get any better. – Kasper Holdum Mar 26 '11 at 07:39
  • Yeah, it is efficient. Efficient means providing a solution with minimum wasted effort. Try coming up with a solution that uses less space while providing better performance. – Kasper Holdum Mar 26 '11 at 07:41
  • My solution out performs the accepted answer by a magnitude and they both use the same amount of memory. – Kasper Holdum Mar 26 '11 at 07:53
  • @Qua - Edit answer to explain (briefly) how it outperforms the accepted answer, and I will upvote this. I'm not sure why the answer was downvoted. (Wasn't me. I don't have enough points.) – Joel Lee Mar 26 '11 at 07:59
  • I edited my entry. This data structure is exactly identical to OrderedDictionary. However, OD is an old data structure not taking advantage of generics and thus it takes a huge performance hit due to boxing/unboxing. – Kasper Holdum Mar 26 '11 at 08:09
  • @Sanjeevakumar Hiremath and @Qua: it wouldn't be 2n space. .NET's string interning means that you wouldn't be doubling memory use if you reuse strings between two collections. You'd have the overhead of twice as many pointers, but that's nowhere near the size of the original strings. – Dan Puzey Mar 26 '11 at 08:16
  • @Dan, the number of references in the collections would still be 2n, right? Whether it is an int, any other object, or even an interned string, references to it have to be added to each collection. BTW, the **uniqueness constraint would nullify the interning advantage**. :) – Sanjeevakumar Hiremath Mar 26 '11 at 08:22
  • How would the uniqueness nullify the interning advantage? With interning you'd store the string once and two references - which are typically much smaller in comparison to the original string. It's nowhere near 2n storage (assuming your strings are at least a few characters in length) - if it was, there'd be no need for interning! – Dan Puzey Mar 26 '11 at 09:55
  • In this question the OP wants unique strings. Interning is useful only when two references point to one string, but duplicate detection in this specific question wouldn't let you take advantage of interning. – Sanjeevakumar Hiremath Mar 28 '11 at 07:46
  • Well, it would probably be even more efficient to use a `HashSet<>` instead of the `Dictionary<>`. As to the boxing/unboxing: He is using strings so there is no boxing involved. I usually try to stick to framework data structures if possible - someone has already written the code and tested it - unless there is a proven advantage of rolling your own implementation. – ChrisWue Oct 14 '12 at 20:05
  • This will have issues in a multithreaded environment. I tried ConcurrentDictionary but then performance is reduced drastically – batwadi Mar 26 '22 at 17:06
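On the multithreading point: each enqueue and dequeue touches both collections, so they have to change together. The simplest way to guarantee that is one lock around both. Here is a minimal sketch under the assumption that a single coarse lock is acceptable for your contention level. It trades some parallelism for simplicity, and `LockedSetQueue<T>` and its `TryDequeue` method are hypothetical names, not a framework API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical thread-safe variant: one lock guards both collections, so the
// membership check, the enqueue and the dequeue are each atomic as a whole.
public class LockedSetQueue<T>
{
    private readonly object gate = new object();
    private readonly HashSet<T> members = new HashSet<T>();
    private readonly Queue<T> queue = new Queue<T>();

    public bool Enqueue(T item)
    {
        lock (gate)
        {
            if (!members.Add(item))
                return false;      // already queued
            queue.Enqueue(item);
            return true;
        }
    }

    // Returns false instead of throwing when empty, because another thread
    // may have emptied the queue since the caller last looked at Count.
    public bool TryDequeue(out T item)
    {
        lock (gate)
        {
            if (queue.Count == 0)
            {
                item = default(T);
                return false;
            }
            item = queue.Dequeue();
            members.Remove(item);
            return true;
        }
    }

    public int Count
    {
        get { lock (gate) { return queue.Count; } }
    }
}
```

If several threads enqueue the same strings concurrently, each distinct string still ends up in the queue exactly once.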

There is the OrderedDictionary class which keeps the insertion order but allows you to look up values by key.
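Here is a minimal sketch of using `System.Collections.Specialized.OrderedDictionary` as a duplicate-free FIFO for strings, storing each string as both key and value. The wrapper class `OrderedDictionaryQueue` is hypothetical. One caveat, based on the class being backed by a list plus a hash table: removing index 0 shifts the remaining entries, so each dequeue is O(n) rather than O(1). With about 10k strings this may or may not matter for your workload.

```csharp
using System;
using System.Collections.Specialized;

// Hypothetical wrapper: OrderedDictionary keeps insertion order and supports
// key lookup, so it can act as a duplicate-free FIFO queue of strings.
// It is non-generic, so keys and values are stored as object.
public class OrderedDictionaryQueue
{
    private readonly OrderedDictionary items = new OrderedDictionary();

    public int Count => items.Count;

    public bool Enqueue(string value)
    {
        if (items.Contains(value))   // key lookup
            return false;
        items.Add(value, value);     // key and value are the same string
        return true;
    }

    public string Dequeue()
    {
        if (items.Count == 0)
            throw new InvalidOperationException("Can't dequeue on an empty queue.");
        var value = (string)items[0];  // index 0 is the oldest entry
        items.RemoveAt(0);             // shifts the remaining entries
        return value;
    }
}
```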

oɔɯǝɹ
ChrisWue