In Python, how do I make a collection that is sorted by one value and indexable by another?

Question

I need to have a collection, in which I insert items such as [1,'b42b00d6-76c8-4d68-b22e-ff4653bb01c8'].

It needs to be ordered by the first element, but indexable by the second.

The following is the best I could come up with. It has two flaws:

It can't take multiple items with the same key, since it's a dictionary.
It can't properly delete items from the list.

My attempt:

from rbtree import rbtree

class Item(object):
    def __init__(self, value, id):
        self.value = value
        self.id = id

item1 = Item(1,'b42b00d6-76c8-4d68-b22e-ff4653bb01c8')
item2 = Item(2,'60eda62f-f05d-4134-9e92-9bb9a1f52daf')
item3 = Item(2,'77d9a028-bd4b-4634-b230-234f88ff010a')
item4 = Item(3,'7e7118cd-7145-41c8-8413-79670bdc81dc')

myList = rbtree()
myList[item2.value] = item2
myList[item1.value] = item1
myList[item3.value] = item3
myList[item4.value] = item4

# Correctly ordered by the first element
# But it's missing item2.

for k,v in myList.iteritems():
    print "%s %s" % (v.value, v.id)

# But I also need to index by the second element.
# So:

listIndexedBySecondElement = {}
listIndexedBySecondElement[item1.id] = item1
listIndexedBySecondElement[item2.id] = item2
listIndexedBySecondElement[item3.id] = item3
listIndexedBySecondElement[item4.id] = item4

item = listIndexedBySecondElement['7e7118cd-7145-41c8-8413-79670bdc81dc']
print item.value # correctly prints 3

# Now I need to delete an element.

del listIndexedBySecondElement['b42b00d6-76c8-4d68-b22e-ff4653bb01c8']
# But I also need to delete it from myList. How do I do that?

When you say "ordered by", what do you mean? Do you just mean you want it to *display* with that order, or is there something you actually want to *do* with that order? — BrenBarn, Mar 14 '15 at 04:05
I need to do some analysis of the data, which requires it to be in order. So it should be sorted. The analysis happens every single time an item is added or deleted. — abtree, Mar 14 '15 at 04:12
Sorry, but that still doesn't answer my question. What does the analysis *do* with the data that requires it to be ordered? Does it, for instance, iterate over it? Can the ordering be done as part of the analysis rather than being encoded in the data structure itself? — BrenBarn, Mar 14 '15 at 04:14
It adds up the values, starting from the lowest, until the sum is some value (say 10), then returns that value. It generally has to go through at least 20 items before it reaches the sum needed. — abtree, Mar 14 '15 at 04:22
https://pypi.python.org/pypi/rbtree - It's a red-black tree. Maintains an ordered list. — abtree, Mar 14 '15 at 08:41
It appears `value` is not unique, so you can't use it as the key to anything `dict`-like. What guarantees _can_ you make about it? For instance, your sample data has the `value` arriving in non-decreasing order. If that's guaranteed to happen, you could store it in a [`collections.OrderedDict`](https://docs.python.org/3/library/collections.html#collections.OrderedDict). Also, is `id` guaranteed to be unique, or do you have to worry about duplicates there too? — Kevin J. Chase, Mar 14 '15 at 18:05

score 1 · Answer 1 · answered Mar 14 '15 at 04:05

Before you run:

del listIndexedBySecondElement['b42b00d6-76c8-4d68-b22e-ff4653bb01c8']

grab the item:

itm = listIndexedBySecondElement['b42b00d6-76c8-4d68-b22e-ff4653bb01c8']

now you can delete it from both:

del listIndexedBySecondElement['b42b00d6-76c8-4d68-b22e-ff4653bb01c8']
del myList[itm.value]

As for the "order" part: a dictionary is not an ordered data structure, so for that you'll have to use something else.
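
One caveat with `del myList[itm.value]`: when two items share a value, a dictionary-like tree holds only one of them under that key, so the delete can remove a different item than the one just looked up. A minimal sketch of one workaround is to keep a bucket of items per value. A plain `dict` stands in for the `rbtree` here (an assumption, so the example runs without third-party packages), and the helper names `add` and `remove` are hypothetical:

```python
class Item(object):
    def __init__(self, value, id):
        self.value = value
        self.id = id

by_value = {}   # value -> {id: item}; stand-in for the rbtree (assumption)
by_id = {}      # id -> item

def add(item):
    """Register the item under both its value and its id."""
    by_value.setdefault(item.value, {})[item.id] = item
    by_id[item.id] = item

def remove(item_id):
    """Delete exactly one item from both structures and return it."""
    item = by_id.pop(item_id)
    bucket = by_value[item.value]
    del bucket[item.id]
    if not bucket:              # drop the value once its last item is gone
        del by_value[item.value]
    return item

add(Item(1, 'b42b00d6-76c8-4d68-b22e-ff4653bb01c8'))
add(Item(2, '60eda62f-f05d-4134-9e92-9bb9a1f52daf'))
add(Item(2, '77d9a028-bd4b-4634-b230-234f88ff010a'))

remove('60eda62f-f05d-4134-9e92-9bb9a1f52daf')

# Walk the remaining items in value order.
remaining = [item.id for value in sorted(by_value)
             for item in by_value[value].values()]
print(remaining)
```

With a real sorted mapping in place of the plain `dict`, the `sorted(by_value)` call would not be needed.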

score 0 · Answer 2 · answered Mar 14 '15 at 04:15

You could instead have a dictionary of id to value:

mydict = {}
mydict['77d9a028-bd4b-4634-b230-234f88ff010a'] = 2
mydict['b42b00d6-76c8-4d68-b22e-ff4653bb01c8'] = 1
mydict['7e7118cd-7145-41c8-8413-79670bdc81dc'] = 3
mydict['60eda62f-f05d-4134-9e92-9bb9a1f52daf'] = 2

The dictionary can now be indexed by the id. You can sort and print by value like this:

sorted_dict = sorted(mydict.items(), key=lambda x:x[1])
for id, value in sorted_dict:
    print("{0} {1}".format(id, value))

Printing:

b42b00d6-76c8-4d68-b22e-ff4653bb01c8 1
77d9a028-bd4b-4634-b230-234f88ff010a 2
60eda62f-f05d-4134-9e92-9bb9a1f52daf 2
7e7118cd-7145-41c8-8413-79670bdc81dc 3

Which is ordered by value.
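
The analysis described in the question's comments (add up values from the lowest until the running total reaches a target) can run directly on such a dictionary. This is a minimal sketch, assuming the wanted result is the running total at the point it first reaches the target, together with the number of values used. The name `sum_until` and the target of 5 are hypothetical:

```python
mydict = {
    '77d9a028-bd4b-4634-b230-234f88ff010a': 2,
    'b42b00d6-76c8-4d68-b22e-ff4653bb01c8': 1,
    '7e7118cd-7145-41c8-8413-79670bdc81dc': 3,
    '60eda62f-f05d-4134-9e92-9bb9a1f52daf': 2,
}

def sum_until(values_by_id, target):
    """Add values from smallest to largest; stop once the total reaches target.

    Returns (total, number_of_values_used), or None if the target is never reached.
    """
    total = 0
    used = 0
    for value in sorted(values_by_id.values()):
        total += value
        used += 1
        if total >= target:
            return total, used
    return None

result = sum_until(mydict, 5)
print(result)
```

This sorts on every call, which is the cost the asker raises in the comments below.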

Zach Gates

One issue with that method is that I'd need to sort it after every time I insert, which would be slow I think. Also I need to store multiple entries with the same key. – abtree Mar 14 '15 at 04:27
@abtree: If you need to store multiple items with the same key, what do you want to happen when you try to index by that key? – BrenBarn Mar 14 '15 at 04:30
As alfasin said above, dictionaries aren't *meant* to have a definite order. @abtree – Zach Gates Mar 14 '15 at 04:32
Why would two items have the same id? @BrenBarn – Zach Gates Mar 14 '15 at 04:32
Sorry, I misunderstood. There won't be multiple entries with keys such as '77d9a028-bd4b-4634-b230-234f88ff010a'. – abtree Mar 14 '15 at 04:33
Also, depending on the expected length of these dictionaries, having a single dictionary, and sorting once when printing will save memory. – Zach Gates Mar 14 '15 at 04:35
It will need to be sorted every time items are added or removed, because I need to do some computation on the sorted list. So I'm concerned calling sorted() every time would be slower than an rbtree. – abtree Mar 14 '15 at 04:38
You can use `timeit.Timer` to test the speed, but using `sorted` should be faster than storing two, potentially large, dictionaries. You could also use `for id, value in sorted(mydict.items(), key=lambda x:x[1])` instead of storing the sorted list, making it even more efficient. – Zach Gates Mar 14 '15 at 04:47
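
The trade-off discussed in these comments can be measured rather than guessed. Below is a rough sketch using `timeit`, comparing a full `sorted()` call after every insert with `bisect.insort` into a list that stays sorted. The data size of 500 and the helper names `resort_each_time` and `insort_each_time` are assumptions for illustration, and the timings will vary by machine:

```python
import bisect
import random
import timeit

random.seed(0)
# Hypothetical test data: 500 (value, id) pairs with repeated values.
pairs = [(random.randint(1, 100), "id-%d" % n) for n in range(500)]

def resort_each_time():
    """Keep one dict and call sorted() after every insert."""
    values_by_id = {}
    ordered = []
    for value, item_id in pairs:
        values_by_id[item_id] = value
        ordered = sorted(values_by_id.items(), key=lambda x: x[1])
    return [value for _, value in ordered]

def insort_each_time():
    """Keep a list of (value, id) tuples sorted as items arrive."""
    ordered = []
    for value, item_id in pairs:
        bisect.insort(ordered, (value, item_id))
    return [value for value, _ in ordered]

print("sorted() per insert:", timeit.timeit(resort_each_time, number=1))
print("insort per insert:  ", timeit.timeit(insort_each_time, number=1))
```

Both functions end with the same sequence of values; only the cost of keeping it in order after each insert differs.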

score 0 · Accepted Answer · answered Mar 15 '15 at 04:28

My final solution combines alfasin's answer with a switch from rbtree to pyavl. Unlike a dictionary, a pyavl tree stores the items themselves rather than key/value pairs, so it can hold several items with the same value.

Code:

import avl

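class Item(object):
    def __init__(self, value, id):
        self.value = value
        self.id = id

    # Python 2 comparison hook: order by value, then by id, so the tree
    # sorts by value and remove() matches exactly one item.
    def __cmp__(self, other):
        return cmp((self.value, self.id), (other.value, other.id))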

item1 = Item(1,'b42b00d6-76c8-4d68-b22e-ff4653bb01c8')
item2 = Item(2,'60eda62f-f05d-4134-9e92-9bb9a1f52daf')
item3 = Item(2,'77d9a028-bd4b-4634-b230-234f88ff010a')
item4 = Item(3,'7e7118cd-7145-41c8-8413-79670bdc81dc')

myList = avl.new()
myList.insert(item2)
myList.insert(item1)
myList.insert(item3)
myList.insert(item4)

# Correctly ordered by the first element
for item in myList:
    print "%s %s" % (item.value, item.id)

# But I also need to index by the second element.
# So:

listIndexedBySecondElement = {}
listIndexedBySecondElement[item1.id] = item1
listIndexedBySecondElement[item2.id] = item2
listIndexedBySecondElement[item3.id] = item3
listIndexedBySecondElement[item4.id] = item4

item = listIndexedBySecondElement['7e7118cd-7145-41c8-8413-79670bdc81dc']
print item.value # correctly prints 3

# Now I need to delete an element.

itm = listIndexedBySecondElement['60eda62f-f05d-4134-9e92-9bb9a1f52daf']
del listIndexedBySecondElement['60eda62f-f05d-4134-9e92-9bb9a1f52daf']
myList.remove(itm)
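
The same two-structure idea can also be sketched with only the standard library. This swaps in a different technique from pyavl: a list of `(value, id)` tuples kept sorted with `bisect`, plus a dictionary keyed by id. The class name `SortedIndexed` and its methods are hypothetical, and ids are assumed to be unique. List insertion and deletion are O(n), unlike a balanced tree, so this suits small to medium collections:

```python
import bisect

class SortedIndexed(object):
    """Items ordered by value and retrievable by id (hypothetical helper)."""

    def __init__(self):
        self._ordered = []   # sorted list of (value, id) tuples
        self._by_id = {}     # id -> value

    def insert(self, value, item_id):
        if item_id in self._by_id:
            raise KeyError("duplicate id: %r" % (item_id,))
        bisect.insort(self._ordered, (value, item_id))
        self._by_id[item_id] = value

    def __getitem__(self, item_id):
        return self._by_id[item_id]

    def remove(self, item_id):
        value = self._by_id.pop(item_id)
        # Each (value, id) pair is unique, so bisect finds its exact position.
        pos = bisect.bisect_left(self._ordered, (value, item_id))
        del self._ordered[pos]

    def __iter__(self):
        return iter(self._ordered)

items = SortedIndexed()
items.insert(2, '60eda62f-f05d-4134-9e92-9bb9a1f52daf')
items.insert(1, 'b42b00d6-76c8-4d68-b22e-ff4653bb01c8')
items.insert(2, '77d9a028-bd4b-4634-b230-234f88ff010a')
items.insert(3, '7e7118cd-7145-41c8-8413-79670bdc81dc')

items.remove('60eda62f-f05d-4134-9e92-9bb9a1f52daf')

for value, item_id in items:
    print("%s %s" % (value, item_id))
```

Sorting on the `(value, id)` pair rather than on the value alone is what lets duplicates coexist and makes each removal unambiguous.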
