-6

I came to this question while pondering the ordering of set, frozenset and dict. Python doesn't guarantee any ordering, and any ordering is coupled to the hash value at some level. But is the hash value for a value of a numeric or string built-in type standardized? In other words, would

hash((a,b,c,d,e,f,g))

have a deterministic value, if a, b, c, d, e, f, g are numeric values or str?

juanchopanza
  • 1
    What has the hash value to do with ordering???? –  Jul 01 '11 at 16:34
  • Not sure I got what you are asking. What do you mean by "standard hash"? – mac Jul 01 '11 at 16:37
  • 1
    @juanchopanza The hash value influences ordering, but it doesn't guarantee it. Dictionaries generally take the hash value and mod it by some fixed size. So if the hash table has 17 slots, the hash value 5 will occur *after* the hash value 18 (because `18 % 17` is 1). – Chris Eberle Jul 01 '11 at 16:39
  • @mac I mean hash(x) where x is the numeric value or string, and hash is the built-in hash function. – juanchopanza Jul 01 '11 at 16:39
  • Never ever rely on a dictionary for ordering. That's not its purpose. If you want reliable ordering, use an array. – Chris Eberle Jul 01 '11 at 16:40
  • @Chris sure, and collisions have to be dealt with and so on. But is this standardised, or something that can change with implementation? – juanchopanza Jul 01 '11 at 16:42
  • 2
    @Chris, I know that. My question is, while the ordering might be completely non-intuitive, will it be the same or not for the same inputs. – juanchopanza Jul 01 '11 at 16:44
  • @juanchopanza: I believe I have your answer [right here](http://stackoverflow.com/questions/3812429/is-pythons-set-stable/3812600#3812600) – Chris Eberle Jul 01 '11 at 17:23
  • @DKGasser but he makes a good point. We say, "works for me in MY version of CPython", but there ARE other versions of python out there. Someone should try your example in Jython or IronPython or PyPy and see what happens. – Chris Eberle Jul 01 '11 at 17:26
  • @juanchopanza: no, ordering is not guaranteed. See my answer. – Chris Eberle Jul 01 '11 at 17:33

5 Answers

10

The hash values for strings and integers are absolutely not standardized. They could change with any new implementation of Python, including between 2.6.1 and 2.6.2, or between a Mac and a PC implementation of the same version, etc.
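One way to see this for strings is sketched below. It assumes CPython 3.3 or later, where string hashes are salted per process and the salt can be pinned with the `PYTHONHASHSEED` environment variable. The helper name `hash_in_fresh_process` is made up for this illustration.

```python
import os
import subprocess
import sys


def hash_in_fresh_process(value, seed):
    """Return hash(value) as computed by a new interpreter started with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    # repr() of a str or int is a valid literal, so the child can rebuild the value.
    code = "print(hash(%r))" % (value,)
    return int(subprocess.check_output([sys.executable, "-c", code], env=env))


# Same seed: the string hash is repeatable across processes.
str_seed_1 = hash_in_fresh_process("spam", seed=1)
str_seed_1_again = hash_in_fresh_process("spam", seed=1)

# Different seed: the string hash changes.
str_seed_2 = hash_in_fresh_process("spam", seed=2)

# Small integers are not salted in CPython, so the seed has no effect on them.
int_seed_1 = hash_in_fresh_process(6126, seed=1)
int_seed_2 = hash_in_fresh_process(6126, seed=2)
```

Without a pinned seed, each interpreter run picks its own salt, so `hash("spam")` generally differs from one run to the next even on the same machine and version.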

More importantly, though, stable hash values don't imply repeatable iteration order. You cannot depend on the ordering of values in a set, ever. Even within one process, two sets can be equal and not return their values in the same order. This can happen if one set has had many additions and deletions, but the other has not:

>>> a = set()
>>> for i in range(1000000): a.add(str(i))
...
>>> for i in range(6, 1000000): a.remove(str(i))
...
>>> b = set()
>>> for i in range(6): b.add(str(i))
...
>>> a == b
True
>>> list(a)
['1', '5', '2', '0', '3', '4']
>>> list(b)
['1', '0', '3', '2', '5', '4']
Ned Batchelder
4

As proof that ordering is NOT preserved, consider the example by DKGasser. When run in CPython, this is the result:

>>> test = ['cat', 'dog', 'mouse', 'rat', 6126, 516]
>>> temp = []
>>> for x in set(test):
        temp.append(x)  
>>> temp
[516, 'dog', 6126, 'cat', 'rat', 'mouse']

When run in Jython, this is the result:

>>> test = ['cat', 'dog', 'mouse', 'rat', 6126, 516]
>>> temp = []
>>> for x in set(test):
        temp.append(x)  
>>> temp
[6126, 'dog', 'cat', 'rat', 516, 'mouse']

Q.E.D.

It is entirely dependent upon the interpreter's implementation, and not at all guaranteed by the language itself.
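If a repeatable order is what is actually needed, one option is to impose it explicitly rather than rely on the set. The following is a minimal sketch using the same test data. The helper name `stable_order` and the choice of sort key (type name first, then value) are assumptions made for this example, chosen so that integers and strings never get compared with each other.

```python
test = ['cat', 'dog', 'mouse', 'rat', 6126, 516]


def stable_order(items):
    """Deduplicate items and return them in an order that does not depend on hashing."""
    # Grouping by type name avoids comparing int with str, which raises TypeError in Python 3.
    return sorted(set(items), key=lambda v: (type(v).__name__, v))


ordered = stable_order(test)
# ordered == [516, 6126, 'cat', 'dog', 'mouse', 'rat']
```

The result depends only on the values, so it should be the same on any implementation that sorts integers and strings the usual way.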

EDIT

Apologies for beating this into the ground, but the OP seems to want definitive "straight from the horse's mouth" proof that ordering cannot be guaranteed. I finally found it:

http://docs.python.org/library/stdtypes.html#dict

CPython implementation detail: Keys and values are listed in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions.

So there you have it. Please let's be done with this now.

Chris Eberle
  • Who the hell downvoted this. Seriously. At least give some feedback. – Chris Eberle Jul 01 '11 at 17:00
  • This wasn't what he was asking, though it's true. –  Jul 01 '11 at 17:01
  • Not me, I don't downvote. But I don't care about whether your ordering will change when you add elements. I want to know whether it will always be the same for the same inputs. – juanchopanza Jul 01 '11 at 17:04
  • @DKGasser: actually it kinda does. He asked about ordering in dictionaries. When you iterate over a dictionary or a set, it iterates over the internal array, as outlined above. When you add new items that array can shift around and the ordering gets quite messed up. – Chris Eberle Jul 01 '11 at 17:04
  • @juanchopanza: ok, I see. That one's a definite "probably". See my edit. – Chris Eberle Jul 01 '11 at 17:08
  • It's a definite. I've done it numerous times. Though, never cross platform (OS), but I can't immediately see how that would affect it. –  Jul 01 '11 at 17:10
  • @DKGasser: I could see things like byte alignment affecting something like this (i.e. trying to keep the internal array sized to something optimal for the platform). Of course that's 100% speculation, I'm just saying it *could* affect it. – Chris Eberle Jul 01 '11 at 17:12
  • @DKGasser, maybe 32bit vs. 64 bit, big vs. little-endian *might* have an effect... unless it is standardised, which goes back to my question :-) – juanchopanza Jul 01 '11 at 17:13
  • @Chris I can't find any documentation online... if you've got python 2.7, run the same code I did in my answer and see if it iterates the same for you? That should verify it. (assuming you aren't on 32bit windows XP too. :P) –  Jul 01 '11 at 17:15
  • 1
    @juanchopanza: ok now your question is coming in to focus: "is the behavior of hashing and data types that depend on hashing consistent from one platform to the next?" – Chris Eberle Jul 01 '11 at 17:15
  • @DKGasser: Python 2.5 under 64-bit Debian linux gives the same answer. So does Python 3 under 64-bit Arch. So in terms of anecdotal evidence I'm inclined to agree. However the OP is right, without documentation it's just more "works for me!" which doesn't exactly hold up all the time. – Chris Eberle Jul 01 '11 at 17:20
  • @Chris @Juanchopanza Well we've proven it just now in three different architectures and three different versions of python.. and I've used this fact *numerous* times in programs and never had a problem. I think it's pretty safe to confirm now. –  Jul 01 '11 at 17:22
  • 1
    @DKGasser: see my edit. I like the answer: "There's no formal guarantee about the stability of sets (or dicts, for that matter.) However, in the CPython implementation, as long as nothing changes the set, the items will be produced in the same order." – Chris Eberle Jul 01 '11 at 17:25
  • 1
    @Chris Yes, concise way of saying what I think we've been getting at. –  Jul 01 '11 at 17:26
  • @Chris let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/1065/discussion-between-dkgasser-and-chris) –  Jul 01 '11 at 17:36
  • @Chris +1 good example. The obvious question is "is Jython python". In C++, I've come across cases where VS does something non-standard. In these cases you just say "VS is wrong", because you have a very detailed standard. I guess I'm looking for the python equivalent... – juanchopanza Jul 02 '11 at 08:09
  • @juanchopanza: I think you're chasing ghosts. It's OK for a language to have "gray areas" that are implementation dependent. It's OK for the designers of a language to say, "I really don't care what happens in this situation". Just google "C++ undefined behavior" for many examples of this. The standard is simply a framework, not a holy text. Both Jython and CPython are indeed following the language specs. – Chris Eberle Jul 02 '11 at 17:42
  • @juanchopanza: See [this question](http://stackoverflow.com/q/1094961/576139). There is no formal python specification at all, at least not for containers and such. So neither Jython nor CPython are following the spec at all because there IS no spec. So I believe the answer is that no, there is absolutely no formal guarantee nor remark of any kind that can allow you to be *certain* about ordering. – Chris Eberle Jul 03 '11 at 00:09
2

Speaking from the general idea of a hash set, you can't rely on the order. Even if the implementation you are using happens to preserve order, it's a bad idea to rely on that unless the documentation specifically says that you can.

The fact that the hash values for all objects being placed into the set are guaranteed to always be the same is irrelevant to whether or not the set implementation preserves order.

For a simple hash implementation, a common approach is to create an array of size ORIGINAL_SIZE. When an item is inserted, its hash value is generated and then mapped (via mod for simplicity) to a value range the size of the array, and then the object is placed at that spot in the array. If there's already an item at that spot (i.e. the array is smaller than the number of possible items), then some collision algorithm is used.

When the number of items in the set implementation changes, the underlying implementation may change the size of the array storing the data (e.g. to ORIGINAL_SIZE * 1.5). When this happens, the order of items under iteration will very likely change. This generally only happens for inserts, but can happen for deletes, or even if the implementation spreads out such activities over other operations.
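The two paragraphs above can be sketched as a toy hash set. This is not how CPython implements `set`; the class name `ToyHashSet`, the linear-probing collision strategy, the 2/3 load threshold, and the doubling growth factor (instead of 1.5) are all assumptions picked to keep the example short. It cannot store `None`, which is used to mark empty slots.

```python
class ToyHashSet:
    """Open-addressing hash set with linear probing, for illustration only."""

    def __init__(self, size=8):
        self._slots = [None] * size
        self._count = 0

    def _find_slot(self, item):
        # Map the hash into the array with mod, then step forward on collisions.
        i = hash(item) % len(self._slots)
        while self._slots[i] is not None and self._slots[i] != item:
            i = (i + 1) % len(self._slots)
        return i

    def add(self, item):
        i = self._find_slot(item)
        if self._slots[i] is None:
            self._slots[i] = item
            self._count += 1
            # Grow once the table is more than two-thirds full.
            if self._count * 3 > len(self._slots) * 2:
                self._resize(len(self._slots) * 2)

    def _resize(self, new_size):
        old_items = list(self)
        self._slots = [None] * new_size
        for item in old_items:
            self._slots[self._find_slot(item)] = item

    def __iter__(self):
        # Iteration simply walks the array, so order is slot order.
        return (x for x in self._slots if x is not None)


# 1 and 9 collide in an 8-slot table, so insertion history decides who gets slot 1.
a = ToyHashSet()
a.add(1)
a.add(9)
b = ToyHashSet()
b.add(9)
b.add(1)
order_a, order_b = list(a), list(b)

# Growing the table re-places every item, which reshuffles the iteration order.
c = ToyHashSet()
for n in [9, 1, 2, 3, 4]:
    c.add(n)
before_resize = list(c)
c.add(5)  # sixth item crosses the load threshold and triggers a resize to 16 slots
after_resize = list(c)
```

With small non-negative integers, whose hash equals their value in CPython, `order_a` is `[1, 9]` while `order_b` is `[9, 1]`, and `9` moves from the front of `before_resize` to the back of `after_resize`.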

There are a number of set implementations in various languages that guarantee ordering, and some that guarantee that it will be the same order the items were inserted in, including what happens to the order when you insert the same item twice (i.e. whether it moves to the end, etc.). However, unless the implementation you're looking at specifically says it guarantees that, you cannot rely on it.
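As an example of a documented guarantee of this kind: from Python 3.7 onward the language specifies that `dict` preserves insertion order, which was not the case when this was written. The sketch below assumes Python 3.7 or later and uses a dict with throwaway values as an insertion-ordered set; the helper name `ordered_unique` is made up for the illustration.

```python
def ordered_unique(items):
    """Drop duplicates while keeping the first-seen order (relies on dict ordering, Python 3.7+)."""
    return list(dict.fromkeys(items))


first_seen = ordered_unique(['cat', 'dog', 'cat', 'mouse', 'dog'])

# Inserting a key that is already present keeps its original position.
d = dict.fromkeys(['a', 'b', 'c'])
d['a'] = None
order_after_reinsert = list(d)

# Moving it to the end takes an explicit delete followed by an insert.
del d['a']
d['a'] = None
order_after_move = list(d)
```

Here `first_seen` is `['cat', 'dog', 'mouse']`, `order_after_reinsert` is still `['a', 'b', 'c']`, and `order_after_move` is `['b', 'c', 'a']`. The built-in `set` makes no such promise.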

As a specific case imagine that, on the next release of Python, it is determined that the underlying code for sets is inefficient. Somebody decides that they will rewrite it to make it much faster. Even if the old implementation happened to preserve order... if the documentation doesn't say it does then the new implementation is free to not have that property.

RHSeeger
-1

AFAIK, the result of __hash__() should always be unique for that object. In the case of integers, the hash is the value itself.
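The integer claim can be checked directly, with some caveats. The snippet below is a sketch assuming CPython 3.2 or later, where `sys.hash_info` is available; the specific values are CPython behaviour, not something the language promises.

```python
import sys

small = hash(5)      # small non-negative ints hash to themselves
minus_one = hash(-1) # CPython never returns -1 from hash(); it substitutes -2

# Integer hashes are reduced modulo a fixed prime, so large values wrap around.
wrapped = hash(sys.hash_info.modulus)

# Values that compare equal must hash equal, even across numeric types,
# so distinct objects can share a hash and the hash is not a unique identifier.
equal_values_share_hash = hash(1) == hash(1.0) == hash(True)
```

So "the hash is the value itself" holds only for integers smaller in magnitude than `sys.hash_info.modulus`, apart from -1.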

According to the documentation:

object.__hash__(self)

Called by built-in function hash() and for operations on members of hashed collections including set, frozenset, and dict. __hash__() should return an integer. The only required property is that objects which compare equal have the same hash value; it is advised to somehow mix together (e.g. using exclusive or) the hash values for the components of the object that also play a part in comparison of objects.

So the order of your objects will always depend on the particular implementation of the hash method for that object and whether it returns something that "makes sense" for comparison is completely determined by you, on custom objects.
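A minimal sketch of a custom object that satisfies the one required property (equal objects have equal hashes) might look like this. The `Point` class is hypothetical, and it mixes its components by hashing a tuple of them rather than by the exclusive-or the documentation mentions as an example.

```python
class Point:
    """Hypothetical value type whose equality and hash use the same components."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Hash exactly the fields that __eq__ compares, combined via a tuple.
        return hash((self.x, self.y))


p = Point(1, 2)
q = Point(1, 2)
points = {p, q, Point(3, 4)}  # p and q collapse into a single member
```

Equal hashes let the set find `q` in the same place as `p`; they say nothing about where either lands relative to `Point(3, 4)` during iteration.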

TL;DR - Yes, the hash will determine the order of your objects. The order will of course depend on the results given by the hashes of those objects.

João Neves
  • I know that, I've read the documentation. But the question is whether it is standard or not. Say I try the same thing on two architectures. – juanchopanza Jul 01 '11 at 16:46
  • 1
    Note: hash() doesn't promise to be unique, and the objects can be returned in a different order than their hashes would indicate. – Ned Batchelder Jul 01 '11 at 21:08
-2

The hash() function of python does a predefined set of operations to come up with its value. What those operations are is further explained here: A given object (string, integer, whatever) will always yield the same hash value.

When you put items into a set (or similar structure), they are rehashed whenever the size of the set reaches a certain threshold. Thus, while you may be unable to predict what order a certain set of items would be in, the same n items will always be in the same order in a set.

Thus, effectively yes... a,b,c,d,e,f,g, where each is a specific string or integer, would always appear in the same order when iterated through in a set. (though, not necessarily the order I just listed them).

EDIT: Edited for clarity based on comments.

EDIT: Console Proof

Ran under Python 2.5 on Debian 32-bit, Python 3 on 64-bit and 2.7 on Windows XP 32-bit. It comes out the same in all of them, and I've used the fact in programs before with no problems.

Thanks to Chris for the additional platforms to confirm test.

>>> test = ['cat', 'dog', 'mouse', 'rat', 6126, 516]
>>> temp = []
>>> for x in set(test):
    temp.append(x)  
>>> temp
[516, 'dog', 6126, 'cat', 'rat', 'mouse']
>>> temp = []
>>> for x in set(test):
        temp.append(x)
>>> temp
[516, 'dog', 6126, 'cat', 'rat', 'mouse']
>>> 
Community
  • 1
    are you sure? can you provide a link to the documentation, please? :) – Ant Jul 01 '11 at 16:39
  • I did provide a link to how the function works. Click on 'here' in the above. And yes, I am sure! Here is the documentation from python.org: http://docs.python.org/library/functions.html#hash –  Jul 01 '11 at 16:41
  • Wrong. (a,b,c,d,e,f,g) will always give the same *hash*. – Chris Eberle Jul 01 '11 at 16:42
  • @Chris That's what I said... can you please explain your confusion/the downvote? –  Jul 01 '11 at 16:42
  • 1
    @DKGasser no it isn't. You said "Thus, effectively yes.... [they] would always appear in the same order". If you *hash* this tuple, information about ordering is lost. It's just a hash. – Chris Eberle Jul 01 '11 at 16:45
  • I didn't ask for the hash documentation..I asked a link that would support your affirmation that, in a set, strings will appear always in the same order – Ant Jul 01 '11 at 16:45
  • @Chris reread my answer... what you just said is exactly what I did. They would always appear in the same order, not necessarily THAT order though. –  Jul 01 '11 at 16:46
  • @Ant: they won't. Dictionaries and sets are constantly changing size, and every time the size changes, the items are re-hashed and the ordering changes. You CAN NOT rely on a dictionary / set for ordering. – Chris Eberle Jul 01 '11 at 16:47
  • @Ant Try it. Use some strings, hash them and put them in a set, then iterate through it... then delete everything and repeat with the same strings. It's a feature of how hash() works. –  Jul 01 '11 at 16:47
  • @Chris Obviously if you are using a different set yes.. but a set of the same items will ALWAYS be in the same order.. if you have the same items and number of items. That is what the user was asking, as far as I could tell. –  Jul 01 '11 at 16:48
  • @DK Gasser well, I cannot just 'try and see'.. If I write code which depends on this feature, I have to know that I can really rely on it or not (that same inputs produce the same output, even if the order can be non-intuitive) So please provide a link, be more specific; you're just saying 'trust me' :) – Ant Jul 01 '11 at 16:51
  • @DKGasser, @Chris, yes, this is what I was asking. – juanchopanza Jul 01 '11 at 16:52
  • @Ant Personal experience. It's a qualified reference according to the FAQ, as far as I know. I'll look for website documentation if it makes you feel better, though. *starts looking* –  Jul 01 '11 at 16:53
  • I tried it in Jython, and the results are: `[6126, 'dog', 'cat', 'rat', 516, 'mouse']` Ordering is NOT guaranteed. – Chris Eberle Jul 01 '11 at 17:29
  • I think that's irrelevant really. Jython is, effectively, a different language. –  Jul 01 '11 at 17:32
  • I would write just `list(set(test))` instead of the loop. – Gareth Rees Jul 01 '11 at 17:49
  • @Gareth Yes, I could have... I just wanted to explicitly prove that I was deleting/changing the instance though. ('temp') –  Jul 01 '11 at 17:50
  • Jython is a different implementation of the same language. – Ned Batchelder Jul 01 '11 at 18:28
  • "A given object (string, integer, whatever) will always yield the same hash value": only within a given implementation, any change in the underlying implementation could change the hash function. – Ned Batchelder Jul 01 '11 at 21:19
  • 4
    "while you may be unable to predict what order a certain set of items would be in, the same n items will always be in the same order in a set": Not true, see my answer. BTW: It's impossible to prove something is always true with a console test. – Ned Batchelder Jul 01 '11 at 21:20