13

I just ran across something interesting that I thought I'd ask about.

When adding a dictionary to a set, I had assumed that the dictionary would be added as a whole dictionary, but it isn't. Only the keys are added:

dicty = {"Key1": "Val1", "Key2": "Val2"}
setunion = set()
setunion.union(dicty)
=> set(['Key2', 'Key1'])

When you attempt to add it using set.add() you get an error:

setadd = set()
setadd.add(dicty)
Traceback (most recent call last):
  File "python", line 1, in <module>
TypeError: unhashable type: 'dict'

Obviously, this behaviour is very different from lists:

   listy = []
   listy.append(dicty)
   listy
=> [{'Key2': 'Val2', 'Key1': 'Val1'}]

In the docs it says that sets are unordered collections of hashable objects, which is a hint to some of the issues above.

Questions

What's going on here? Set items have to be hashable, so clearly that has to do with why I'm only adding the keys to the set with .union(), but why the error with .add()?

Is there some usability reason behind the difference in behavior of sets from lists?

Is there a datatype in Python (or a library) that essentially functions like a list, but only keeps unique items?

NotAnAmbiTurner
  • 2
    So you want a structure that preserves order but doesn't allow duplicates? – senshin Dec 04 '15 at 21:46
  • 3
    Why did you think `union` was the way to add things to sets? `union` means "construct a new set, adding the *elements* of the argument to the new set". It's like `extend` for lists, but not in place. – user2357112 Dec 04 '15 at 21:47
  • @senshin I didn't even think about the ordering issue. I'm more wondering about any structure that allows addition of any object, but doesn't allow (or 'absorbs') duplicates. – NotAnAmbiTurner Dec 05 '15 at 01:02
  • @NotAnAmbiTurner: That's not the only difference. There's no real make-a-new-set version of `add`. The mutative version of `union` is `update`. – user2357112 Dec 05 '15 at 01:04
  • @user2357112 Ohhhhhh. Makes sense. I'm learning python, and I was messing around on hackerrank, which is really good in terms of new concepts, but maybe not the best in avoiding confusion. – NotAnAmbiTurner Dec 05 '15 at 01:08
  • @user2357112 But... sets clearly do *not* allow the addition of _any_ object. They only take hashables, for one thing, and to get dicts in and out of a set you'd have to do some pretty funky stuff. – NotAnAmbiTurner Dec 05 '15 at 01:19

2 Answers

21

No, that's impossible by definition. The way hash tables (like dicts and sets) do lookups is fundamentally different from the way arrays (like lists) do lookups. The logical problem is this: if you have a datatype that only keeps unique items, what happens if you mutate one of the elements so that it is no longer unique?

a, b = [0], [0, 1]
s = SpecialSet(a, b)
a.append(1)  # NOW WHAT?!
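To make the "now what?" concrete, here is a minimal sketch. `HashableList` is a hypothetical class (not part of the standard library) that deliberately hashes its mutable contents, standing in for what a set of lists would have to do. It is an illustration of the problem, not something to use.

```python
class HashableList:
    """Hypothetical wrapper that hashes mutable contents. Do not do this."""

    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        return isinstance(other, HashableList) and self.items == other.items

    def __hash__(self):
        # The hash depends on contents that can change later.
        return hash(tuple(self.items))


a = HashableList([0])
b = HashableList([0, 1])
s = {a, b}             # two distinct elements, as expected

a.items.append(1)      # mutate a while it sits inside the set

print(a == b)          # True: the two elements are now equal
print(len(s))          # 2: the set still holds both, so it contains "duplicates"

rebuilt = set(list(s)) # rehash every element from scratch
print(len(rebuilt))    # 1: only now do the duplicates collapse
```

The set has no way to notice the mutation, so its uniqueness guarantee silently breaks until the elements are rehashed.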

If you want to add a dictionary to a set, you can add its dict.items() (the key-value pairs as tuples), but you have to convert that to a tuple first, and every value must itself be hashable.

a = {1:2, 3:4}
s = set()
s.add(tuple(a.items()))

You'd then have to convert it back to a dict once it leaves the set to get a dictionary back:

for tup in s:
    new_a = dict(tup)
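One caveat with the tuple approach is that, on Python 3.7+, `tuple(d.items())` follows insertion order, so two equal dicts built in a different order produce different tuples. A possible alternative, sketched here on the assumption that all values are hashable, is to store `frozenset(d.items())` instead:

```python
d1 = {"Key1": "Val1", "Key2": "Val2"}
d2 = {"Key2": "Val2", "Key1": "Val1"}  # same content, different insertion order

# Tuples of items are order-sensitive, so these two compare unequal.
print(tuple(d1.items()) == tuple(d2.items()))  # False

# Frozensets of items ignore order, so equal dicts collapse to one entry.
seen = set()
for d in (d1, d2):
    seen.add(frozenset(d.items()))
print(len(seen))  # 1

# Convert back to dicts when taking them out of the set.
restored = [dict(fs) for fs in seen]
print(restored[0] == d1)  # True
```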

A built-in frozendict type was proposed in PEP 416 but ultimately rejected.

Adam Smith
  • @NotAnAmbiTurner: But then your SpecialSet has duplicates in it. – user2357112 Dec 05 '15 at 01:05
  • @user2357112 Oh, I get it. Okay, thanks. Let me try to be more clear. – NotAnAmbiTurner Dec 05 '15 at 01:10
  • Okay, maybe I wasn't clear. I don't want to store a mutable inside a SpecialSet. I guess what I'm looking for is the ability for the `SpecialSet` to create its own instance when a mutable is passed in. Do folks use mutability like that, to store something inside another object, and mutate it while inside its container? That just seems... wrong... to me, but I'm a beginner. Wouldn't you want to get the value, mess with it, and re-add it to the container? – NotAnAmbiTurner Dec 05 '15 at 01:13
  • So I guess my idea would be that `SpecialSet.__init__` would copy and store each value, making sure they're unique first. – NotAnAmbiTurner Dec 05 '15 at 01:17
  • @NotAnAmbiTurner I'm not at all sure what you're talking about here. How is that different from what a `set` does? – Adam Smith Dec 05 '15 at 03:03
  • @AdamSmith one thing that makes it different is that sets only take hashable objects – NotAnAmbiTurner Dec 06 '15 at 04:56
  • Suppose I have dicts of some structure, and only one of the keys needs to be unique. How can I compose such a list? Let me put it another way: I have dictionaries of key:value pairs where only one of the keys is immutable and the other keys are mutable. I need to create a unique list of such dictionaries. What can I do? – Edik Mkoyan Aug 21 '17 at 13:18
  • @EdikMkoyan I'm not certain what you're trying to do. Can you ask a question about it with some concrete examples? – Adam Smith Aug 21 '17 at 17:56
  • @AdamSmith I need to generate a unique list of dictionaries that contain URLs and some HTTP header info, like this: { "_id" : "http://www.example.com/bd/en/?preferredCountry=yes", "status" : 200, "headers" : { "Server" : "Apache-Coyote/1.1", "Access-Control-Allow-Origin" : "http://example.com", "Date" : "Mon, 21 Aug 2017 08:52:00 GMT", "Connection" : "keep-alive", "Vary" : "Accept-Encoding" }, "origin" : "http://example.com" } I will iterate over that list and at the end will add it to mongodb. – Edik Mkoyan Aug 22 '17 at 10:26
6

With set.union(), you ask for the elements of the argument to be added to the resulting set, not the object itself. Iterating over a dictionary gives you the keys. You'd get similar results if you used set.union() on a list, a tuple or a string; the contents of those are added to the set:

>>> s = {42}
>>> s.union('foo')
set([42, 'o', 'f'])

The single-character strings 'o' and 'f' were added, not the string 'foo'.
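As a rough side-by-side of the three methods discussed in this thread (Python 3 syntax, reusing the question's `dicty`), the following sketch shows that `union` returns a new set and leaves the original untouched, `update` is its in-place counterpart, and `add` stores the argument itself as a single element:

```python
dicty = {"Key1": "Val1", "Key2": "Val2"}

s = set()
result = s.union(dicty)            # new set built by iterating dicty (its keys)
print(s)                           # set(): s itself is unchanged
print(result == {"Key1", "Key2"})  # True

s.update(dicty)                    # in-place version of union
print(s == {"Key1", "Key2"})       # True

t = {42}
t.add("foo")                       # the string is stored whole, not per character
print(t == {42, "foo"})            # True
```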

You cannot add dictionaries to a set because they are mutable; sets only support storing hashable objects, and one of the requirements for an object to be hashable is that it is immutable.
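A quick way to check which built-in objects a set will accept is to call `hash()` on them. In this small sketch, `is_hashable` is a hypothetical helper written for illustration, not a standard-library function:

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds (hypothetical helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable("foo"))           # True
print(is_hashable((1, 2)))          # True
print(is_hashable(frozenset({1})))  # True
print(is_hashable({"a": 1}))        # False: dicts are mutable
print(is_hashable([1, 2]))          # False: lists are mutable
print(is_hashable((1, [2])))        # False: a tuple is only hashable if its contents are
```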

Martijn Pieters
  • 2
    nit: you can implement `__hash__` on a mutable object – Adam Smith Dec 04 '15 at 21:52
  • 2
    @AdamSmith: yes, breaking the requirement. Python lets you shoot yourself in that foot. – Martijn Pieters Dec 04 '15 at 21:53
  • My point had more to do with the behavior. It's weird to me that you can add a dict, as a dict, to a list, but not to a set. I had initially thought sets were designed to essentially be a `list` of unique items, but it seems there is more to it than that. – NotAnAmbiTurner Dec 05 '15 at 01:00
  • @NotAnAmbiTurner a `set` has more in common with a `dict` than a `list`. Since a `list` references by index, it doesn't care what's in it. A `set` (and the keys of a `dict`) reference by hash, which should be inseparably tied to value. – Adam Smith Dec 05 '15 at 03:00