
I have a huge nested dictionary, something like this:

d[id1][id2] = value

Example:

books["auth1"]["humor"] = 20
books["auth1"]["action"] = 30
books["auth2"]["comedy"] = 20

and so on.

Each "auth" key can have any set of "genres" associated with it, and the value for each (author, genre) pair is the number of books that author wrote in that genre.

Now I want to convert it into a matrix, something like:

                    "humor"       "action"        "comedy"
      "auth1"         20            30               0
      "auth2"          0            0                20

How do I do this? Thanks.

Paul Seeb
frazman
  • First I iterate through the dictionary to find the number of rows and columns. Then, in a second pass, I convert each entry into a fixed-length vector, and in another pass through id1 I associate each key with its vector. – frazman May 16 '12 at 17:30
  • Do you just want it printed out like that? Why does it need to go into a numpy matrix? – Paul Seeb May 16 '12 at 17:42
  • @PaulSeeb: No, I actually want to perform an SVD of this matrix later. – frazman May 16 '12 at 17:44
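The two-pass approach described in the first comment can be sketched directly with NumPy: one pass assigns every author and genre a fixed integer position, and a second pass writes each count into a zero-filled matrix. This is a minimal sketch assuming the whole nested dict fits in memory; the variable names (`row_of`, `col_of`, `matrix`) are illustrative, not from the question.

```python
import numpy as np

books = {
    "auth1": {"humor": 20, "action": 30},
    "auth2": {"comedy": 20},
}

# Pass 1: collect row labels (authors) and column labels (genres),
# giving each label a fixed integer position (names are illustrative).
authors = sorted(books)
genres = sorted({genre for counts in books.values() for genre in counts})
row_of = {author: i for i, author in enumerate(authors)}
col_of = {genre: j for j, genre in enumerate(genres)}

# Pass 2: start from an all-zero matrix and write each known count into
# its (author, genre) cell; pairs that never appear stay 0.
matrix = np.zeros((len(authors), len(genres)), dtype=int)
for author, counts in books.items():
    for genre, n in counts.items():
        matrix[row_of[author], col_of[genre]] = n

print(genres)
print(matrix)
```

Keeping `authors` and `genres` around matters here, because the matrix itself carries no labels.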

3 Answers

pandas does this very well:

books = {}
books["auth1"] = {}
books["auth2"] = {}
books["auth1"]["humor"] = 20
books["auth1"]["action"] = 30
books["auth2"]["comedy"] = 20

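import pandas as pd

# Outer keys (authors) become columns, so transpose to get one row per author;
# author/genre pairs that don't exist come out as NaN, so fill them with 0.
df = pd.DataFrame(books).T.fillna(0)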

The output is:

       action  comedy  humor
auth1      30       0     20
auth2       0      20      0
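A slightly more explicit variant of the same idea, split into steps: `DataFrame(books)` puts each author in a column (hence the `.T`), the NaN holes force a float dtype (hence the cast back to `int` after `fillna`), and sorting the columns pins their order instead of relying on whatever order a given pandas version produces for the inner keys. This is a sketch, not the only way to get there.

```python
import pandas as pd

books = {
    "auth1": {"humor": 20, "action": 30},
    "auth2": {"comedy": 20},
}

# Each outer key (author) becomes a column, so transpose to one row per author.
df = pd.DataFrame(books).T

# Missing author/genre pairs are NaN; fill with 0. NaN made the frame
# float, so cast back to integer counts afterwards.
df = df.fillna(0).astype(int)

# Fix the column order explicitly (alphabetical by genre).
df = df.sort_index(axis=1)
print(df)
```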
HYRY
  • @HYRY Can a pandas DataFrame be used as input for matplotlib.pcolor to create a heat map? Or does one have to convert to a numpy array first? – tommy.carstensen May 12 '15 at 15:47
  • In case of variable-length dictionary values, use `DataFrame.from_dict(books, orient='index').fillna(0)` instead to prevent `ValueError`. – interpolack Jul 29 '15 at 19:24
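Since the stated end goal is an SVD, the filled DataFrame can be turned into a plain NumPy array with `to_numpy()` (available in pandas 0.24 and later) and passed to `np.linalg.svd`. A minimal sketch, assuming a dense matrix is acceptable for the data size:

```python
import numpy as np
import pandas as pd

books = {
    "auth1": {"humor": 20, "action": 30},
    "auth2": {"comedy": 20},
}

# Authors as rows, genres as columns, missing pairs filled with 0.
df = pd.DataFrame.from_dict(books, orient="index").fillna(0).sort_index(axis=1)

# Plain float ndarray for numpy.linalg.
X = df.to_numpy(dtype=float)

# Thin SVD: X == U @ diag(S) @ Vt, singular values in descending order.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
print(S)
```

`full_matrices=False` keeps `U` and `Vt` at the smaller "thin" shapes, which is usually what you want for a wide or tall author-by-genre matrix.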

Use a nested list comprehension to turn the dict into a list of lists and/or a NumPy array (this assumes every author has the same set of genres):

import numpy as np
np.array([[books[author][genre] for genre in sorted(books[author])] for author in sorted(books)])

EDIT

Apparently your sub-dictionaries don't all have the same keys. Make a list of all the genres:

genres = ['humor', 'action', 'comedy']
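If hard-coding the genre list isn't practical for a huge dictionary, it can be collected from the data itself. A small sketch, assuming the nested `books` dict from the question; sorting just makes the column order deterministic.

```python
books = {
    "auth1": {"humor": 20, "action": 30},
    "auth2": {"comedy": 20},
}

# Union of every genre that appears under any author, sorted for a stable order.
genres = sorted({genre for author in books.values() for genre in author})
print(genres)  # ['action', 'comedy', 'humor']
```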

And then iterate over the dictionaries in the normal manner:

import numpy as np

list_of_lists = []
for author_name, author in sorted(books.items()):  # one row per author, sorted by name
    titles = []
    for genre in genres:
        try:
            titles.append(author[genre])
        except KeyError:
            # this author has no books in this genre
            titles.append(0)
    list_of_lists.append(titles)

books_array = np.array(list_of_lists)

For each genre in `genres`, I try to append that author's count to the row. If the author has no entry for that genre, the lookup raises a `KeyError`; I catch it and append 0 instead.
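The same try/except logic can be written more compactly with `dict.get(key, default)`, which returns the default (here 0) when the key is missing instead of raising. A sketch assuming the same `books` and hard-coded `genres` as the loop above:

```python
import numpy as np

books = {
    "auth1": {"humor": 20, "action": 30},
    "auth2": {"comedy": 20},
}
genres = ['humor', 'action', 'comedy']

# author.get(genre, 0) yields 0 for a missing genre, replacing the try/except.
books_array = np.array(
    [[author.get(genre, 0) for genre in genres]
     for _, author in sorted(books.items())]
)
print(books_array)
```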

Joel Cornett
As of 2018, I think pandas 0.22 supports this out of the box via the `from_dict` class method of `DataFrame`. Use `orient='index'` so that the outer keys (authors) become rows, and fill the missing author/genre pairs with 0:

import pandas as pd

books = {}
books["auth1"] = {}
books["auth2"] = {}
books["auth1"]["humor"] = 20
books["auth1"]["action"] = 30
books["auth2"]["comedy"] = 20

pd.DataFrame.from_dict(books, orient='index').fillna(0)
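The `orient` argument decides which level of the nested dict becomes the rows. With the default `orient="columns"` the authors end up as columns (genres down the side), while `orient="index"` gives the author-by-genre layout from the question. A small sketch comparing the two shapes; the variable names are illustrative.

```python
import pandas as pd

books = {
    "auth1": {"humor": 20, "action": 30},
    "auth2": {"comedy": 20},
}

# Default orient="columns": outer keys (authors) become columns.
by_columns = pd.DataFrame.from_dict(books, orient="columns")

# orient="index": outer keys become rows, i.e. one row per author.
by_index = pd.DataFrame.from_dict(books, orient="index").fillna(0)

print(by_columns.shape, by_index.shape)  # (3, 2) (2, 3)
```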
jtromans