Converting a 2D dictionary to a NumPy matrix

Question

I have a huge nested dictionary that looks something like this:

d[id1][id2] = value

example:

books["auth1"]["humor"] = 20
books["auth1"]["action"] = 30
books["auth2"]["comedy"] = 20

and so on..

Each of the "auth" keys can have any set of "genres" associated with it. The value for a keyed item is the number of books that author wrote in that genre.

What I want is to convert it into a matrix, something like:

                    "humor"       "action"        "comedy"
      "auth1"         20            30               0
      "auth2"          0             0              20

How do I do this? Thanks.

first iterating thru the dictionary and then finding the number of rows and columns.. after that as i am iterating converting each entry as a defined vector.. and then in another iteration thru id1.. associating it with their vectors — frazman, May 16 '12 at 17:30
Do you just want it printed out like that? Why does it need to go into a numpy matrix — Paul Seeb, May 16 '12 at 17:42
@PaulSeeb: no no.. actually I want to later to perform svd of this matrix.. — frazman, May 16 '12 at 17:44

score 27 · Answer 1 · answered May 17 '12 at 01:12

pandas does this very well:

books = {}
books["auth1"] = {}
books["auth2"] = {}
books["auth1"]["humor"] = 20
books["auth1"]["action"] = 30
books["auth2"]["comedy"] = 20

from pandas import DataFrame

df = DataFrame(books).T.fillna(0)

The output is:

       action  comedy  humor
auth1      30       0     20
auth2       0      20      0
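
Since the stated goal is an SVD, the frame still has to become a plain NumPy array at some point. The following is a minimal sketch of that step, assuming pandas 0.24 or later for `to_numpy()` (older versions can use `df.values` instead). The names `matrix`, `u`, `s` and `vt` are illustrative and do not come from the answer.

```python
import numpy as np
from pandas import DataFrame

books = {
    "auth1": {"humor": 20, "action": 30},
    "auth2": {"comedy": 20},
}

# Authors become rows after the transpose; missing genres are filled with 0.
df = DataFrame(books).T.fillna(0)

# Plain 2-D float array; rows follow df.index and columns follow df.columns.
matrix = df.to_numpy(dtype=float)

# Thin SVD: matrix equals u @ np.diag(s) @ vt up to floating-point error.
u, s, vt = np.linalg.svd(matrix, full_matrices=False)
```

Keeping `df.index` and `df.columns` around is what lets you map rows and columns of the decomposition back to authors and genres.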

HYRY


@HYRY Can a pandas DataFrame be used as input for matplotlib.pcolor to create a heat map? Or does one have to convert to a numpy array first? – tommy.carstensen May 12 '15 at 15:47
In case of variable-length dictionary values, use `DataFrame.from_dict(books, orient='index').fillna(0)` instead to prevent `ValueError`. – interpolack Jul 29 '15 at 19:24

Joel Cornett · Accepted Answer · 2012-05-16T17:57:40.450

Use a list comprehension to turn a dict into a list of lists and/or a numpy array (this assumes every author has the same set of genres):

import numpy as np
np.array([[books[author][genre] for genre in sorted(books[author])] for author in sorted(books)])

EDIT

Apparently you have an irregular number of keys in each sub-dictionary. Make a list of all the genres:

genres = ['humor', 'action', 'comedy']

And then iterate over the dictionaries in the normal manner:

import numpy as np

list_of_lists = []
# Sort by author name so the row order is deterministic.
for author_name, author in sorted(books.items()):
    titles = []
    for genre in genres:
        try:
            titles.append(author[genre])
        except KeyError:
            # This author has no books in this genre.
            titles.append(0)
    list_of_lists.append(titles)

books_array = np.array(list_of_lists)

Basically, for each author I try to append the value for every key in genres to a list. If the author has no entry for that genre, the lookup raises a KeyError; I catch it and append a 0 instead.
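
The genre list does not have to be typed by hand. As a rough sketch (not part of the original answer), it can be collected from the dictionary itself, and `dict.get` with a default of 0 can stand in for the try/except. The sorted orders used here are an assumption made only so the layout is reproducible.

```python
import numpy as np

books = {
    "auth1": {"humor": 20, "action": 30},
    "auth2": {"comedy": 20},
}

# Fix a row and column order so the matrix layout is reproducible.
authors = sorted(books)
genres = sorted({genre for counts in books.values() for genre in counts})

# dict.get(genre, 0) supplies 0 for any genre an author has not written in.
books_array = np.array(
    [[books[author].get(genre, 0) for genre in genres] for author in authors]
)
```

With the sample data, the rows follow `authors` and the columns follow `genres`, so both lists are worth keeping next to the array.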

Hi, this gives me : array([[20, 30], [50]], dtype=object) but what i was expecting was [[20, 30, 0],[0,0,50]] — frazman, May 16 '12 at 17:47
@Fraz: ah, so you have an irregular number of keys for each author dict. Let me edit. — Joel Cornett, May 16 '12 at 17:49

score 0 · Answer 3 · answered Mar 02 '18 at 23:25

As of 2018, I think pandas 0.22 supports this out of the box. Specifically, see the from_dict class method of DataFrame.

books = {}
books["auth1"] = {}
books["auth2"] = {}
books["auth1"]["humor"] = 20
books["auth1"]["action"] = 30
books["auth2"]["comedy"] = 20

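import pandas as pd

# orient='columns' (the default) makes each author a column; missing genres come out as NaN.
pd.DataFrame.from_dict(books, orient='columns', dtype=None)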
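
To get authors as rows, as in the layout the question asks for, a comment under the first answer suggests `orient='index'` together with `fillna(0)`. A minimal sketch of that variant follows; the `astype(int)` step is an addition here, assuming the counts should stay integers.

```python
import pandas as pd

books = {
    "auth1": {"humor": 20, "action": 30},
    "auth2": {"comedy": 20},
}

# orient='index' turns the outer keys (authors) into rows.
df = pd.DataFrame.from_dict(books, orient="index")

# Missing author/genre pairs are NaN; replace them with 0 and restore integers.
df = df.fillna(0).astype(int)
```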
