I saw an answer to a question on converting a nested "2D" dictionary to a Pandas DataFrame. That would solve my problem, but I was wondering whether I can skip the intermediate step of generating a nested dictionary. Let's say my input file input.txt looks like this:

A B 1
A C 2
B C 3

Can I convert that to the following symmetric matrix with either Pandas or Numpy without having to generate an intermediate nested dictionary?

  A B C
A 0 1 2
B 1 0 3
C 2 3 0

The nested dictionary that I want to avoid creating would be:

d = {'A':{'B':1,'C':2},'B':{'C':3}}

I tried this after reading the IO Tools documentation on "Reading an index with a MultiIndex":

import pandas as pd
df = pd.read_csv('input.txt', sep=' ', index_col=[0,1], header=None)

But I don't get a 2D heat map when I do:

import matplotlib.pyplot as plt
plt.pcolor(df)
plt.show()
Community
tommy.carstensen

1 Answer

Not sure whether this is all that much more efficient, but you could pivot and then add the frame to its transpose, something like:

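import pandas as pd
# sep=r"\s+" replaces delim_whitespace=True, which newer pandas deprecates
df = pd.read_csv("input.txt", header=None, sep=r"\s+")
# pivot arguments must be keywords in pandas 2.0+; older code wrote df.pivot(0,1,2)
df = df.pivot(index=0, columns=1, values=2)
df.add(df.T, fill_value=0).fillna(0)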

   A  B  C
A  0  1  2
B  1  0  3
C  2  3  0

See the pandas documentation on add and pivot for details. Here is what is going on. The read_csv call returns:

   0  1  2
0  A  B  1
1  A  C  2
2  B  C  3

The pivot call then returns:

1   B   C
0           
A   1   2
B NaN   3

The magic numbers 0, 1 and 2 are the index, columns and values arguments. index=0 names the column whose entries become the row labels of the new frame (index is just pandas lingo for row labels). columns=1 names the column whose entries become the column labels. And values=2 names the column that supplies the cell values. No input row has the pair (B, B), so that cell comes out as NaN.
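To try the pivot step without a file on disk, here is a minimal sketch. It assumes the three rows from the question, fed through io.StringIO as a stand-in for input.txt; the names text, raw and tri are only illustrative.

```python
import io
import pandas as pd

# In-memory stand-in for input.txt (same three rows as in the question)
text = "A B 1\nA C 2\nB C 3\n"
raw = pd.read_csv(io.StringIO(text), header=None, sep=r"\s+")

# Column 0 supplies the row labels, column 1 the column labels,
# and column 2 the cell values
tri = raw.pivot(index=0, columns=1, values=2)
print(tri)
```

Spelling the arguments out as keywords also makes the call easier to read than the bare 0, 1, 2.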

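The final line df.add(df.T, fill_value=0).fillna(0) adds the frame to its transpose, which turns the triangular matrix into a symmetric one. fill_value=0 treats a cell that is missing from only one of the two frames as 0, and fillna(0) then zeroes the cells that are missing from both, such as the diagonal. It returns: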

   A  B  C
A  0  1  2
B  1  0  3
C  2  3  0
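If it helps to see the last step split in two, here is a rough sketch under the same assumption as before (the question's three rows read from io.StringIO instead of input.txt). The names tri and sym and the final astype(int) are my own additions, not part of the three-line version.

```python
import io
import pandas as pd

text = "A B 1\nA C 2\nB C 3\n"
raw = pd.read_csv(io.StringIO(text), header=None, sep=r"\s+")
tri = raw.pivot(index=0, columns=1, values=2)

# add() aligns both frames on the union of their labels (A, B, C);
# fill_value=0 stands in for a cell that is missing on one side only
sym = tri.add(tri.T, fill_value=0)

# Cells missing on both sides (the diagonal) are still NaN here,
# so fill them with 0 and cast the floats back to integers
sym = sym.fillna(0).astype(int)
print(sym)
```

The cast is optional: without it the values are floats, because the intermediate frames contain NaN.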
Tim
DSM
  • Thank you for being the first person to show me why I should perhaps bother learning pandas. Sorry for not voting up your answer immediately, but I had to read the documentation first to understand the `df.pivot(0,1,2)` part of your answer. – tommy.carstensen May 12 '15 at 18:30
  • It's not efficient, but it's only 3 lines of code, which is essential to me, since this is for a practical I'm doing for a course in Africa with participants who have little programming experience. Super grateful. Thank you. – tommy.carstensen May 12 '15 at 19:02