1

I am using Python 2.7 with Pandas on a Windows 10 machine.

I have an n by n Dataframe where:

1) The index represents peoples names

2) The column headers are the same peoples names in the same order

3) Each cell of the Dataframeis the average number of times they email each other each day.

How would I transform that Dataframeinto a Dataframewith 3 columns, where:

1) Column 1 would be the index of the n by n Dataframe

2) Column 2 would be the row headers of the n by n Dataframe

3) Column 3 would be the cell value corresponding to those two names from the index, column header combination from the n by n Dataframe

Edit

Appologies for not providing an example of what I am looking for. I would like to take df1 and turn it into rel_df, from the code below.

import pandas as pd
from itertools import permutations
df1 = pd.DataFrame()
df1['index'] = ['a', 'b','c','d','e']
df1.set_index('index', inplace = True)
df1['a'] = [0,1,2,3,4]
df1['b'] = [1,0,2,3,4]
df1['c'] = [4,1,0,3,4]
df1['d'] = [5,1,2,0,4]
df1['e'] = [7,1,2,3,0]

##df of all relationships to build
flds = pd.Series(SO_df.fld1.unique())
flds = pd.Series(flds.append(pd.Series(SO_df.fld2.unique())).unique())

combos = []
for L in range(0, len(flds)+1):
  for subset in permutations(flds, L):
      if len(subset) == 2:
          combos.append(subset)
      if len(subset) > 2:
          break

rel_df = pd.DataFrame.from_records(data = combos, columns = ['fld1','fld2'])
rel_df['value'] = [1,4,5,7,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4]

print df1
>>> print df1
       a  b  c  d  e
index               
a      0  1  4  5  7
b      1  0  1  1  1
c      2  2  0  2  2
d      3  3  3  0  3
e      4  4  4  4  0

>>> print rel_df
   fld1 fld2  value
0     a    b      1
1     a    c      4
2     a    d      5
3     a    e      7
4     b    a      1
5     b    c      1
6     b    d      1
7     b    e      1
8     c    a      2
9     c    b      2
10    c    d      2
11    c    e      2
12    d    a      3
13    d    b      3
14    d    c      3
15    d    e      3
16    e    a      4
17    e    b      4
18    e    c      4
19    e    d      4
BeeGee
  • 815
  • 2
  • 17
  • 33

2 Answers2

2

Use melt:

df1 = df1.reset_index()
pd.melt(df1, id_vars='index', value_vars=df1.columns.tolist()[1:])

(If in your actual code you're explicitly setting the index as you do here, just skip that step rather than doing the reset_index; melt doesn't work on an index.)

MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
  • I went with this solution because using melt is just one line of code. @Alexander Your solution works as well. I gave you a +1 for that. Thanks all! – BeeGee Mar 03 '16 at 18:25
2
# Flatten your dataframe.
df = df1.stack().reset_index()

# Remove duplicates (e.g. fld1 = 'a' and fld2 = 'a').
df = df.loc[df.iloc[:, 0] != df.iloc[:, 1]]

# Rename columns.
df.columns = ['fld1', 'fld2', 'value']

>>> df
   fld1 fld2  value
1     a    b      1
2     a    c      4
3     a    d      5
4     a    e      7
5     b    a      1
7     b    c      1
8     b    d      1
9     b    e      1
10    c    a      2
11    c    b      2
13    c    d      2
14    c    e      2
15    d    a      3
16    d    b      3
17    d    c      3
19    d    e      3
20    e    a      4
21    e    b      4
22    e    c      4
23    e    d      4
Alexander
  • 105,104
  • 32
  • 201
  • 196