1

I have created a data frame with columnnames and rows. For example

   a   b   c   d
1 Nan Nan Nan Nan
2 Nan Nan Nan Nan
3 Nan Nan Nan Nan

Now, I want to fill in the dataframe with values from a tsv file. My file is as follows:

1  b   80
2  c   90
3  a   60

To create this

   a   b   c   d
1 Nan 80  Nan Nan
2 Nan Nan 90  Nan
3 60  Nan Nan Nan

Can I use pandas to fill in the dataframe with the values from my file? So, I want python to look for the right row and column and put my value there. Thanks in advance!

jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
Gravel
  • 365
  • 1
  • 5
  • 19

1 Answers1

0

I think you need pivot with reindex:

df1 = pd.DataFrame(index=[1,2,3], columns=list('abcd'))
print (df1)
     a    b    c    d
1  NaN  NaN  NaN  NaN
2  NaN  NaN  NaN  NaN
3  NaN  NaN  NaN  NaN

print (df2)
   A  B   C
0  1  b  80
1  2  c  90
2  3  a  60

df = df2.pivot(index='A', columns='B', values='C') \
        .reindex(index=df1.index, columns= df1.columns)
print (df)
      a     b     c   d
1   NaN  80.0   NaN NaN
2   NaN   NaN  90.0 NaN
3  60.0   NaN   NaN NaN

Alternative solution with unstack:

df = df2.set_index(['A','B'])['C'].unstack() \
        .reindex(index=df1.index, columns= df1.columns)
print (df)
      a     b     c   d
1   NaN  80.0   NaN NaN
2   NaN   NaN  90.0 NaN
3  60.0   NaN   NaN NaN

If duplicates in A and B use pivot_table with some aggregate function like mean:

print (df2)
   A  B    C
0  1  b   80 <-duplicates for same A, B
1  1  b  100 <-duplicates for same A, B
2  2  c   90
3  3  a   60

df = df2.pivot_table(index='A', columns='B', values='C', aggfunc='mean') \
        .reindex(index=df1.index, columns= df1.columns)
print (df)
      a     b     c   d
1   NaN  90.0   NaN NaN
2   NaN   NaN  90.0 NaN
3  60.0   NaN   NaN NaN

df = df2.groupby(['A','B'])['C'].mean().unstack() \
        .reindex(index=df1.index, columns= df1.columns)
print (df)
      a     b     c   d
1   NaN  90.0   NaN NaN
2   NaN   NaN  90.0 NaN
3  60.0   NaN   NaN NaN
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Is it also possible to use positions in df2 instead of columnnames? For example: df = df2.pivot_table(index=[0], columns=[1], values=[2], aggfunc='mean') \ .reindex(index=df1.index, columns= df1.columns)... If i use this, I get the message "Indexerror: indices are out-of-bounds" – Gravel Apr 06 '17 at 07:12
  • You can use this trick - get columns names by positions - `df = df2.pivot_table(index=df2.columns[0], columns=df2.columns[1], values=df2.columns[2]) .reindex(index=df1.index, columns= df1.columns)` – jezrael Apr 06 '17 at 07:19
  • Ok thanks, this works. But now, if df1 contains already some columns with values, and I use this command, all values in those already filled columns are set to NaN. How can I keep these values in these columns? – Gravel Apr 06 '17 at 08:19
  • It seems you need `df3 = df2.groupby(['A','B'])['C'].mean().unstack()` and then `df = pd.concat([df, df3], axis=1)`, but without data it is problematic answer. – jezrael Apr 06 '17 at 08:21
  • Btw, super idea ask new question ;) – jezrael Apr 06 '17 at 08:42
  • Yes, because I was able with the new question to show what I mean. Thanks for your help, you are great! – Gravel Apr 06 '17 at 08:47