
I have an NxM dataframe and an NxL numpy matrix. I'd like to add the matrix to the dataframe as L new columns, simply appending the columns and keeping the rows in the same order they appear. I tried merge() and join(), but I end up with errors:

assign() keywords must be strings

and

columns overlap but no suffix specified

respectively.

Is there a way I can add a numpy matrix as dataframe columns?

sacuL
Booley

2 Answers


You can turn the matrix into a DataFrame and use concat with axis=1.

For example, given a dataframe df and a numpy array mat:

>>> df
   a  b
0  5  5
1  0  7
2  1  0
3  0  4
4  6  4

>>> mat
array([[0.44926098, 0.29567859, 0.60728561],
       [0.32180566, 0.32499134, 0.94950085],
       [0.64958125, 0.00566706, 0.56473627],
       [0.17357589, 0.71053224, 0.17854188],
       [0.38348102, 0.12440952, 0.90359566]])

You can do:

>>> pd.concat([df, pd.DataFrame(mat)], axis=1)
   a  b         0         1         2
0  5  5  0.449261  0.295679  0.607286
1  0  7  0.321806  0.324991  0.949501
2  1  0  0.649581  0.005667  0.564736
3  0  4  0.173576  0.710532  0.178542
4  6  4  0.383481  0.124410  0.903596
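A self-contained sketch of the same approach, assuming pandas and NumPy are installed and that the matrix has exactly len(df) rows. Passing `columns=` to `pd.DataFrame` gives the new columns readable names instead of 0, 1, 2; the names `m0`, `m1`, `m2` are purely illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [5, 0, 1, 0, 6], 'b': [5, 7, 0, 4, 4]})
mat = np.random.rand(5, 3)  # N x L array; N must equal len(df)

# Wrap the array in a DataFrame with (hypothetical) column names,
# then glue it to df side by side.
new_cols = pd.DataFrame(mat, columns=['m0', 'm1', 'm2'])
out = pd.concat([df, new_cols], axis=1)

print(out.shape)  # (5, 5)
```

This works here because both frames carry the default 0..N-1 index; concat lines rows up by index label, not by position.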
sacuL
  • For some reason concat isn't working correctly: the number of records increases even though df and mat have the same number of records. – rishi jain Jul 22 '20 at 06:10
  • @rishijain, if the indices of the two dataframes don't match, you may need to call `reset_index(drop=True)` on them before concatenating. – Andreas Feb 19 '21 at 06:15
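A minimal sketch of the problem described in these comments, assuming df has a non-default index (the labels 10..14 are made up, e.g. left over from filtering). `pd.DataFrame(mat)` always gets a fresh 0..N-1 index, so concat finds no matching labels and stacks the rows instead of pairing them.

```python
import numpy as np
import pandas as pd

# df with a non-default index, e.g. after filtering or sorting.
df = pd.DataFrame({'a': [5, 0, 1, 0, 6], 'b': [5, 7, 0, 4, 4]},
                  index=[10, 11, 12, 13, 14])
mat = np.random.rand(5, 3)

# Labels 10..14 vs 0..4: no overlap, so the outer join on the index
# produces 10 rows padded with NaN.
bad = pd.concat([df, pd.DataFrame(mat)], axis=1)
print(bad.shape)  # (10, 5)

# Fix 1: discard df's old index so both sides use 0..4.
fixed1 = pd.concat([df.reset_index(drop=True), pd.DataFrame(mat)], axis=1)

# Fix 2: keep df's index and give the new frame the same one.
fixed2 = pd.concat([df, pd.DataFrame(mat, index=df.index)], axis=1)
print(fixed1.shape, fixed2.shape)  # (5, 5) (5, 5)
```

The second fix may be preferable when the original index labels still matter downstream.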

Setup

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [5,0,1,0,6], 'b': [5,7,0,4,4]})
mat = np.random.rand(5,3)

Using join:

df.join(pd.DataFrame(mat))

   a  b         0         1         2
0  5  5  0.884061  0.803747  0.727161
1  0  7  0.464009  0.447346  0.171881
2  1  0  0.353604  0.912781  0.199477
3  0  4  0.466095  0.136218  0.405766
4  6  4  0.764678  0.874614  0.310778
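join also aligns on the index, but it is a left join by default, so a mismatched index shows up differently than with concat: the row count stays at len(df) and the new columns come back as NaN. A sketch, assuming the same made-up index labels 10..14 on df:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [5, 0, 1, 0, 6], 'b': [5, 7, 0, 4, 4]},
                  index=[10, 11, 12, 13, 14])
mat = np.random.rand(5, 3)

# Left join on the index: df's labels 10..14 never match 0..4,
# so every value in the new columns is NaN.
bad = df.join(pd.DataFrame(mat))
print(bad.shape, bad[0].isna().all())  # (5, 5) True

# Giving the new frame df's index makes the rows line up by position.
good = df.join(pd.DataFrame(mat, index=df.index))
print(good[0].isna().any())  # False
```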

If the column names might overlap, supply a suffix; rsuffix is appended only to the overlapping columns that come from the right-hand frame:

df = pd.DataFrame({0: [5,0,1,0,6], 1: [5,7,0,4,4]})
mat = np.random.rand(5,3)

df.join(pd.DataFrame(mat), rsuffix='_')

   0  1        0_        1_         2
0  5  5  0.783722  0.976951  0.563798
1  0  7  0.946070  0.391593  0.273339
2  1  0  0.710195  0.827352  0.839212
3  0  4  0.528824  0.625430  0.465386
4  6  4  0.848423  0.467256  0.962953
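Another option, sketched under the assumption that you are free to rename the new columns: prefix them before joining so nothing can collide and no suffix is needed. `add_prefix` is a standard DataFrame method; the prefix `'mat_'` is just an example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({0: [5, 0, 1, 0, 6], 1: [5, 7, 0, 4, 4]})
mat = np.random.rand(5, 3)

# add_prefix turns 0, 1, 2 into 'mat_0', 'mat_1', 'mat_2', which cannot
# clash with df's own columns 0 and 1.
out = df.join(pd.DataFrame(mat).add_prefix('mat_'))
print(list(out.columns))  # [0, 1, 'mat_0', 'mat_1', 'mat_2']
```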
user3483203