python - how to append numpy array to a pandas dataframe

Question

I have trained a logistic regression classifier to predict whether a review is positive or negative. Now I want to append the predicted probabilities returned by predict_proba to the pandas DataFrame containing the reviews. I tried something like:

test_data['prediction'] = sentiment_model.predict_proba(test_matrix)

Obviously, that doesn't work, since predict_proba returns a 2D NumPy array (one column per class) and I am assigning it to a single column. So what is the most efficient way of doing this? I created test_matrix with scikit-learn's CountVectorizer:

vectorizer = CountVectorizer(token_pattern=r'\b\w+\b')
train_matrix = vectorizer.fit_transform(train_data['review_clean'].values.astype('U'))
test_matrix = vectorizer.transform(test_data['review_clean'].values.astype('U'))

Sample data looks like:

| Review                                      | Prediction |
| ------------------------------------------- | ---------- |
| "Toy was great! Our six-year old loved it!" | 0.986      |

Related question: http://stackoverflow.com/questions/41904197/data-frame-of-tfidf-with-python — MaxU - stand with Ukraine, Feb 18 '17 at 11:36
Assign the predictions to a variable and then extract the columns from the variable to be assigned to the pandas dataframe cols. If `x` is the 2D numpy array with predictions, `x = sentiment_model.predict_proba(test_matrix)` then you can do, `test_data['prediction0'] = x[:,0]` and `test_data['prediction1'] = x[:,1]` — Karthik Arumugham, Feb 18 '17 at 11:46
@KarthikArumugham thanks so much. It worked like a charm! I need to sharpen up on slicing and dicing data ;) — DBE7, Feb 18 '17 at 12:08

score 24 · Accepted Answer · answered Feb 18 '17 at 12:50

Assign the predictions to a variable, then extract its columns and assign each one to a column of the pandas DataFrame. If x is the 2D NumPy array of predictions (one row per sample, one column per class),

x = sentiment_model.predict_proba(test_matrix)

then you can do,

test_data['prediction0'] = x[:,0]
test_data['prediction1'] = x[:,1]
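
If you would rather not hard-code the column indices, here is a minimal sketch of the same idea that attaches all probability columns in one step. It is only an illustration: `proba` and `classes` are stand-ins for `sentiment_model.predict_proba(test_matrix)` and `sentiment_model.classes_`, the values are made up, and the label coding (0 = negative, 1 = positive) is an assumption. The columns of predict_proba follow the order of `classes_`, so naming them from that attribute keeps the labels straight.

```python
import numpy as np
import pandas as pd

# Stand-in for sentiment_model.predict_proba(test_matrix): one row per review,
# one column per class (made-up values for illustration).
proba = np.array([[0.014, 0.986],
                  [0.730, 0.270]])
# Stand-in for sentiment_model.classes_ (assumed: 0 = negative, 1 = positive).
classes = np.array([0, 1])

# Toy frame with a deliberately non-default index.
test_data = pd.DataFrame(
    {"review_clean": ["Toy was great", "Broke after a day"]},
    index=[10, 42],
)

# Wrap the 2D array in a DataFrame that reuses test_data's index, so the
# join below lines rows up by label without changing their order.
proba_df = pd.DataFrame(
    proba,
    columns=[f"prediction{c}" for c in classes],
    index=test_data.index,
)
test_data = test_data.join(proba_df)
print(test_data)
```

Passing `index=test_data.index` matters when the frame has been filtered or shuffled: without it the new frame gets a fresh 0..n-1 index and the join would not match the existing rows.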

Karthik Arumugham

was very helpful – suku Jun 17 '17 at 15:27

score 3 · Answer 2 · answered Apr 08 '21 at 14:57

import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(10).reshape(5, 2), columns=['a', 'b'])
print('df:', df, sep='\n')

arr = np.arange(100, 104).reshape(2, 2)
print('array to append:', arr, sep='\n')

# DataFrame.append was removed in pandas 2.0; pd.concat appends the rows instead
df = pd.concat([df, pd.DataFrame(arr, columns=df.columns)], ignore_index=True)
print('df:', df, sep='\n')

output

df:
   a  b
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9
array to append:
[[100 101]
 [102 103]]
df:
     a    b
0    0    1
1    2    3
2    4    5
3    6    7
4    8    9
5  100  101
6  102  103
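
Note that this appends rows, not columns. If you have several arrays to append, it is usually cheaper to collect them and concatenate once, since each concatenation copies the data. A rough sketch, where `batches` is a hypothetical list of 2D arrays with the same two columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape(5, 2), columns=["a", "b"])

# Hypothetical batches of new rows, each a 2D array with two columns.
batches = [
    np.arange(100, 104).reshape(2, 2),
    np.arange(200, 204).reshape(2, 2),
]

# Wrap each array in a DataFrame with matching column names,
# then concatenate everything in a single call.
frames = [df] + [pd.DataFrame(b, columns=df.columns) for b in batches]
result = pd.concat(frames, ignore_index=True)
print(result)
```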
