Saving oversampled dataset as csv file in pandas

Question

I am new to Python, so apologies in advance if this is too simple. I could not find an answer, and this question did not help.

My code is

from collections import Counter
from imblearn.over_sampling import SMOTE

# Split data into features (X) and labels (y)
y = starbucks_smote.iloc[:, -1]
X = starbucks_smote.drop('label', axis=1)

# Count labels by type
counter = Counter(y)
print(counter)
# Counter({0: 9634, 1: 2895})

# Transform the dataset
oversample = SMOTE()
X, y = oversample.fit_resample(X, y)

# Count labels in the oversampled dataset
counter = Counter(y)
print(counter)
# Counter({0: 9634, 1: 9634})

How can I save the oversampled dataset for future work?

I tried

data_res = np.concatenate((X, y), axis = 1)
data_res.to_csv('sample_smote.csv')

but got an error:

ValueError: all the input arrays must have same number of dimensions, 
but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)

Appreciate any tips!

What is the result or error after trying to save? – ipj, Aug 24 '20 at 08:05

score 3 · Accepted Answer · edited Oct 10 '20 at 21:06

You can build a DataFrame from X and add y as a column:

data_res = pd.DataFrame(X)
data_res['y'] = y

and then save data_res to CSV with data_res.to_csv('sample_smote.csv').
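
The whole round trip could look roughly like the sketch below. It uses small made-up arrays in place of the real output of fit_resample so that it runs without imbalanced-learn. The file name sample_smote.csv and the 'label' column name come from the question; the feature names f1 and f2 and the temporary directory are assumptions for illustration only.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Stand-ins for the resampled output (hypothetical values, not the real data).
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
y = np.array([0, 1, 0, 1])

# Build one frame holding the features and the label.
data_res = pd.DataFrame(X, columns=["f1", "f2"])  # illustrative column names
data_res["label"] = y

# index=False keeps the row index out of the CSV file.
out_path = os.path.join(tempfile.mkdtemp(), "sample_smote.csv")
data_res.to_csv(out_path, index=False)

# Read the file back to confirm that features and labels survived.
restored = pd.read_csv(out_path)
```

Passing index=False is a design choice: without it, pandas writes the row index as an extra unnamed column, which then shows up as a spurious feature when the file is read back.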

A solution based on concatenating NumPy arrays is also possible, but np.vstack is needed to turn the 1-D y into a column so that the dimensions match:

data_res = np.concatenate((X, np.vstack(y)), axis = 1)
data_res = pd.DataFrame(data_res)
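
The error in the question arises because X has shape (n, k) while y has shape (n,), and np.concatenate with axis=1 requires both inputs to be 2-D. As a rough sketch on made-up arrays (the values are hypothetical), the np.vstack approach can be compared with two other ways of getting the same result, an explicit reshape and np.column_stack:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the resampled features and labels.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # shape (3, 2)
y = np.array([0, 1, 0])                              # shape (3,)

# Approach from the answer: np.vstack turns y into a (3, 1) column.
via_vstack = np.concatenate((X, np.vstack(y)), axis=1)

# Same result with an explicit reshape of y to a single column.
via_reshape = np.concatenate((X, y.reshape(-1, 1)), axis=1)

# Same result with column_stack, which treats 1-D inputs as columns.
via_column_stack = np.column_stack((X, y))

# Wrap the combined array so that to_csv is available.
data_res = pd.DataFrame(via_column_stack)
```

If y is a pandas Series rather than a NumPy array, it has no reshape method, so converting it first with y.to_numpy() would be needed for the reshape variant.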

edited Oct 10 '20 at 21:06 by marc_s
answered Aug 24 '20 at 08:14 by ipj
