I am new to Python, so apologies in advance if this is too simple. I could not find anything relevant, and this question did not help.

My code is

from collections import Counter
from imblearn.over_sampling import SMOTE

# Split data: the label is the last column, the features are everything else
y = starbucks_smote.iloc[:, -1]
X = starbucks_smote.drop('label', axis = 1)

# Count labels by type
counter = Counter(y)
print(counter)
# Output: Counter({0: 9634, 1: 2895})

# Transform the dataset
oversample = SMOTE()
X, y = oversample.fit_resample(X, y)

# Print the oversampled dataset
counter = Counter(y)
print(counter)
# Output: Counter({0: 9634, 1: 9634})

How can I save the oversampled dataset for future work?

I tried

data_res = np.concatenate((X, y), axis = 1)
data_res.to_csv('sample_smote.csv')

but got an error:

ValueError: all the input arrays must have same number of dimensions, 
but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)

Appreciate any tips!

Anakin Skywalker

1 Answer

You can put the resampled features into a DataFrame and add the labels as a column:

data_res = pd.DataFrame(X)
data_res['y'] = y

and then save data_res to CSV.
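The save step could look roughly like this. It is only a minimal sketch: the small `X` and `y` arrays are placeholder stand-ins for the output of `fit_resample`, and writing to the temp directory is just so the example runs anywhere (use whatever path you like, e.g. 'sample_smote.csv').

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Placeholder stand-ins for the resampled X and y (not the real dataset).
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([0, 1, 1])

# Build the DataFrame and attach the labels as a column.
data_res = pd.DataFrame(X)
data_res['y'] = y

# index=False keeps the row index out of the file.
out_path = os.path.join(tempfile.gettempdir(), 'sample_smote.csv')
data_res.to_csv(out_path, index=False)

# Read the file back to check the round trip.
restored = pd.read_csv(out_path)
```

Passing `index=False` is a choice, not a requirement; without it pandas writes the row index as an extra first column, which then shows up as an unnamed column when the file is read back.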

A solution based on concatenating NumPy arrays is also possible. The error occurs because X is 2-D while y is 1-D, so np.vstack is needed to turn y into a single column before concatenating along axis 1:

data_res = np.concatenate((X, np.vstack(y)), axis = 1)
data_res = pd.DataFrame(data_res)
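To see what happens to the shapes, here is a small sketch with placeholder arrays (not the real data). It also shows `reshape(-1, 1)` and `np.column_stack`, which are alternatives I am adding for comparison and which should give the same result as `np.vstack` here.

```python
import numpy as np

# Placeholder stand-ins for the resampled data.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # shape (3, 2)
y = np.array([0, 1, 1])                               # shape (3,)

# np.vstack stacks the elements of y as rows, giving one column.
y_col = np.vstack(y)                                  # shape (3, 1)
via_vstack = np.concatenate((X, y_col), axis=1)       # shape (3, 3)

# Alternative: reshape y into a column explicitly.
via_reshape = np.concatenate((X, y.reshape(-1, 1)), axis=1)

# Alternative: column_stack treats 1-D inputs as columns.
via_column_stack = np.column_stack((X, y))
```

All three results have one row per sample, with the label in the last column.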
marc_s
ipj