How to save synthetic dataset in CSV file using SMOTE

Question

I am using credit card data and oversampling it with SMOTE, following the code from geeksforgeeks.org.

After running the following code, I get the output shown below it:

print("Before OverSampling, counts of label '1': {}".format(sum(y_train == 1))) 
print("Before OverSampling, counts of label '0': {} \n".format(sum(y_train == 0))) 

# import SMOTE module from imblearn library 
# pip install imbalanced-learn (if you don't have imblearn installed)
from imblearn.over_sampling import SMOTE
sm = SMOTE(random_state=2)
# fit_resample is the current name of the older fit_sample method
X_train_res, y_train_res = sm.fit_resample(X_train, y_train.ravel())

print('After OverSampling, the shape of train_X: {}'.format(X_train_res.shape)) 
print('After OverSampling, the shape of train_y: {} \n'.format(y_train_res.shape)) 

print("After OverSampling, counts of label '1': {}".format(sum(y_train_res == 1))) 
print("After OverSampling, counts of label '0': {}".format(sum(y_train_res == 0)))

Output:

Before OverSampling, counts of label '1': 345
Before OverSampling, counts of label '0': 199019 

After OverSampling, the shape of train_X: (398038, 29)
After OverSampling, the shape of train_y: (398038,) 

After OverSampling, counts of label '1': 199019
After OverSampling, counts of label '0': 199019
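The printed numbers are consistent with SMOTE's default behaviour of oversampling the minority class until it matches the majority class. A quick arithmetic check, using only the counts from the output above:

```python
# Class counts taken from the printed output above
minority_before = 345      # label '1'
majority_before = 199019   # label '0'

# With the default strategy, SMOTE adds synthetic minority rows until both classes match
synthetic_rows = majority_before - minority_before
minority_after = minority_before + synthetic_rows
total_after = majority_before + minority_after

print(synthetic_rows)  # 198674 synthetic rows were generated
print(total_after)     # 398038, the first dimension of X_train_res.shape
```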

As I am totally new to this area, I can't figure out how to save this data in CSV format. I would be very glad if anyone could help me with this.

Alternatively, if there is any reference showing how to generate synthetic data from a dataset using SMOTE and save the updated dataset to a CSV file, please mention it.

I want something like the following image:

Thanks in advance.
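On the question of how SMOTE makes synthetic data: the core idea can be sketched in plain NumPy. For each new row, pick a minority sample, pick one of its k nearest minority-class neighbours, and place a point at a random position on the line segment between them. The function `smote_sketch` and the toy data below are hypothetical illustrations, not imblearn's implementation; for real work `imblearn.over_sampling.SMOTE` is the tool to use.

```python
import numpy as np

def smote_sketch(X_min, n_synthetic, k=5, seed=2):
    """Hypothetical helper: create n_synthetic rows by interpolating between a
    minority sample and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances within the minority class
    diffs = X_min[:, None, :] - X_min[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)                 # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(dists, axis=1)[:, :k]   # indices of the k nearest neighbours

    base = rng.integers(0, len(X_min), size=n_synthetic)           # starting samples
    pick = neighbours[base, rng.integers(0, k, size=n_synthetic)]  # one neighbour each
    gap = rng.random((n_synthetic, 1))                              # position on the segment, in [0, 1)
    return X_min[base] + gap * (X_min[pick] - X_min[base])

# Toy imbalanced data (hypothetical): 20 rows of class 0, 4 rows of class 1
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 3)), rng.normal(3, 1, (4, 3))])
y = np.array([0] * 20 + [1] * 4)

X_min = X[y == 1]
n_new = (y == 0).sum() - (y == 1).sum()   # 16 synthetic rows balance the classes
X_syn = smote_sketch(X_min, n_new)
X_bal = np.vstack([X, X_syn])
y_bal = np.concatenate([y, np.ones(n_new, dtype=int)])
print(np.bincount(y_bal))  # [20 20]
```

The balanced arrays `X_bal` and `y_bal` play the same role as `X_train_res` and `y_train_res`, so they can be written to CSV the same way.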

Anant Mittal · Answer 1 · 2019-11-01T06:58:36.823


From what I can see in your code, X_train_res and y_train_res are NumPy arrays. You can do something like this:

import numpy as np

y_train_res = y_train_res.reshape(-1, 1)  # reshape y_train_res to a column, shape (398038, 1)
data_res = np.concatenate((X_train_res, y_train_res), axis=1)  # labels become the last column
np.savetxt('sample_smote.csv', data_res, delimiter=",")

I can't run and check this right now, but let me know if you face any issues.

Note: this does not write column labels (a header row). Let me know once you have this working if you need help adding them.
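One way to add the column labels without pandas is the `header` argument of `np.savetxt`. This is a sketch under assumptions: the toy arrays stand in for `X_train_res` / `y_train_res`, and the column names `feature_0`, `feature_1`, ... and `Class` are hypothetical placeholders for your dataset's real names.

```python
import numpy as np

# Stand-ins for X_train_res / y_train_res (hypothetical toy values)
X_train_res = np.array([[0.1, 1.5], [0.3, 2.0], [0.2, 1.7]])
y_train_res = np.array([0, 1, 1])

# Hypothetical column names; replace with your dataset's real feature names
feature_names = [f"feature_{i}" for i in range(X_train_res.shape[1])]
header = ",".join(feature_names + ["Class"])

# column_stack appends the 1-D label array as the last column (no reshape needed)
data_res = np.column_stack((X_train_res, y_train_res))

# comments="" stops NumPy from prefixing the header line with "# ";
# "%.17g" keeps full float64 precision without scientific-notation clutter
np.savetxt("sample_smote.csv", data_res, delimiter=",",
           header=header, comments="", fmt="%.17g")
```

The first line of `sample_smote.csv` is then `feature_0,feature_1,Class`, followed by one row per sample.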

edited Nov 01 '19 at 06:58

answered Nov 01 '19 at 05:57

Thanks for your help. Suppose I have an imbalanced dataset in CSV format, and I oversample the minority class. Now I have synthetic data, and the minority class has as many samples as the majority class. I want to save this balanced dataset to another CSV file. – Nafisa Anjum Samia Nov 01 '19 at 06:12
I have attached an image to my question. I want something like that. – Nafisa Anjum Samia Nov 01 '19 at 06:18
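For the CSV-in, CSV-out case in these comments, pandas can keep the original column names. A minimal sketch, assuming the label column is called `Class` and using a tiny hypothetical CSV in place of a real file. To keep the example runnable without imblearn, the SMOTE call is shown as a comment and two made-up minority rows stand in for its output.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical imbalanced CSV (in practice: df = pd.read_csv("your_file.csv"))
csv_text = """V1,V2,Amount,Class
0.5,1.2,10.0,0
0.7,0.9,25.0,0
0.1,1.1,5.0,0
2.3,3.1,99.0,1
"""
df = pd.read_csv(io.StringIO(csv_text))
X = df.drop(columns="Class")
y = df["Class"]

# Stand-in for the SMOTE output. In real code this would be:
#   X_res, y_res = SMOTE(random_state=2).fit_resample(X, y)
# Here two made-up minority rows are appended so the sketch runs on its own.
X_res = np.vstack([X.to_numpy(), [[2.0, 2.9, 80.0], [2.5, 3.3, 120.0]]])
y_res = np.concatenate([y.to_numpy(), [1, 1]])

# Rebuild a DataFrame with the original feature names, then add the label column
balanced = pd.DataFrame(X_res, columns=X.columns)
balanced["Class"] = y_res

# index=False stops pandas from writing its row index as an extra first column
balanced.to_csv("balanced.csv", index=False)
print(balanced["Class"].value_counts())
```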

I've edited the answer; let me know if you face any issues. – Anant Mittal Nov 01 '19 at 06:59
Why do I get "'DataFrame' object has no attribute 'savetxt'"? – Cyrus Oct 11 '21 at 10:54
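That error means `savetxt` was called as a method of a pandas DataFrame (for example `data.savetxt(...)` when `data` holds a DataFrame). `savetxt` is a function in the NumPy module, not a DataFrame method; a DataFrame's own writer is `to_csv`. A small sketch with hypothetical data showing both working alternatives:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"V1": [0.5, 2.3], "Class": [0, 1]})  # hypothetical data

# savetxt lives in the numpy module, not on DataFrame objects
print(hasattr(df, "savetxt"))  # False, so df.savetxt(...) raises AttributeError
print(hasattr(np, "savetxt"))  # True

# Either call NumPy's function on the DataFrame's values (no header row)...
np.savetxt("via_numpy.csv", df.to_numpy(), delimiter=",")
# ...or use the DataFrame's own CSV writer, which also writes the column names
df.to_csv("via_pandas.csv", index=False)
```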
