Questions tagged [oversampling]

Oversampling and undersampling in data analysis are techniques used to adjust the class distribution of a data set (i.e. the ratio between the different classes/categories represented).

156 questions
3
votes
0 answers

Python - How to differentiate SMOTE resampling from original data

I over sampled my data using SMOTE like so: >>> from imblearn.over_sampling import SMOTE >>> X_resampled, y_resampled = SMOTE().fit_resample(X, y) So now X_resampled, y_resampled are larger than the original data set. How can I tell apart the…
Shlomi Schwartz
  • 8,693
  • 29
  • 109
  • 186
3
votes
1 answer

Keras: multi class imbalanced data classification is overfitting

I have a small dataset of ~1000 rows with two categorical columns [Message], [Intent]. I want to create a classification model and make predictions for new, unseen messages. The 29 unique intents are imbalanced, ranging from 116 to 4 value…
joasa
  • 946
  • 4
  • 15
  • 35
3
votes
1 answer

Output of shape for training after oversampling with imbalanced-learn

I am using imbalanced-learn to oversample my data. I want to know how many entries in each class there are after using the oversampling method. This code works nicely: import imblearn.over_sampling import SMOTE from collections import Counter def…
Christoph H.
  • 173
  • 1
  • 14
3
votes
1 answer

Python oversampling combine several samplers in a pipeline

My issue concerns the Value Error raised by SMOTE class. Expected n_neighbors <= n_samples, but n_samples = 1, n_neighbors = 6 # imbalanced learn is a package containing impelementation of SMOTE from imblearn.over_sampling import SMOTE, ADASYN,…
3
votes
1 answer

Is there a package or function that can do SMOTE with continuous and categorical features?

I have an unbalanced data set with a categorical dependent variable and feature variables that are continuous and categorical. I know that the SMOTE function from the DMwR package can handle only continuous features. Is there package that can handle…
3
votes
0 answers

Correct split of dependent variable values in machine learning?

I am making a machine learning model in Python and there are only categorical variables in the data set. I want a precision of minimum 90% (for the value of 1 in the dependent variable). In the original data (the raw YTD data that I pulled from the…
user5751943
3
votes
1 answer

SMOTE function 'subscript out of bond'

I'm trying to implement a logistic regression as follows: However I can't get good predictions because my class output 1 is under-represented in my data. Therefore I'm trying to apply SMOTE algorithm to my trainset in order to get better…
T. Ciffréo
  • 126
  • 10
3
votes
1 answer

Get pixel coordinates from ra, dec after oversampling FITS image

I'm looking for a way to locate the pixel coordinates on my FITS image that correspond to ra and dec positions of an object in degrees, after oversampling. This would be simple if I wasn't oversampling, but I need to. Given an unaltered FITS image,…
curious_cosmo
  • 1,184
  • 1
  • 18
  • 36
3
votes
1 answer

Multi-Class Classification: SMOTE oversampling for multiple columns in a row

I have an imbalanced dataset contained in a dataframe called city_country that is made up of 5 columns: Content of a tweet = preprocessed An event type (e.g. tweet relates to earthquake = 'earthquake', typhoon = 'typhoon', etc.) =…
3
votes
1 answer

How to oversample a dataframe in Pyspark?

How to oversample a dataframe in pyspark? df.sample(fractions, seed) Which only sample a fraction of the df, it can't oversample.
Stevven
  • 31
  • 1
  • 3
2
votes
0 answers

python : how to improve classification results after having use combination of oversampling (SMOTE) and undersampling (RandomUnderSampler)

I have a problem of imbalanced classes and small dataset : 0 : 142 1 : 29 I try to find the right method to deal with this issue and the best algorithm. For now the best results I have came from using a combination of oversampling with SMOTE and…
DuneC
  • 21
  • 1
2
votes
1 answer

2D Gaussian oversampling over large dataframe

I currently have a dataframe in the following format: step tag_id x_pos y_pos 1 1 5 3 1 2 3 4 2 1 2 2 2 3 1 6 ......................... ......................... N 1 …
ebrithilotho
  • 142
  • 1
  • 8
2
votes
1 answer

How to oversample a 3d array?

I'm trying to predict the category of a news article based on 2 features: author name and article headline. I have transformed both columns separately using CountVectorizer and TfidfTransformer. Thus, what I have now is a 3D array (ie. array of list…
Brian
  • 33
  • 1
  • 6
2
votes
1 answer

AttributeError: 'DataFrame' object has no attribute 'name' when using SMOTE

I am using imblearn over_sampling SMOTE technique in order to balance my imbalanced dataset. Here is my sample code import pandas as pd dataset=pd.read_csv('E://IOT_Netlume//hourly_data.csv') features= dataset.iloc[:,[1,2,3,4]] target=…
DS_Geek
  • 53
  • 1
  • 10
2
votes
3 answers

How can i apply SMOTE for multiclass text data

I have a Multiclass dataset for which i want to use SMOTE, but i am facing an ValueError: "sampling_strategy" can be a float only when the type of target is binary. For multi-class, use a dict. I want to balance my data using SMOTE or any other…
1
2
3
10 11