Questions tagged [downsampling]
231 questions
0
votes
2 answers
Train/Test samples are not random when downsampling R
My data set consists of information collected from inpatients on their satisfaction with the services they received at the hospital. The data looks as follows (only some of the variables are shown here):
$ Advised :…

user13178113
0
votes
1 answer
I am trying to use ROSE to help with sampling imbalance. My ovun.sample code is creating empty values; how can I fix this?
I am trying to use ROSE to help with an imbalanced dataset. I am about 90% there, but I am having trouble with my ovun.sample code. When I run the ovun.sample code, it does not create an "over", "under" or "both" dataset; the values are showing in…

Aether
0
votes
1 answer
Smote - Select Perc_under and Perc_Over
I am using SMOTE for the first time in R.
I am applying SMOTE to training data in which the majority class (0) has 7952346 observations and the minority class (1) has 27230 observations.
I want to downsample so that I have close to 30000 1's and between 180000 and 200000 0's.
I am…

Dexter1611
0
votes
1 answer
Is there a way to target an overall sample size when using stratified sampling in R?
I've got a dataset which represents 50,000 simulations. Each simulation has multiple scenario IDs, and associated with each scenario ID is a second identifier called target. The first four simulations might look like the…

Bemused
0
votes
1 answer
Python: downsampling tokens or downsampling word2vec model
I need some help with a downsampling issue. I have to make a larger corpus (6 654 940 sentences, 19 592 258 tokens) comparable to a smaller one (15 607 sentences, 927 711 tokens), in order to train 2 comparable word2vec models on them.
Each corpus is…

chiaras15
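One way to make a large corpus comparable in size to a smaller one is to draw random sentences until a token budget is reached. The sketch below is only an illustration under stated assumptions: each sentence is already a list of tokens, and the function name `sample_sentences_to_token_budget` and its parameters are hypothetical, not part of any word2vec library.

```python
import random


def sample_sentences_to_token_budget(sentences, target_tokens, seed=0):
    """Randomly pick sentences until their combined token count reaches target_tokens.

    sentences: list of token lists (assumed to be tokenized already).
    Returns the picked sentences in their original corpus order.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    order = list(range(len(sentences)))
    rng.shuffle(order)

    picked, total = [], 0
    for idx in order:
        if total >= target_tokens:
            break
        picked.append(idx)
        total += len(sentences[idx])

    picked.sort()  # restore the original ordering of the kept sentences
    return [sentences[i] for i in picked]
```

Sampling whole sentences keeps the context windows intact, at the cost of overshooting the budget by at most one sentence.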
0
votes
1 answer
Undersampling for Imbalanced Class in Python
I currently have an imbalanced dataset of over 800,000 data points. The imbalance is severe, as there are only 3719 data points for one of the two classes. Upon undersampling the data using the NearMiss algorithm in Python and applying a Random Forest…

ML Enthusiast
0
votes
0 answers
Algorithm for reducing a 1D function to a polyline with a small number of points
Problem
My input is a 1D function y = f(x), and I want to find a way to approximate this function with a polyline with a small number of points on some given interval:
What I have tried:
I solved this by making a polyline with many points (1000+)…

Jan Spurny
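A common technique for thinning a densely sampled polyline is the Ramer-Douglas-Peucker algorithm. It is not necessarily what the asker ended up using; the sketch below only illustrates the idea, and the names `simplify_polyline` and `tolerance` are chosen here for illustration.

```python
import math


def _distance_to_line(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)


def simplify_polyline(points, tolerance):
    """Ramer-Douglas-Peucker: drop points that lie within tolerance of the kept outline."""
    if len(points) < 3:
        return list(points)

    keep = [False] * len(points)
    keep[0] = keep[-1] = True
    stack = [(0, len(points) - 1)]  # explicit stack avoids deep recursion

    while stack:
        first, last = stack.pop()
        max_dist, max_idx = 0.0, None
        for i in range(first + 1, last):
            d = _distance_to_line(points[i], points[first], points[last])
            if d > max_dist:
                max_dist, max_idx = d, i
        # Keep the farthest point and split there if it exceeds the tolerance
        if max_idx is not None and max_dist > tolerance:
            keep[max_idx] = True
            stack.append((first, max_idx))
            stack.append((max_idx, last))

    return [p for p, k in zip(points, keep) if k]
```

For example, sampling y = |x| at 201 points on [-1, 1] and simplifying with a small tolerance should leave only the two endpoints and the corner at the origin.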
0
votes
0 answers
DownSample with CGImageSourceCreateThumbnailAtIndex makes memory rise dramatically
I am using CGImageSourceCreateThumbnailAtIndex to downsample a huge image, whose size is 23622 × 11811. I set the kCGImageSourceThumbnailMaxPixelSize key to 4683. Then the image will be resized to 4683 * 2341, which will take about 43877376.0 bytes…

bupo.jung
0
votes
1 answer
Anti-Aliasing Algorithm for Pixelart
I have an image, or pixel art for lack of a better word, of very small size. It is actually just an array of numbers of around this size: new int[150][10]. I draw lines and curves on this array, mostly one colour on a black background. It is meant to…

MusicIsLife
0
votes
3 answers
Downsampling for more than 2 classes
I am writing some simple code that down-samples a dataframe when the target variable has more than 2 classes.
Let df be our arbitrary dataset and 'TARGET_VAR' a categorical variable with more than 2 classes.
import pandas as…

CAPSLOCK
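The idea of down-sampling every class to the size of the smallest one can be sketched without any library. This is not the asker's code (which is truncated above); it assumes rows are plain dictionaries, and the function name `downsample_to_smallest_class` is hypothetical. With pandas, recent versions express the same idea with `df.groupby('TARGET_VAR').sample(n=...)`.

```python
import random
from collections import defaultdict


def downsample_to_smallest_class(rows, target_key, seed=0):
    """Randomly reduce every class to the size of the rarest class.

    rows: list of dicts; target_key: name of the categorical target field.
    """
    if not rows:
        return []

    rng = random.Random(seed)  # fixed seed for a reproducible sample

    # Group the rows by their class label
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target_key]].append(row)

    # Every class is cut down to the size of the smallest one
    n = min(len(group) for group in by_class.values())

    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n))

    rng.shuffle(balanced)  # avoid leaving the rows grouped by class
    return balanced
```

Because the smallest class sets the size, this works for any number of classes, but it can discard a lot of data when one class is very rare.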
0
votes
1 answer
Reducing matrix size in 2D using KNN
I have a large binary matrix. I want to reduce the size of this matrix by using knn-approximation. My idea is to cluster the matrix in groups of 4 neighbors and replace the group with a 1, if the number of 1s in the group is greater than or…

Shew
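Replacing each group of 4 neighbors with a single value can be sketched as a block-wise vote over non-overlapping 2×2 blocks. This is a plain block reduction rather than a k-nearest-neighbors method, and because the excerpt is cut off before the cutoff value, the `threshold` parameter (and its default of 2) is an assumption.

```python
def downsample_binary_matrix(matrix, block=2, threshold=2):
    """Shrink a binary matrix by replacing each block x block group with 0 or 1.

    A group becomes 1 when it contains at least `threshold` ones
    (the threshold is an assumed parameter, not taken from the question).
    Trailing rows/columns that do not fill a whole block are dropped.
    """
    rows = len(matrix)
    cols = len(matrix[0]) if matrix else 0

    reduced = []
    for r in range(0, rows - rows % block, block):
        out_row = []
        for c in range(0, cols - cols % block, block):
            ones = sum(
                matrix[r + dr][c + dc]
                for dr in range(block)
                for dc in range(block)
            )
            out_row.append(1 if ones >= threshold else 0)
        reduced.append(out_row)
    return reduced
```

With `block=2` the output has half as many rows and half as many columns as the input.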
0
votes
1 answer
Down sampling and Moving Average in R
I have a very large signal with a 20 Hz sampling frequency, and I am using the moving average (movavg) function with n=20 to smooth it, but the result is a signal with the same sampling rate as the input. Is there a function which takes input and…

Yasir Ahmed Pirkani
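A moving average keeps one output per input sample, so the rate is unchanged. Averaging over non-overlapping windows instead smooths and reduces the rate in one step. The question is about R; the sketch below shows the idea in Python, and the name `block_average` is made up for illustration.

```python
def block_average(signal, factor):
    """Average consecutive non-overlapping windows of `factor` samples.

    With factor=20, a 20 Hz signal becomes a 1 Hz signal.
    Trailing samples that do not fill a whole window are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    usable = len(signal) - len(signal) % factor
    return [
        sum(signal[i:i + factor]) / factor
        for i in range(0, usable, factor)
    ]
```

Taking every 20th sample of the moving-average output would give a similar result, since each retained value would also summarize a 20-sample window.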
0
votes
0 answers
CNN non-conventional downsampling
I am new to CNNs and am building a model using Keras to combine inputs from multiple sources. Two of my sources have different dimensions and cannot be scaled by an integer number (i.e., x2 or x3 smaller). Therefore, simply max-pooling will not…

Jeff Lapierre
0
votes
2 answers
decimate data in python
I put decimate in the title, but I am not sure that is exactly what I mean. Here is the full description of the issue. I have a dataframe that contains data from several subjects. What I want to do is to analyze data that is X number of days apart.…

dc_neuro
0
votes
1 answer
Dplyr downsample in pipeline
I have a tibble like so:
tibble(a = c(1,2,3,4,5), b = c(1,1,1,2,2))
I want to randomly downsample the data by the "b" column, like so:
tibble(a = c(1,3,4,5), b = c(1,1,2,2))
How can I do this entirely in a Dplyr pipeline without changing the data…

Christopher Costello