Questions tagged [subsampling]
116 questions
1
vote
0 answers
How to get an even subsample from a dataframe in R with multiple variables
I have a dataframe with 67 items that looks like this:
df <- data.frame("item"= c("item1", "item2", "item3", "item4", "item5"), "variable1"=c(10.51, 16.54, 12.35, 9.44, 20.11), "variable2"=c(15.65, 25.68, 14.48, 19.87, 30.21), "variable3"=c(19.35,…

Thea
1
vote
1 answer
Retrieve 100 samples closest to the centroids of each cluster after K means clustering using R
I'm trying to reduce the input data size by first performing K-means clustering in R and then sampling 50-100 samples per representative cluster for downstream classification and feature selection.
The original dataset was split 80/20, and then 80% went…

ML33M
1
vote
1 answer
Loop over left joins
I've been trying to loop over left joins (using R). I need to create a table with columns representing samples from a larger table. Each column of the new table should represent each of these samples.
library(tidyr)
largetable <-…

D C
1
vote
0 answers
How to use tf.data.Dataset.interleave to subsample from multi dataset objects in tf2?
I tried to replicate the solution posted here with tf.data.Dataset.interleave, but I am not quite sure how to apply the interleave method to dataset objects that have already been created.
Here is the code:
import tensorflow as tf
import numpy as np
# preparing…

Hoda
1
vote
1 answer
Sample random rows evenly spaced apart in R
I have a df of measurements over 50 years. I am trying to subsample the data to see what patterns I would have found had I sampled in only 2 years, or in 3, 4, 5, etc., instead of in all 50. I wrote code that will pull random years from the dataset,…

Jake L
1
vote
1 answer
Subsampling a 1D array of integer so that the sum hits a target value in python
I have two 1D arrays of integers whose sums differ, for example:
a = [1,2,2,0,3,5]
b = [0,0,3,2,0,0]
I would like the sum of each array to be equal to that of the smallest of the two. However I want to keep values as integers, not floats, so…
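One possible reading of this question, sketched under assumptions: each integer is treated as a count of units, and units are removed at random (without replacement) until the array sums to the target. The excerpt is truncated, so the asker's exact constraints are unknown; the helper name `subsample_counts` is hypothetical, and the approach relies on NumPy's `Generator.multivariate_hypergeometric`.

```python
import numpy as np

def subsample_counts(counts, target, seed=None):
    """Draw `target` units without replacement from integer counts.

    Hypothetical helper: the result keeps integer values, never exceeds
    the original entry at any position, and sums exactly to `target`.
    """
    counts = np.asarray(counts, dtype=np.int64)
    if np.any(counts < 0):
        raise ValueError("counts must be non-negative")
    if not 0 <= target <= counts.sum():
        raise ValueError("target must lie between 0 and counts.sum()")
    rng = np.random.default_rng(seed)
    # Each position is a "color" holding counts[i] units; draw `target` units.
    return rng.multivariate_hypergeometric(counts, target)

a = [1, 2, 2, 0, 3, 5]
b = [0, 0, 3, 2, 0, 0]
target = min(sum(a), sum(b))          # the smaller of the two sums
a_sub = subsample_counts(a, target, seed=0)
b_sub = subsample_counts(b, target, seed=0)  # already at the target, so unchanged
```

Because the draw is without replacement, larger entries tend to keep more of their units, which may or may not match what the asker wants.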

APiazza
1
vote
0 answers
Get all possible combinations of numpy array elements
I need to get all possible combinations nCr of all possible sizes of a numpy array.
[1,2,3,4,5]
should give us a set of…
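A minimal sketch of enumerating nCr combinations for every size r, assuming the empty combination is not wanted (the excerpt is cut off before the expected output, so that is a guess). The function name `all_combinations` is hypothetical; it uses only `itertools` from the standard library plus NumPy for the input array.

```python
from itertools import chain, combinations

import numpy as np

def all_combinations(values):
    """Return every combination of every size from 1 to len(values)."""
    items = list(values)
    sizes = range(1, len(items) + 1)
    # Chain together the combinations of size 1, then size 2, and so on.
    return list(chain.from_iterable(combinations(items, r) for r in sizes))

arr = np.array([1, 2, 3, 4, 5])
combos = all_combinations(arr.tolist())
```

For n elements this produces 2^n - 1 tuples, so it is only practical for small arrays.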

DDR
1
vote
1 answer
How to convert Y Cb Cr to RGB in MATLAB manually?
I've been tasked with performing a 4:2:0 chroma subsampling (color compression) on a series of JPEGs.
The first step is to ensure that I can generate my Y, Cb, and Cr values and then convert back to RGB and display the image. Then I can go back…
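The round trip described above can be sketched as follows. This is written in Python with NumPy rather than MATLAB, and it assumes the full-range JFIF (JPEG) form of the BT.601 conversion; the asker's assignment may call for different constants. The function names are hypothetical.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an (..., 3) RGB array in 0..255 to full-range Y, Cb, Cr (float)."""
    rgb = np.asarray(rgb, dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def ycbcr_to_rgb(ycc):
    """Invert rgb_to_ycbcr and return a uint8 RGB array."""
    ycc = np.asarray(ycc, dtype=np.float64)
    y = ycc[..., 0]
    cb = ycc[..., 1] - 128.0
    cr = ycc[..., 2] - 128.0
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    rgb = np.stack([r, g, b], axis=-1)
    # Round and clip before casting so values stay inside the 8-bit range.
    return np.clip(np.rint(rgb), 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)  # stand-in for a JPEG frame
restored = ycbcr_to_rgb(rgb_to_ycbcr(image))
max_error = int(np.abs(restored.astype(int) - image.astype(int)).max())
```

Checking that `max_error` stays within one intensity level is a quick way to confirm the conversion pair before adding the chroma subsampling step.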

Jared Boyd
1
vote
1 answer
Chroma Subsampling with ffmpeg
I want to create an .mp4 output. But it doesn't work...
I'm using ffmpeg. My input video is a raw video and I want to have a raw video .mp4 at the end.
The command that I use:
ffmpeg.exe -i input.y4m -c:v rawvideo -vf format=yuv420p output.y4m
Can…

Coder95
1
vote
0 answers
spark efficient distribution pairing to compare cohorts
How can I efficiently compare matched cohorts in spark?
In Python, for each observation of the minority class in a highly imbalanced dataset, sampling k observations from the majority class can be implemented in a fairly straightforward way (i.e.…

Georg Heiler
1
vote
1 answer
Subsampling 3D array using the neighbourhood sum
The title is probably confusing. I have a reasonably large 3D numpy array. I'd like to cut its size by 2^3 by binning blocks of size (2,2,2). Each element in the new 3D array should then contain the sum of the elements in its respective block in…
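The binning described above can be sketched with a reshape followed by a sum over the block axes. This assumes every dimension of the array is even; the function name `bin_sum_2x2x2` is hypothetical.

```python
import numpy as np

def bin_sum_2x2x2(volume):
    """Sum non-overlapping (2, 2, 2) blocks of a 3D array."""
    v = np.asarray(volume)
    if v.ndim != 3 or any(n % 2 for n in v.shape):
        raise ValueError("expected a 3D array with even dimensions")
    n0, n1, n2 = v.shape
    # Split each axis into (block index, position within block),
    # then sum over the three within-block axes.
    blocks = v.reshape(n0 // 2, 2, n1 // 2, 2, n2 // 2, 2)
    return blocks.sum(axis=(1, 3, 5))

volume = np.arange(4 * 4 * 4).reshape(4, 4, 4)
binned = bin_sum_2x2x2(volume)   # shape (2, 2, 2)
```

The total is preserved, so `binned.sum()` equals `volume.sum()`, which makes a convenient sanity check.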

Matheus Leão
1
vote
4 answers
SED: How to remove every 10 lines in a file (thin or subsample the file)
I have this so far:
sed -n '0,10p' yourfile > newfile
But it is not working, just outputs a blank file :(

John
1
vote
1 answer
python 1:1 stratified sampling per each group
How can a 1:1 stratified sampling be performed in Python?
Assume the pandas DataFrame df to be heavily imbalanced. It contains a binary group and multiple columns of categorical subgroups.
df = pd.DataFrame({'id':[1,2,3,4,5], 'group':[0,1,0,1,0],…

Georg Heiler
1
vote
1 answer
How to subsample different numbers by ID and bootstrap in R
First, I'm trying to subsample a large dataset with many individuals, but each individual requires a different subsample size. I'm comparing across two time periods, so I want to subsample each individual by the minimum data points each has across…

user9351962
1
vote
1 answer
How can I control subsampling such that xgb.cv and cross_validate produce the same results?
xgb.cv and sklearn.model_selection.cross_validate do not produce the same mean train/test error even though I set the same seed/random_state and I make sure both methods use the same folds. The code at the bottom reproduces my issue. (Early…

Maauss