Questions tagged [subsampling]

116 questions
1
vote
0 answers

How to get an even subsample from a dataframe in R with multiple variables

I have a dataframe with 67 items that looks like this: df <- data.frame("item"= c("item1", "item2", "item3", "item4", "item5"), "variable1"=c(10.51, 16.54, 12.35, 9.44, 20.11), "variable2"=c(15.65, 25.68, 14.48, 19.87, 30.21), "variable3"=c(19.35,…
Thea
  • 11
  • 1
1
vote
1 answer

Retrieve 100 samples closest to the centroids of each cluster after K means clustering using R

I'm trying to reduce the input data size by first performing K-means clustering in R and then sampling 50-100 samples per representative cluster for downstream classification and feature selection. The original dataset was split 80/20, and then 80% went…
ML33M
  • 341
  • 2
  • 19
1
vote
1 answer

Loop over left joins

I've been trying to loop over left joins (using R). I need to create a table with columns representing samples from a larger table. Each column of the new table should represent each of these samples. library(tidyr) largetable <-…
D C
  • 13
  • 3
1
vote
0 answers

How to use tf.data.Dataset.interleave to subsample from multi dataset objects in tf2?

I tried to replicate the solution posted here with tf.data.Dataset.interleave, but I'm not quite sure how to apply the interleave method to already-created dataset objects. Here is the code: import tensorflow as tf import numpy as np # preparing…
1
vote
1 answer

Sample random rows evenly spaced apart in R

I have a df of measurements over 50 years. I am trying to subsample the data to see what patterns I would have found had I sampled in only 2 years, or in 3, 4, 5, etc., instead of in all 50. I wrote code that will pull random years from the dataset,…
Jake L
  • 987
  • 9
  • 21
1
vote
1 answer

Subsampling a 1D array of integers so that the sum hits a target value in Python

I have two 1D arrays of integers whose sums differ, for example: a = [1,2,2,0,3,5] b = [0,0,3,2,0,0] I would like the sum of each array to be equal to that of the smaller of the two. However, I want to keep the values as integers, not floats, so…
APiazza
  • 13
  • 3
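One possible reading of the question above is that the larger array should be thinned at random, one unit at a time, until its sum matches the smaller sum. The sketch below assumes that reading and assumes non-negative integer counts; the excerpt is truncated, so the asker's actual constraints may differ. The function name `subsample_to_sum` and the `seed` parameter are hypothetical, introduced only for illustration.

```python
import random


def subsample_to_sum(counts, target, seed=None):
    """Return a copy of `counts` randomly thinned so that it sums to `target`.

    Assumption: `counts` holds non-negative integers. Every unit in the
    array has the same chance of being kept, so the result stays integer
    and no entry exceeds its original value.
    """
    total = sum(counts)
    if not 0 <= target <= total:
        raise ValueError("target must lie between 0 and sum(counts)")

    rng = random.Random(seed)
    # One pool entry per unit: index i appears counts[i] times.
    pool = [i for i, c in enumerate(counts) for _ in range(c)]
    # Draw `target` units without replacement.
    kept = rng.sample(pool, target)

    result = [0] * len(counts)
    for i in kept:
        result[i] += 1
    return result


a = [1, 2, 2, 0, 3, 5]
b = [0, 0, 3, 2, 0, 0]
target = min(sum(a), sum(b))
a_thinned = subsample_to_sum(a, target, seed=0)
```

Here `a_thinned` sums to the same total as `b`. Because sampling is without replacement over individual units, larger entries lose more units on average, which may or may not be the behaviour the asker wanted.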
1
vote
0 answers

Get all possible combinations of numpy array elements

I need to get all possible combinations nCr of all possible sizes of a numpy array. [1,2,3,4,5] should give us a set of…
DDR
  • 459
  • 5
  • 15
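For the combinations question above, a minimal sketch using only the standard library is shown here. It assumes that "all possible sizes" means sizes 1 through n (the empty combination is left out) and that a plain list stands in for the NumPy array; the expected output in the excerpt is truncated, so the exact form the asker wants is not known. The name `all_combinations` is hypothetical.

```python
from itertools import chain, combinations


def all_combinations(items):
    """Return every combination of `items` for sizes 1..len(items).

    Assumption: the empty combination is excluded. Results are tuples,
    ordered by size and then by position in the input.
    """
    items = list(items)
    sizes = range(1, len(items) + 1)
    return list(chain.from_iterable(combinations(items, r) for r in sizes))


combos = all_combinations([1, 2, 3, 4, 5])
```

For n elements this produces 2^n - 1 tuples, so the result grows quickly; for large inputs it may be better to iterate over the `chain` lazily instead of building a list.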
1
vote
1 answer

How to convert Y Cb Cr to RGB in MATLAB manually?

I've been tasked with performing a 4:2:0 chroma subsampling (color compression) on a series of JPEGs. The first step is to ensure that I can generate my Y, Cb, and Cr values and then convert back to RGB and display the image. Then I can go back…
Jared Boyd
  • 11
  • 2
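The round trip described in the question above (RGB to Y, Cb, Cr and back) can be sketched per pixel as follows. The question is about MATLAB; this sketch uses Python and assumes the full-range JFIF/BT.601 coefficients commonly used for JPEG, which may not be the variant the asker was given. The function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range (JFIF) Y, Cb, Cr floats."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Convert full-range (JFIF) Y, Cb, Cr back to 8-bit RGB integers."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)

    def clamp(v):
        # Round to the nearest integer and keep the result in 0..255.
        return max(0, min(255, round(v)))

    return clamp(r), clamp(g), clamp(b)


gray = ycbcr_to_rgb(128, 128, 128)
red_again = ycbcr_to_rgb(*rgb_to_ycbcr(255, 0, 0))
```

A neutral gray has Cb = Cr = 128, so it maps to equal R, G and B values. Checking that round trip before applying 4:2:0 subsampling helps separate conversion errors from subsampling artifacts.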
1
vote
1 answer

Chroma Subsampling with ffmpeg

I want to create an .mp4 output, but it doesn't work. I'm using ffmpeg. My input video is a raw video and I want to have a raw video .mp4 at the end. The command I use: ffmpeg.exe -i input.y4m -c:v rawvideo -vf format=yuv420p output.y4m Can…
Coder95
  • 131
  • 1
  • 9
1
vote
0 answers

spark efficient distribution pairing to compare cohorts

How can I efficiently compare matched cohorts in Spark? In Python, for each observation of the minority class in a highly imbalanced dataset, sampling k observations from the majority class can be implemented in a fairly straightforward way (i.e.…
Georg Heiler
  • 16,916
  • 36
  • 162
  • 292
1
vote
1 answer

Subsampling 3D array using the neighbourhood sum

The title is probably confusing. I have a reasonably large 3D numpy array. I'd like to cut its size by 2^3 by binning blocks of size (2,2,2). Each element in the new 3D array should then contain the sum of the elements in its respective block in…
Matheus Leão
  • 417
  • 4
  • 12
1
vote
4 answers

SED: How to remove every 10 lines in a file (thin or subsample the file)

I have this so far: sed -n '0,10p' yourfile > newfile But it is not working, just outputs a blank file :(
John
  • 5,139
  • 19
  • 57
  • 62
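The thinning asked about above can be sketched outside sed as well. The title is ambiguous, so this assumes the goal is to keep one line out of every ten and drop the rest; the function name `thin_lines` and its parameters are hypothetical.

```python
from itertools import islice


def thin_lines(lines, step=10, start=0):
    """Keep every `step`-th line of an iterable, beginning at index `start`.

    Assumption: "thinning" means keeping lines start, start+step,
    start+2*step, ... (0-based) and discarding everything else.
    """
    return list(islice(lines, start, None, step))


sample = [f"line {n}\n" for n in range(1, 31)]
thinned = thin_lines(sample)
```

For a file on disk, the same call works on the open file object, for example `thin_lines(open("yourfile"))`, since files iterate line by line. With GNU sed, the `first~step` address form (a GNU extension, as in `sed -n '1~10p' yourfile`) selects the same lines.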
1
vote
1 answer

Python 1:1 stratified sampling per each group

How can a 1:1 stratified sampling be performed in Python? Assume the pandas DataFrame df to be heavily imbalanced. It contains a binary group and multiple columns of categorical subgroups. df = pd.DataFrame({'id':[1,2,3,4,5], 'group':[0,1,0,1,0],…
Georg Heiler
  • 16,916
  • 36
  • 162
  • 292
1
vote
1 answer

How to subsample different numbers by ID and bootstrap in R

First, I'm trying to subsample a large dataset with many individuals, but each individual requires a different subsample size. I'm comparing across two time periods, so I want to subsample each individual by the minimum data points each has across…
1
vote
1 answer

How can I control subsampling such that xgb.cv and cross_validate produce the same results?

xgb.cv and sklearn.model_selection.cross_validate do not produce the same mean train/test error even though I set the same seed/random_state and make sure both methods use the same folds. The code at the bottom reproduces my issue. (Early…
Maauss
  • 11
  • 3