I am new to R. I recently used stratified sampling for my train/test split so that the target label has the same proportion in both sets. Now I want to down-sample the training data so that the class distribution of the down-sampled set stays similar to that of the full training set (and of the population).
The reason I want to down-sample is that I have 11 million rows and 56 columns, and parameter tuning via grid, random, or Bayesian search would take days.
I am using XGBoost, and it is a binary classification problem.
I would really appreciate it if someone could help me with this.
Below is my code:
library(caTools)  # provides sample.split()
train_rows = sample.split(df$ModelLabel, SplitRatio=0.7) ## Stratified sampling
train = df[ train_rows,]
test = df[!train_rows,]
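
To make the goal concrete, here is a minimal sketch of the kind of stratified down-sampling I have in mind, written in base R. The toy `train` data frame, the feature column `x`, the fraction `keep_frac`, and the name `train_small` are all assumptions made up for illustration; only `ModelLabel` comes from my real data. The idea is to sample the same fraction of rows from each class so that the label proportions are preserved.

```r
set.seed(42)  # for reproducibility

# Hypothetical stand-in for the real training set (assumption:
# only the ModelLabel column name matches the actual data).
train <- data.frame(
  x = rnorm(10000),
  ModelLabel = rbinom(10000, 1, 0.2)
)

keep_frac <- 0.1  # assumed fraction of training rows to keep

# Group row indices by class label
idx_by_class <- split(seq_len(nrow(train)), train$ModelLabel)

# Draw the same fraction of rows from every class
keep_idx <- unlist(lapply(idx_by_class, function(idx) {
  n_keep <- round(length(idx) * keep_frac)
  idx[sample.int(length(idx), size = n_keep)]
}), use.names = FALSE)

train_small <- train[keep_idx, ]

# Compare class proportions before and after down-sampling
print(prop.table(table(train$ModelLabel)))
print(prop.table(table(train_small$ModelLabel)))
```

Presumably the same `sample.split` call I used above, applied to `train$ModelLabel` with a smaller `SplitRatio`, would give a similar result, but I am not sure whether that is the recommended approach.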