I have an imbalanced dataset for sentiment analysis with about 65000 observations (~60000 positive and ~5000 negative). I want to balance it so that I have the same number of positive and negative observations for training my machine learning algorithms.
The function downSample from the package caret gives me ~5000 negative and ~5000 positive observations (downsampling to the size of the minority class). However, I would like to have exactly 2500 randomly selected positive and 2500 randomly selected negative observations. Does anyone know how to do this?
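
To make the goal concrete, here is a minimal sketch of the sampling I am after, written in base R rather than with caret. The data frame `df`, its columns `text` and `sentiment`, and the variable `n_per_class` are hypothetical stand-ins for my real data, so this illustrates the idea and is not a tested solution for the actual dataset.

```r
set.seed(42)  # fixed seed so the random draw is reproducible

# Hypothetical stand-in for the real data: 60000 positive and 5000 negative rows
df <- data.frame(
  text = paste("review", seq_len(65000)),
  sentiment = factor(rep(c("positive", "negative"), times = c(60000, 5000)))
)

n_per_class <- 2500  # assumed target size per class

# Group the row indices by class label
idx_by_class <- split(seq_len(nrow(df)), df$sentiment)

# Draw n_per_class row indices from each class, without replacement
sampled_idx <- unlist(
  lapply(idx_by_class, function(idx) idx[sample.int(length(idx), n_per_class)]),
  use.names = FALSE
)

balanced <- df[sampled_idx, ]
table(balanced$sentiment)  # class counts in the balanced sample
```

Because `sample.int` draws without replacement by default, it raises an error if a class has fewer than `n_per_class` rows, which seems preferable to silently getting an unbalanced result.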