You can use sklearn's StratifiedShuffleSplit to do exactly that. From the docs:
The folds are made by preserving the percentage of samples for each class.
The split method of StratifiedShuffleSplit returns a generator that yields pairs of (train, test) index arrays, which you can use to split your dataframe into train and test sets. Here's a sample use case showing that the class proportions are indeed preserved in each split:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.model_selection import StratifiedShuffleSplit

# Toy data: 1200 samples, roughly 80% class 0 and 20% class 1
X = np.random.randint(0, 5, (1200, 2))
y = np.random.choice([0, 1], size=(1200,), p=[0.8, 0.2])

# Take the first stratified train/test split (80/20)
sss = StratifiedShuffleSplit(n_splits=2, test_size=0.2, random_state=0)
train_index, test_index = next(sss.split(X, y))

# Plot the class counts of each split side by side
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
for split, title, ax in zip([train_index, test_index],
                            ['Train split', 'Test split'],
                            axes.flatten()):
    sns.countplot(x=y[split], ax=ax).set_title(title)
plt.show()
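
If you'd rather check the proportions numerically than read them off a plot, something along these lines should work. It's only a sketch on the same kind of toy data as above; the seeded generator and the variable names (rng, overall, train_frac, test_frac) are my own choices for illustration, not anything sklearn requires.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Toy data (assumption: same shape and class balance as the example above)
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(1200, 2))
y = rng.choice([0, 1], size=1200, p=[0.8, 0.2])

# One stratified 80/20 split
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_index, test_index = next(sss.split(X, y))

# Since y is 0/1, the mean is the fraction of class 1
overall = y.mean()
train_frac = y[train_index].mean()
test_frac = y[test_index].mean()

print(f"class 1 fraction overall: {overall:.3f}")
print(f"class 1 fraction train:   {train_frac:.3f}")
print(f"class 1 fraction test:    {test_frac:.3f}")
```

The three fractions should agree up to rounding, since each class is split separately at the requested ratio. With a dataframe you would index rows positionally, e.g. df.iloc[train_index] and df.iloc[test_index].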
