I have a dataset consisting of 21 unique data records. I want to benchmark how certain algorithms, such as kNN and SVM, perform as the number of samples per class increases, so I would like to test on data with at least 20 unique records for each class (the Predict Conc column holds the class labels).
I don't want to generate random data. I would like to use the 21 unique data records I have as the base dataset and generate the remaining data so that it is similar to the existing data.
How can I do this using Python?
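A minimal sketch of one possible approach, assuming NumPy is available: grow each class by picking existing records of that class and adding small Gaussian noise scaled to that class's own per-feature spread, so every new record stays close to a real one. The function name jitter_augment, the target of 20 records per class and the noise scale of 0.1 are hypothetical choices, not anything from the question. Only rows 1 to 6 of the sample data are used here to keep it short.

```python
import numpy as np

# Rows 1-6 of the sample data: OD600AV, Cell Count; labels from Predict Conc
X = np.array([
    [0.059625, 800000],
    [0.063125, 442000],
    [0.067375, 544000],
    [0.060125, 728000],
    [0.062500, 616000],
    [0.063000, 688000],
])
y = np.array([1, 1, 1, 2, 2, 2])


def jitter_augment(X, y, n_per_class=20, scale=0.1, seed=0):
    """Hypothetical helper: grow each class to n_per_class rows by adding
    Gaussian noise to randomly chosen existing rows of that class.

    n_per_class=20 and scale=0.1 are illustrative assumptions. The noise
    standard deviation is scale * (per-feature std within the class).
    """
    rng = np.random.default_rng(seed)
    X_out, y_out = [X], [y]
    for label in np.unique(y):
        X_c = X[y == label]
        n_new = n_per_class - len(X_c)
        if n_new <= 0:
            continue
        # Pick an existing row of this class as the base for each new record
        base = X_c[rng.integers(0, len(X_c), size=n_new)]
        # Per-feature noise; the std array broadcasts across the columns
        noise = rng.normal(0.0, scale * X_c.std(axis=0), size=base.shape)
        X_out.append(base + noise)
        y_out.append(np.full(n_new, label))
    # Original rows come first, followed by the synthetic ones
    return np.vstack(X_out), np.concatenate(y_out)


X_aug, y_aug = jitter_augment(X, y)
print(X_aug.shape)             # (40, 2)
print(np.bincount(y_aug)[1:])  # [20 20]
```

The scale parameter controls how far synthetic records may drift from the real ones; a smaller value keeps them closer but makes them more redundant.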
Here's the sample data
Index OD600AV Cell Count Predict Conc
1 0.059625 800000 1
2 0.063125 442000 1
3 0.067375 544000 1
4 0.060125 728000 2
5 0.062500 616000 2
6 0.063000 688000 2
7 0.061125 532000 3
8 0.059875 470000 3
9 0.059250 556000 3
10 0.060250 466000 4
11 0.056000 222000 4
12 0.056000 390000 4
13 0.055125 112000 5
14 0.049625 105000 5
15 0.050875 120000 5
16 0.047875 56000 6
17 0.058000 44000 6
18 0.048500 140000 6
19 0.052500 62000 7
20 0.061125 52000 7
21 0.047125 64000 7
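Another option, sketched here under the same NumPy assumption, is SMOTE-style interpolation: each synthetic record is placed at a random point on the line segment between two different existing records of the same class, so the new data stays inside the range that the real measurements cover. The helper interpolate_augment is hypothetical. With only three records per class, choosing "any other record of the same class" is essentially what SMOTE does with k_neighbors=2.

```python
import numpy as np

# Sample data from the question: OD600AV, Cell Count, Predict Conc
data = np.array([
    [0.059625, 800000, 1], [0.063125, 442000, 1], [0.067375, 544000, 1],
    [0.060125, 728000, 2], [0.062500, 616000, 2], [0.063000, 688000, 2],
    [0.061125, 532000, 3], [0.059875, 470000, 3], [0.059250, 556000, 3],
    [0.060250, 466000, 4], [0.056000, 222000, 4], [0.056000, 390000, 4],
    [0.055125, 112000, 5], [0.049625, 105000, 5], [0.050875, 120000, 5],
    [0.047875, 56000, 6], [0.058000, 44000, 6], [0.048500, 140000, 6],
    [0.052500, 62000, 7], [0.061125, 52000, 7], [0.047125, 64000, 7],
])
X, y = data[:, :2], data[:, 2].astype(int)


def interpolate_augment(X, y, n_per_class=20, seed=0):
    """Hypothetical helper: SMOTE-style oversampling where each synthetic
    record lies between two distinct existing records of the same class."""
    rng = np.random.default_rng(seed)
    X_out, y_out = [X], [y]
    for label in np.unique(y):
        X_c = X[y == label]
        n = len(X_c)
        n_new = n_per_class - n
        if n_new <= 0 or n < 2:
            continue
        # Two different rows of this class for each new record (j != i)
        i = rng.integers(0, n, size=n_new)
        j = (i + rng.integers(1, n, size=n_new)) % n
        t = rng.random((n_new, 1))  # interpolation weight in [0, 1)
        X_out.append(X_c[i] + t * (X_c[j] - X_c[i]))
        y_out.append(np.full(n_new, label))
    return np.vstack(X_out), np.concatenate(y_out)


X_aug, y_aug = interpolate_augment(X, y)
print(X_aug.shape)  # (140, 2)
```

Interpolated Cell Count values are not integers, so round them if that matters. If an extra dependency is acceptable, imbalanced-learn's SMOTE class with k_neighbors=2 and a dict sampling_strategy such as {c: 20 for c in range(1, 8)}, applied via fit_resample, should do much the same thing. Either way, synthetic records cannot carry information that the original 21 do not, so benchmark results on them may look more optimistic than results on new measurements would.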
This question is quite similar to "Generate data by using existing dataset as the base dataset", which seems to have been answered using R, but I could not get that solution to work.
Thanks