I'm working with tabular data that has somewhat more than one million rows and only a single column.
I tried to use the bootstrap, i.e., the classic method of resampling with replacement.
Since the bootstrap simply draws values from the population with replacement, I wrote the following straightforward code:
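```java
public static double[] inelegantSampleWithReplacement(double[] someArray, int howmany) {
    // Allocate exactly howmany slots (sizing by NUMBER_OF_ROWS breaks when howmany differs)
    double[] result = new double[howmany];
    for (int i = 0; i < howmany; ++i) {
        // Pick a uniformly random index in [0, someArray.length)
        result[i] = someArray[(int) (someArray.length * Math.random())];
    }
    return result;
}
```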
It works well, and fortunately it is not too slow for one million rows: it took about one minute for a matrix with one million rows.
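A minimal, unbenchmarked sketch of two variants of the same loop, assuming Java 8+. `Math.random()` routes every call through one shared `java.util.Random` instance, and the multiply-and-cast can be replaced by a bounded integer draw. The first method uses a local `SplittableRandom` with `nextInt(bound)`; the second spreads the draws across cores with a parallel `IntStream`, where each thread uses its own `ThreadLocalRandom`. The class and method names (`BootstrapSampler`, `sampleWithReplacement`, `parallelSampleWithReplacement`) are hypothetical. Since a million draws from this loop would normally take far less than a minute, it may also be worth timing the loop separately from data loading.

```java
import java.util.SplittableRandom;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

public class BootstrapSampler {

    // Sequential resampling with one SplittableRandom (not thread-safe, so it stays local to the call).
    public static double[] sampleWithReplacement(double[] data, int howmany, long seed) {
        SplittableRandom rng = new SplittableRandom(seed);
        double[] result = new double[howmany];
        int n = data.length;
        for (int i = 0; i < howmany; i++) {
            // nextInt(bound) returns a uniform int in [0, bound)
            result[i] = data[rng.nextInt(n)];
        }
        return result;
    }

    // Parallel variant: each worker thread draws from its own ThreadLocalRandom instance,
    // and every index i is written by exactly one task, so no locking is needed.
    public static double[] parallelSampleWithReplacement(double[] data, int howmany) {
        int n = data.length;
        double[] result = new double[howmany];
        IntStream.range(0, howmany).parallel()
                 .forEach(i -> result[i] = data[ThreadLocalRandom.current().nextInt(n)]);
        return result;
    }

    public static void main(String[] args) {
        double[] data = new double[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        double[] a = sampleWithReplacement(data, data.length, 42L);
        double[] b = parallelSampleWithReplacement(data, data.length);
        System.out.println(a.length + " " + b.length);
    }
}
```

The seeded sequential version is reproducible; the parallel version is not, because thread scheduling decides which generator serves which index.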
I'm looking for sampling methods that make the code faster, since I will be dealing with big data where billions of rows are common.
As you can see, sampling with replacement is a very straightforward method, and the code above implements it directly. I looked for more sophisticated versions of the bootstrap and found this blog post (http://www.inquidia.com/news-and-info/solution-bootstrapping-big-data-environments-how-sample-replacement-using-sampling). I wrote code following the blog, but the results were worse than with the code above.
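One alternative that may be worth comparing, though I cannot confirm it is what the linked blog describes, is the Poisson bootstrap. Instead of drawing n random indices (which needs random access to every row), each row independently gets a multiplicity drawn from Poisson(1), which approximates the Multinomial(n, 1/n, …, 1/n) counts of the exact bootstrap. One replicate can then be computed in a single sequential pass, so the data can be streamed or split across machines instead of held in one Java array (arrays are indexed by `int`, so they top out near 2^31 elements). The sketch assumes the statistic of interest is the mean; `PoissonBootstrap`, `poissonOne`, and `poissonBootstrapMean` are hypothetical names, and the Poisson draw uses Knuth's simple multiplication method, which is adequate for λ = 1.

```java
import java.util.SplittableRandom;

public class PoissonBootstrap {

    // Draw from Poisson(lambda = 1) with Knuth's method: multiply uniforms until the
    // product drops to e^-1 or below; the number of factors minus one is the sample.
    public static int poissonOne(SplittableRandom rng) {
        final double limit = Math.exp(-1.0);
        int k = 0;
        double p = 1.0;
        do {
            k++;
            p *= rng.nextDouble();
        } while (p > limit);
        return k - 1;
    }

    // One bootstrap replicate of the mean in a single sequential pass.
    // Each row's weight is an independent Poisson(1) count, approximating the
    // Multinomial(n, 1/n, ..., 1/n) counts of the exact bootstrap.
    public static double poissonBootstrapMean(double[] data, long seed) {
        SplittableRandom rng = new SplittableRandom(seed);
        double weightedSum = 0.0;
        long totalWeight = 0;
        for (double x : data) {
            int w = poissonOne(rng);
            weightedSum += w * x;
            totalWeight += w;
        }
        // totalWeight averages n; it is 0 only with negligible probability for large n.
        return totalWeight == 0 ? Double.NaN : weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        double[] data = new double[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i % 10;
        System.out.println(poissonBootstrapMean(data, 42L));
    }
}
```

Note that each replicate's effective size is random (about n on average) rather than exactly n, which is the price of not needing random access.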
Do you have any good ideas for improving the running time of the bootstrap method above?