Yesterday I began reading about using bootstrapping to determine confidence intervals (CIs) in various situations. I am currently estimating three parameters in a model via maximum likelihood estimation (MLE). That part is done, and now I need to define my CIs. This can be done via profile likelihood, but as far as I can tell, bootstrapping will give a wider CI. My problem is that I am unsure how to actually perform the bootstrapping. I have written my own code for the parameter estimation, so there are no built-in MLE calculators involved.

The observed data are binary (1 or 0), and it is from those data (fed into a model with three parameters) that I have estimated the parameter values.

So let's say my cohort is 500. Is the idea that I take a sample from my cohort, maybe 100, expand it to 500 again by repeating the sample 5 times, and rerun the estimation, which should give some new parameter estimates? And that I then do this 1000-2000 times to get a series of parameter values that can be used to define the CI?

Or am I missing something here?

StupidWolf
Denver Dang
  • Bootstrapping is usually done by sampling the whole set *with replacement* such that your bootstrapped sample is the same size as the original. – PMende Aug 30 '18 at 20:26
  • So I just take my entire cohort of 500, find all the 1 values, randomly switch them around, and recalculate? – Denver Dang Aug 30 '18 at 20:27
  • No, sampling with replacement is not permutation, because some values in the data will not be drawn, and others will be drawn several times. – Denziloe Aug 30 '18 at 20:29
  • But if I use the entire cohort every time I calculate the new parameter estimates, won't all the new "switched around" values be used/drawn? – Denver Dang Aug 30 '18 at 20:32
  • You don't use the entire cohort (if by that you mean your original data sample), you *sample* that sample data *with replacement*. If your original data is [person1, person2, person3], random sampling with replacement can give you something like [person2, person2, person1]. – Denziloe Aug 30 '18 at 20:34
  • So if I have 500 "persons", each with some data associated with them, in particular the observed binary data (1, 0), the idea is to take a sample from the entire cohort (500), maybe the same size, maybe smaller, and then take a random "person" from the original sample and put it into the new sample until it's filled? – Denver Dang Aug 30 '18 at 21:25
  • I don't know what "take a sample from the entire cohort and then take a random person" means. That's not what you do. You have 500 persons (using persons as an informal example; really we mean "rows" or "records", and the rows might only have one variable; in your case each person has an observed binary variable). You take a random sample with replacement of 500 people from the overall 500 people. This is called a bootstrap sample. You calculate the statistic. You take another random sample with replacement of 500 people from your original 500 people. You calculate the statistic. Et cetera. – Denziloe Aug 30 '18 at 22:07
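
The distinction drawn in these comments, between permuting the cohort and sampling it with replacement, can be illustrated with a small sketch. It assumes NumPy is available; the `cohort` array is made-up binary data standing in for the real 500 records, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Hypothetical cohort: 500 records, each with one binary outcome (made-up data).
cohort = rng.integers(0, 2, size=500)

# A permutation only reorders the records, so any statistic that
# ignores ordering (such as the number of 1s) is unchanged.
permuted = rng.permutation(cohort)

# A bootstrap sample draws 500 row indices *with replacement*:
# some records appear several times, others not at all.
idx = rng.integers(0, len(cohort), size=len(cohort))
boot = cohort[idx]

# Number of distinct original records that made it into the bootstrap sample.
n_unique = np.unique(idx).size
```

On average only about 63% of the original records appear in any one bootstrap sample, which is why the statistic varies from one resample to the next while a permutation leaves it untouched.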

1 Answer

This question isn't really about Python. I think you need to read an introduction to bootstrapping; "An Introduction to Statistical Learning" provides a good one. The idea is not to sample 100: you must sample with replacement, using the same sample size as the original data (500). Yes, you then re-estimate your parameters on each bootstrap sample, many times. There are then several ways of turning all of these estimates into a confidence interval. For example, you can use them to estimate the standard error (the standard deviation of the sampling distribution of the estimate) and then take the estimate +/- 2*se.
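
As a rough sketch of that procedure in Python (assuming NumPy): `bootstrap_estimates` and `fit_stand_in` are hypothetical names, and `fit_stand_in` is only a placeholder for your own three-parameter MLE routine. Here it returns the MLE of a single Bernoulli probability, and `data` is made up. The percentile interval at the end is one of the other common ways of turning the estimates into a CI.

```python
import numpy as np

def bootstrap_estimates(data, fit, n_boot=1000, seed=0):
    """Refit `fit` on n_boot bootstrap samples of the rows of `data`."""
    rng = np.random.default_rng(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # n row indices, drawn with replacement
        estimates.append(fit(data[idx]))  # re-estimate on the bootstrap sample
    return np.asarray(estimates)          # shape (n_boot, n_params)

def fit_stand_in(sample):
    # Hypothetical placeholder for your own MLE code.
    # The MLE of a single Bernoulli probability is the sample mean.
    return np.array([sample.mean()])

# Made-up binary outcomes for a cohort of 500.
data = np.random.default_rng(1).integers(0, 2, size=500)

theta_hat = fit_stand_in(data)                        # estimate on the original data
boot = bootstrap_estimates(data, fit_stand_in, n_boot=1000)

# Standard-error approach: estimate +/- 2 * se.
se = boot.std(axis=0, ddof=1)
normal_ci = np.stack([theta_hat - 2 * se, theta_hat + 2 * se])

# Percentile approach: 2.5th and 97.5th percentiles of the bootstrap estimates.
percentile_ci = np.percentile(boot, [2.5, 97.5], axis=0)
```

To use this with a real model, replace `fit_stand_in` with a function that takes the resampled rows and returns the three fitted parameters as an array; each column of `boot` then gives the bootstrap distribution of one parameter.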

Denziloe
  • True, the reason for the Python tag was more to ask whether there was anything smart in Python that could just do it. But it seems like there is no need for that. – Denver Dang Aug 30 '18 at 20:28