For a project I am working on, I fitted a linear regression model. After fitting the line, I wanted to resample my data over and over with np.random.choice to see how much the regression line would vary if the data were collected again. However, I keep getting a KeyError in my function, and I am not sure how to fix it.

Here is the head of the data:

[image: head of big_df, not reproduced here]

I ran a linear regression model on the columns 'nsb' and 'r'. Here are my functions that repeatedly create linear regression models for 'bootstrapped' data:

[image: code for the functions, including draw_bs_pairs_linreg, not reproduced here]

When I call this:

slope, int = draw_bs_pairs_linreg(big_df['nsb'], big_df['r'], size = 1000)

I get the error below. The length of the list and the values in it change each time I run it.

KeyError: '[2, 567, 459, 458, 355, 230, 353, 565, 231, 566, 117] not in index'

Any help would be appreciated.

Jensen_ray
  • Why didn't you provide the **FULL** error message? Do you want help or not? – hpaulj Feb 27 '22 at 21:17
  • From the picture of big_df, I can already tell that the index label 2 does not exist, as the index starts at 3 (assuming it is sorted). A quick thing to try: in your function `draw_bs_pairs_linreg`, change the first line of code to `inds=x.index` (not sure it is enough) – Ben.T Feb 27 '22 at 21:24
  • @hpaulj the error message is extremely long, but it starts at the line right after I start the for loop in the function, `bs_x, bs_y = ...` – Jensen_ray Feb 27 '22 at 21:46
  • We don't usually care if the traceback is long, just so long as it helps identify the problem. In `x[bs_inds]`, `x` is a pandas Series, with its own index, which in this case is not contiguous. But you have created `bs_inds` as though you were indexing a numpy array. It might have run if you used `big_df['nsb'].values`, the array of values. And probably be faster. The traceback should tell us (and you) that pandas has problems with the index array! Do not blow off a request for a full traceback! – hpaulj Feb 28 '22 at 01:55
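
The mismatch described in the last comment can be reproduced on a small scale. This is a minimal sketch, not the asker's data: the Series `x` and the array `bs_inds` are made-up stand-ins, assuming only that the real index starts at 3 and that the function draws positions with `np.arange(len(x))`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for big_df['nsb']: the index starts at 3,
# so the labels 0, 1 and 2 do not exist.
x = pd.Series([10.0, 20.0, 30.0, 40.0], index=[3, 4, 5, 6])

# Positions 0..len(x)-1, like those drawn from np.arange(len(x)).
bs_inds = np.array([0, 2, 2, 3])

try:
    x[bs_inds]  # integer index: pandas treats the values as labels
    raised = False
except KeyError:
    raised = True  # labels 0 and 2 are not in the index

# Both of these select by position and work regardless of the index.
by_position = x.iloc[bs_inds].to_numpy()
by_array = x.to_numpy()[bs_inds]
```

Which labels are reported as missing depends on the random draw, which would explain why the list in the error message changes from run to run.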

1 Answer

You need to call DataFrame.reset_index before calling your function, so that the index labels match the positions 0..n-1:

big_df = big_df.reset_index(drop=True) 

Or index by position with .iloc inside the function:

bs_x, bs_y = x.iloc[bs_inds], y.iloc[bs_inds]
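
For context, here is a rough sketch of how a pairs-bootstrap function could look with positional indexing. The original function was posted only as an image, so this body is an assumption rather than the asker's code: the use of `np.polyfit`, the conversion to NumPy arrays (as suggested in the comments), and the `seed` parameter are all illustrative choices.

```python
import numpy as np
import pandas as pd

def draw_bs_pairs_linreg(x, y, size=1, seed=None):
    """Pairs bootstrap for a straight-line fit (illustrative sketch)."""
    rng = np.random.default_rng(seed)  # seed is a hypothetical extra parameter

    # Convert to plain arrays so that indexing is always positional,
    # whatever index the pandas Series carry.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    inds = np.arange(len(x))
    bs_slope_reps = np.empty(size)
    bs_intercept_reps = np.empty(size)

    for i in range(size):
        # Resample (x, y) pairs with replacement.
        bs_inds = rng.choice(inds, size=len(inds))
        bs_x, bs_y = x[bs_inds], y[bs_inds]
        # A degree-1 polyfit returns the slope first, then the intercept.
        bs_slope_reps[i], bs_intercept_reps[i] = np.polyfit(bs_x, bs_y, 1)

    return bs_slope_reps, bs_intercept_reps
```

It would be called in the same way as in the question, for example `slope, intercept = draw_bs_pairs_linreg(big_df['nsb'], big_df['r'], size=1000)`. Naming the second result `intercept` rather than `int` avoids shadowing the Python built-in.
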
ansev