I have a dataframe that has Latitude, Longitude, and a unique_id for each row.

df = df[['unique_id','Latitude','Longitude']]

I am using scipy's stats.binned_statistic_2d to bin the rows by latitude and longitude.

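import numpy as np
from scipy import stats

# Store the result as `counts` so the `stats` module is not overwritten.
counts, yedges, xedges, binnumbers = stats.binned_statistic_2d(
    np.array(df['Latitude']),
    np.array(df['Longitude']),
    values=np.array(df['unique_id']),
    statistic='count',
    bins=bins,
)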

After this, I would like to add a column to the dataframe giving the bin number that each unique_id falls in, so that df becomes df[['unique_id','Latitude','Longitude','binnumber']].

Is there an easy, Pythonic way to do this without nesting loops over xedges and yedges? I already tried nested loops, but that is far too slow.

TIA!

Buddyshot
jrange27
  • `binnumbers` is already what you are looking for, isn't it? From the docs of [scipy.stats.binned_statistic_2d](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binned_statistic_2d.html): `binnumber`: "This assigns to each element of sample an integer that represents the bin in which this observation falls. The representation depends on the expand_binnumbers argument. See Notes for details." I think you have to pass `np.array(df['unique_id'])` to the `values` argument of `scipy.stats.binned_statistic_2d`. Then, you just have to append `binnumbers` to your data frame. – andthum Feb 02 '23 at 12:47
  • @andthum OK, so the values returned in `binnumbers` are already in the order of the original df? Then all I have to do is add that list as a new column in the original df. If so, that is what I was looking for. Thanks for clarifying! – jrange27 Feb 02 '23 at 16:00
  • Yes, I think so. But you should check if the result from `scipy.stats.binned_statistic_2d` is the same as your nested loop approach. – andthum Feb 04 '23 at 10:55
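
A minimal sketch of what the comments above describe, for anyone landing here. The sample rows and `bins = 2` are made up for illustration; only the column names come from the question. It relies on `binnumber` holding one entry per input row, in input order, so it can be assigned straight to a new column. The second call uses `expand_binnumbers=True` to get separate per-axis indices, which makes it easy to sanity-check the assignment against the returned counts.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "unique_id": [101, 102, 103, 104],
    "Latitude": [10.0, 10.5, 11.5, 11.9],
    "Longitude": [20.0, 21.5, 20.2, 21.9],
})
bins = 2  # assumed: 2 bins along each axis

lat = df["Latitude"].to_numpy()
lon = df["Longitude"].to_numpy()
ids = df["unique_id"].to_numpy()

# Default: one linearized bin index per row, in the same order as the inputs.
result = stats.binned_statistic_2d(lat, lon, values=ids,
                                   statistic="count", bins=bins)
df["binnumber"] = result.binnumber

# Optional: separate (Latitude, Longitude) bin indices instead of one number.
expanded = stats.binned_statistic_2d(lat, lon, values=ids,
                                     statistic="count", bins=bins,
                                     expand_binnumbers=True)
df["lat_bin"], df["lon_bin"] = expanded.binnumber  # shape (2, n_rows)

# Sanity check: rows per (lat_bin, lon_bin) should match the returned counts.
# In-range bin indices start at 1; index 0 is reserved for out-of-range values.
for (i, j), n in df.groupby(["lat_bin", "lon_bin"]).size().items():
    assert result.statistic[i - 1, j - 1] == n

print(df)
```

No loop over `xedges` and `yedges` is needed for the assignment itself; the loop here only verifies the counts and runs once per occupied bin, not once per row.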
