I have a large number of latitude and longitude values that I would like to bin together in order to display them on a heatmap (ipyleaflet
only seems to allow 2000 or so points in the heatmap and this would also be much more efficient when using big data).
I am actually using vaex
, but a answer in terms of pandas
is also fine.
The pandas
pd.cut
function seems to be helpful in terms of binning, however it produces a categorical column (category
dtype
) which looks like a list of all the values in the bin. Is there some way of changing this to just be an incremental number identifying each bin (thanks jezreal for that part of the answer)? I just need a bin number in order to then groupby
the bin number and average (mean
) on the latitude and longitude columns. I also need a count for the intensity of the heatmap entry.
For example:
dft = pd.DataFrame({
'latitude': [1.5, 0.5, 1.2, 0.9, 3],
'longitude': [3, 0.2, 2, 0.2, 1.1]
})
dft['bin'] = pd.cut(dft['latitude'], bins=3, labels=False).astype(str) + "_" + pd.cut(dft['longitude'], bins=3, labels=False).astype(str)
dft.groupby('bin').agg(['mean', 'count']).unstack()
Almost gives me the answer, but I think I want this output instead:
bin latitude_mean longitude_mean count
0_0 0.7 0.2 2
0_1 1.2 2.0 1
1_2 1.5 3.0 1
2_0 3.0 1.1 1
It would be helpful if the count can be normalized between 1 and 1000.
How can I use pandas
pd.cut
or something else to groupby
the bin in the rows, average on the latitude and longitude and (heatmap intensity) count in the columns?