I'm trying to develop a map-based visualization which includes a "heat map" of subpopulations, based on a MongoDB collection that contains documents like this:
```json
{
  "PlaceName": "Boston",
  "Location": {
    "type": "Point",
    "coordinates": [ -71.063611, 42.358056 ]
  },
  "Subpopulations": {
    "Age": {
      "0_4": 37122,
      "6_11": 33167,
      "12_17": 35464,
      "18_24": 130885,
      "25_34": 127058,
      "35_44": 79092,
      "45_54": 72076,
      "55_64": 59766,
      "65_74": 33997,
      "75_84": 20219,
      "85_": 9057
    }
  }
}
```
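GeoJSON (RFC 7946), and therefore MongoDB's `2dsphere` index, stores a point as `[longitude, latitude]`, whereas Leaflet works with `[lat, lng]`. Swapped axes are easy to miss because many swapped pairs are still within valid ranges. A minimal sketch might keep the conversion in two small helpers. The helper names and the `places` collection name below are hypothetical:

```javascript
// Hypothetical helpers (not from any library) for converting between
// Leaflet-style [lat, lng] pairs and GeoJSON points, whose coordinates
// array is [longitude, latitude].
// The matching index, in the mongo shell with a hypothetical collection name:
//   db.places.createIndex({ Location: "2dsphere" })

function toGeoJsonPoint(lat, lng) {
  // The negated comparisons also reject NaN.
  if (!(lat >= -90 && lat <= 90)) throw new RangeError(`latitude out of range: ${lat}`);
  if (!(lng >= -180 && lng <= 180)) throw new RangeError(`longitude out of range: ${lng}`);
  return { type: "Point", coordinates: [lng, lat] };
}

function toLatLng(point) {
  const [lng, lat] = point.coordinates;
  return [lat, lng];
}

// Boston, written the way coordinates are usually quoted: latitude first.
const boston = toGeoJsonPoint(42.358056, -71.063611);
console.log(JSON.stringify(boston.coordinates)); // [-71.063611,42.358056]
console.log(JSON.stringify(toLatLng(boston))); // [42.358056,-71.063611]
```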
There are hundreds of thousands of individual locations in the database, and they do not overlap; for example, there wouldn't be separate entries for "New York City" and "Manhattan".
The goal is to use Leaflet.js and some plugins to render various visualizations of this data. With the right plugins, Leaflet is quite good at aggregating data client-side: if I passed it a thousand locations with density values, it could render a heat map of the relevant area just by crunching the individual values.
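On the client, the Leaflet.heat plugin's `L.heatLayer` accepts points as `[lat, lng, intensity]` arrays, with intensities measured against its `max` option (1.0 by default). A minimal sketch of preparing that input, assuming records shaped `{ lat, lng, value }` (a hypothetical shape) and normalizing each value against the largest one:

```javascript
// Hypothetical helper: turn { lat, lng, value } records into the
// [lat, lng, intensity] triples accepted by the Leaflet.heat plugin,
// scaling intensity into 0..1 relative to the largest value.
function toHeatPoints(records) {
  const max = records.reduce((m, r) => Math.max(m, r.value), 0);
  return records.map((r) => [r.lat, r.lng, max > 0 ? r.value / max : 0]);
}

const points = toHeatPoints([
  { lat: 42.358, lng: -71.064, value: 33167 },
  { lat: 40.713, lng: -74.006, value: 66334 },
]);
console.log(JSON.stringify(points)); // [[42.358,-71.064,0.5],[40.713,-74.006,1]]

// In the browser, with Leaflet and Leaflet.heat loaded and an existing `map`:
//   L.heatLayer(points, { radius: 25, max: 1.0 }).addTo(map);
```

Normalizing on the client keeps `max` at its default. Whether linear scaling suits population counts is a judgment call; a log scale may spread the colors more evenly.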
The problem is what happens when I zoom out to show the whole world. Sending all that data to the client and having it processed quickly enough for a smooth visualization would be horribly inefficient, if not impossible.
So what I need is to cluster the data automatically on the server, ideally in a MongoDB query. I've read that geohashing may be a good starting point for deciding which points belong in which clusters, but I'm sure someone has done exactly this before and may have better insight than that. Ideally I'd like to send my node.js script a query like this:
`http://myserver.com/popdata?top=42.48&left=-80.57&bottom=37.42&right=-62.55&stat=Age&value=6_11`
The script would then decide how granular the clustering needs to be, based on how many individual points fall within the requested geographic area and a maximum number of data points to return (or something along those lines), and it would return data like this:
```json
[
  { "clusterlocation": [ 42.304, -72.622 ], "total_age_6_11": 59042 },
  { "clusterlocation": [ 36.255, -64.124 ], "total_age_6_11": 7941 },
  { "clusterlocation": [ 40.425, -70.693 ], "total_age_6_11": 90257 },
  { "clusterlocation": [ 39.773, -67.992 ], "total_age_6_11": 102752 },
  ...
]
```
...where "clusterlocation" is something like the mean location (as `[lat, lng]`, the order Leaflet expects) of all documents in the cluster, and "total_age_6_11" is the sum of those documents' values for "Subpopulations.Age.6_11".
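Geohashing fits this because nearby points usually share a geohash prefix: each extra character splits a cell into 32 smaller ones, so grouping on the first `p` characters gives coarser or finer clusters (points close to a cell boundary can still land in different clusters). Below is a minimal in-memory sketch of the standard encoding, a hypothetical heuristic that picks `p` from the bounding box and a cap on the number of clusters, and prefix grouping that yields a mean location and a summed value per cluster. The function names and the `maxClusters` parameter are hypothetical:

```javascript
// Standard geohash base32 alphabet (no a, i, l, o).
const BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

// Encode lat/lng as a geohash: alternately halve the longitude and latitude
// ranges (longitude first), emitting 1 for the upper half and 0 for the
// lower, and pack every 5 bits into one base32 character.
function encodeGeohash(lat, lng, precision) {
  const range = { lat: [-90, 90], lng: [-180, 180] };
  let hash = "";
  let value = 0;
  let bits = 0;
  let useLng = true;
  while (hash.length < precision) {
    const axis = useLng ? "lng" : "lat";
    const coord = useLng ? lng : lat;
    const [lo, hi] = range[axis];
    const mid = (lo + hi) / 2;
    if (coord >= mid) { value = value * 2 + 1; range[axis] = [mid, hi]; }
    else { value = value * 2; range[axis] = [lo, mid]; }
    useLng = !useLng;
    if (++bits === 5) { hash += BASE32[value]; value = 0; bits = 0; }
  }
  return hash;
}

// Hypothetical heuristic: the finest precision (1..12) whose cells overlapping
// the box number at most maxClusters; falls back to 1 if even that is too many.
// A p-character geohash has 5p bits: ceil(5p/2) longitude, floor(5p/2) latitude.
function choosePrecision(bbox, maxClusters) {
  let best = 1;
  for (let p = 1; p <= 12; p++) {
    const lngBits = Math.ceil((5 * p) / 2);
    const latBits = Math.floor((5 * p) / 2);
    // +1 per axis because an unaligned box can straddle one extra cell.
    const cols = Math.min(Math.ceil((bbox.right - bbox.left) / (360 / 2 ** lngBits)) + 1, 2 ** lngBits);
    const rows = Math.min(Math.ceil((bbox.top - bbox.bottom) / (180 / 2 ** latBits)) + 1, 2 ** latBits);
    if (cols * rows > maxClusters) break;
    best = p;
  }
  return best;
}

// Group { lat, lng, value } points by geohash prefix; report the mean
// position as [lat, lng] and the summed value of each cluster.
function clusterByGeohash(points, precision) {
  const cells = new Map();
  for (const p of points) {
    const key = encodeGeohash(p.lat, p.lng, precision);
    const c = cells.get(key) || { lat: 0, lng: 0, n: 0, total: 0 };
    c.lat += p.lat; c.lng += p.lng; c.n += 1; c.total += p.value;
    cells.set(key, c);
  }
  return [...cells].map(([geohash, c]) => ({
    geohash,
    clusterlocation: [c.lat / c.n, c.lng / c.n],
    total: c.total,
  }));
}

const bbox = { top: 42.48, left: -80.57, bottom: 37.42, right: -62.55 };
console.log(choosePrecision(bbox, 1000)); // 3
console.log(encodeGeohash(42.6, -5.6, 5)); // ezs42
```

The cell count bounds the number of clusters, not the number of documents. If the box holds fewer documents than the limit (checked, for example, with the Node.js driver's `countDocuments`), returning the raw points may be simpler.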
Is this something I can do purely in a Mongo query? Is there a "tried and tested" way to do it well?
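As for doing it purely in MongoDB: as far as I know, the aggregation framework has no built-in geohash operator. This sketch therefore swaps in a simpler technique: a fixed grid of `cellDeg` degrees, computed with `$floor`/`$divide` inside `$group`. It assumes coordinates stored in GeoJSON `[longitude, latitude]` order with a `2dsphere` index, MongoDB 3.2 or later (for `$arrayElemAt`, `$floor`, and array literals in `$project`), and a hypothetical builder name:

```javascript
// Field names from the query string end up in a field path, so only
// plain names are accepted (no "$" or ".").
const SAFE_NAME = /^[A-Za-z0-9_]+$/;

// Hypothetical builder: clusters documents inside bbox on a grid of
// cellDeg x cellDeg degrees. Assumes Location.coordinates is [lng, lat].
function buildClusterPipeline(bbox, stat, value, cellDeg) {
  if (![stat, value].every((s) => typeof s === "string" && SAFE_NAME.test(s))) {
    throw new Error("stat and value must be plain field names");
  }
  const lng = { $arrayElemAt: ["$Location.coordinates", 0] };
  const lat = { $arrayElemAt: ["$Location.coordinates", 1] };
  // Closed GeoJSON ring (first position repeated at the end).
  const ring = [
    [bbox.left, bbox.bottom],
    [bbox.right, bbox.bottom],
    [bbox.right, bbox.top],
    [bbox.left, bbox.top],
    [bbox.left, bbox.bottom],
  ];
  return [
    { $match: { Location: { $geoWithin: { $geometry: { type: "Polygon", coordinates: [ring] } } } } },
    {
      $group: {
        // The integer cell index along each axis identifies the cluster.
        _id: {
          x: { $floor: { $divide: [lng, cellDeg] } },
          y: { $floor: { $divide: [lat, cellDeg] } },
        },
        lat: { $avg: lat },
        lng: { $avg: lng },
        total: { $sum: `$Subpopulations.${stat}.${value}` },
      },
    },
    // Reshape into the response format: [lat, lng] plus a named total.
    {
      $project: {
        _id: 0,
        clusterlocation: ["$lat", "$lng"],
        [`total_${stat.toLowerCase()}_${value}`]: "$total",
      },
    },
  ];
}

const pipeline = buildClusterPipeline(
  { top: 42.48, left: -80.57, bottom: 37.42, right: -62.55 }, "Age", "6_11", 1);
console.log(JSON.stringify(pipeline[2]));
// {"$project":{"_id":0,"clusterlocation":["$lat","$lng"],"total_age_6_11":"$total"}}
```

With the Node.js driver this would run as `db.collection("places").aggregate(pipeline).toArray()` (collection name hypothetical). Two caveats apply. First, `2dsphere` polygon edges are great-circle arcs, so the box's top and bottom edges do not follow lines of latitude exactly. Second, polygons larger than a hemisphere need special handling, so a whole-world view would probably skip the `$match`. If each document also stored a precomputed geohash string (a hypothetical `Geohash` field), grouping on `{ $substrCP: ["$Geohash", 0, precision] }` would give true geohash clusters instead.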