
I have a large dataset (11 million rows) that I have loaded into pandas. I then want to build a spatial index, such as an r-tree or quad tree, but when I build the index in memory it consumes a ton of RAM on top of what is already used by the large file.

To reduce the memory footprint, I was thinking of pushing the index to disk. Can you store the tree in a table? Or even in a dataframe stored as an HDF table? Is there a better strategy?
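
Roughly what I'm doing now (simplified; the file name, column names, and the rtree package are just placeholders for this sketch):

    import pandas as pd
    from rtree import index  # rtree package (libspatialindex bindings), used here only as an example

    # Load the full dataset into memory (this alone is already large).
    df = pd.read_csv('points.csv')  # ~11 million rows, placeholder file name

    # Build an in-memory r-tree on point coordinates; this adds a second
    # large structure on top of the DataFrame.
    idx = index.Index()
    for i, (x, y) in enumerate(zip(df['x'], df['y'])):
        idx.insert(i, (x, y, x, y))  # a point is a degenerate bounding box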

Thanks

JabberJabber
  • This question is a bit off-topic. I am fairly certain mysql can handle storing and retrieving trees. – James Apr 15 '17 at 02:30
  • I am not sure what this question means, but what about reading the dataset in batches in pandas? (See the sketch after these comments.) – Peaceful Apr 15 '17 at 04:03
  • @peaceful I'm trying to ask: if I have a really large dataset and I don't want to put an r-tree index into memory, is there a strategy for doing this, or an existing package? – JabberJabber Apr 15 '17 at 10:56
  • OpenStreetMap has a number of tools for dealing with spatial data; check out the wiki (http://wiki.openstreetmap.org/wiki/Downloading_data), which links to various tools (Osmosis, osmconvert, osmfilter, ...). – TilmannZ Apr 16 '17 at 10:26
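
A minimal sketch of the batched-reading suggestion above, assuming a CSV source; the chunk size and the per-chunk handler are placeholder assumptions:

    import pandas as pd

    # Process the file in chunks instead of loading all 11M rows at once.
    # 'points.csv' and the chunk size are placeholders.
    for chunk in pd.read_csv('points.csv', chunksize=500_000):
        # Work on each chunk here, e.g. filter rows or feed them to an index.
        process(chunk)  # hypothetical per-chunk handler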

1 Answer


Yes, r-trees can be stored on disk easily. (It's much harder with KD-trees and quad-trees.)

That is why the index is block-oriented: the block size is meant to be chosen to match your drive.

I don't use pandas, so I won't give a library recommendation.
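
Purely as an illustration of a disk-backed, block-oriented r-tree (not an endorsement of a particular library), here is a minimal sketch assuming the Python rtree package (libspatialindex bindings); the file name and page size are placeholder assumptions:

    from rtree import index

    # Tune the on-disk layout; pagesize should roughly match the drive's
    # block size (4096 here is a placeholder value).
    p = index.Property()
    p.pagesize = 4096

    # Passing a base name creates a file-backed index ('spatial_idx.dat'
    # and 'spatial_idx.idx') instead of holding the whole tree in RAM.
    idx = index.Index('spatial_idx', properties=p)

    # Insert points as degenerate bounding boxes (x, y, x, y).
    idx.insert(0, (1.0, 2.0, 1.0, 2.0))

    # Query: ids of entries whose boxes intersect the window.
    hits = list(idx.intersection((0.0, 0.0, 5.0, 5.0)))

    idx.close()  # flush pages to disk

Only the pages touched by an insert or query need to be read into memory, which is the point of matching the page size to the drive.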

Has QUIT--Anony-Mousse