I'm currently working with Neo4j and Neo4j Spatial. Is there a way to import shapefiles into the database in batches, the same way it is done with OSM files? I'm dealing with a huge dataset (100 GB+), and the layer and index are definitely slowing me down when inserting hundreds of thousands of geometries with the standard ShapeFileImporter class.

My question is: is there a way to import shapefiles in batches and call database.reIndex() only after the insertion, the same way we do with .osm files?

I'm using Neo4j 2.1.2 and Neo4j Spatial 0.13.

P.S.: I also tried configuring my GraphDatabaseService with the following:

.setConfig(GraphDatabaseSettings.node_auto_indexing, "false")
.setConfig(GraphDatabaseSettings.relationship_auto_indexing,"false")

but it seems that the ShapeFileImporter creates and uses them anyway.

Marcin Nabiałek
catacavaco
  • Looking at the source code of OSMLayer and OSMImporter, it looks like my problems would go away if I could either add a node to a layer without indexing it (or add it to something that is not a layer), or merge two previously distinct layers into a single one. Any ideas? – catacavaco Jul 24 '14 at 03:12

1 Answer

If you are using the versions of Neo4j and Neo4j-Spatial that you mentioned, the ShapeFileImporter class doesn't create any indexes (not in the Neo4j sense). For each shape in the .shp file, it extracts all of the properties associated with it (not just the geometry), creates a node, and adds it to the RTree for the layer. The source code for all of this is in:

ShapeFileImporter.java
EditableLayerImpl.java
DefaultLayer.java
RTreeIndex.java

It can be confusing when reading the code, but the member named index is not an index in the Neo4j sense; it is an RTree graph wrapped by the Java code.

The OSM importer does the same work (and more), just split up slightly differently. Neither importer creates legacy indexes, as far as I am aware. The OSM importer creates all the nodes (data and geometry separately with relationships), then builds the RTree from each geometry node. The SHP importer is simpler. It creates nodes that combine the data and geometry, and adds each node to the RTree as it is created. I don't believe there is any overall speed improvement for one against the other.
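
To make the difference in ordering concrete, here is a minimal, self-contained sketch in plain Java. It does not use the Neo4j Spatial API: Shape, ToyIndex, importIncrementally and importThenIndex are all hypothetical names, and a list sorted by minX stands in for the RTree. It only illustrates the two ways of splitting the work (index each record as it is created, versus create everything first and index afterwards) and says nothing about which is faster with the real RTree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class TwoPhaseImportSketch {

    /** Hypothetical stand-in for a geometry node: an id and the minimum x of its bounding box. */
    static final class Shape {
        final int id;
        final double minX;

        Shape(int id, double minX) {
            this.id = id;
            this.minX = minX;
        }
    }

    /** Toy index that keeps shapes ordered by minX. A placeholder for the RTree, not a real one. */
    static final class ToyIndex {
        private final List<Shape> sorted = new ArrayList<>();

        /** Adds one shape, keeping the list ordered after every call. */
        void add(Shape shape) {
            int pos = 0;
            while (pos < sorted.size() && sorted.get(pos).minX <= shape.minX) {
                pos++;
            }
            sorted.add(pos, shape);
        }

        /** Adds many shapes at once and restores the ordering in a single pass. */
        void addAll(List<Shape> shapes) {
            sorted.addAll(shapes);
            sorted.sort(Comparator.comparingDouble(s -> s.minX));
        }

        /** Returns the shape ids in index order. */
        List<Integer> ids() {
            List<Integer> result = new ArrayList<>();
            for (Shape s : sorted) {
                result.add(s.id);
            }
            return result;
        }
    }

    /** SHP-importer style: each record goes into the index as soon as it is created. */
    static ToyIndex importIncrementally(List<Shape> shapes) {
        ToyIndex index = new ToyIndex();
        for (Shape s : shapes) {
            index.add(s);
        }
        return index;
    }

    /** OSM-importer style: collect every record first, then build the index afterwards. */
    static ToyIndex importThenIndex(List<Shape> shapes) {
        List<Shape> created = new ArrayList<>(shapes); // phase 1: create all records
        ToyIndex index = new ToyIndex();
        index.addAll(created);                         // phase 2: index them in one go
        return index;
    }

    public static void main(String[] args) {
        List<Shape> shapes = Arrays.asList(
                new Shape(1, 5.0), new Shape(2, 1.0), new Shape(3, 3.0));
        System.out.println(importIncrementally(shapes).ids());
        System.out.println(importThenIndex(shapes).ids());
    }
}
```

Both methods end up with the same index contents; only the point at which the indexing work happens differs. Whether deferring that work pays off with the real RTree depends on how its insertion is implemented, so it is worth measuring on a sample of your data.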

Jim Biard
  • I'm currently writing a new importer based on the ShapeFileImporter class that adds all the nodes first and, after all the insertions, creates the layer and adds the nodes to the RTree graph, since adding every node to the RTree right after its creation was slowing me down badly. Thanks for the help. Funny that I had thought of almost the same solution; I'll let you know if it goes smoothly. – catacavaco Jul 25 '14 at 17:27