There is a very nice guide on optimizing a Linux machine for Neo4j, but it assumes the typical characteristics of a physical hard drive. I am running my Neo4j instances on Google Compute Engine and Amazon EC2, and I am unable to find any document detailing an optimal setup for these virtual machines. What resources do I need in terms of memory (for heap or extended use) and disk speed / IOPS to get optimal performance? I currently have a couple of million nodes and about ten million relationships (2 GB), and the data size keeps growing with imports.
On EC2 I used to rely on SSD scratch disks and make regular backups to permanent disks. Nothing comparable is available on Compute Engine, and the write speeds don't seem very high to me, at least at normal disk sizes (speed changes with disk size). Is there any way to get reasonable performance on my import/index operations? Or do these operations depend more on memory and compute capacity?
Any additional reading is welcome...