What you would be measuring then is the performance of the Lucene index, not a graph-database operation.
There are a number of options:
neo4j-import
Neo4j 2.2.0-M03 comes with neo4j-import, a tool that can import a CSV with a billion nodes into Neo4j quickly and in a scalable way.
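An invocation would look roughly like the line below; the store directory, the CSV file name, and the assumption that the node file carries an :ID header column are mine, not something stated above:

bin/neo4j-import --into billion.db --nodes nodes.csv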
parallel-batch-importer API
This is very new in Neo4j 2.2.
I created a node-only graph with 1,000,000,000 nodes in 5 min 13 s (53 GB db) with the new ParallelBatchImporter, which works out to about 3.2M nodes/second.
Code is here: https://gist.github.com/jexp/0ff850ab2ce41c9ca5e6
batch-inserter
You could use the Neo4j Batch-Inserter-API to create that data without creating the CSV first.
See this example, which you would have to adapt so that it does not read a CSV but generates the data directly from a for loop: http://jexp.de/blog/2014/10/flexible-neo4j-batch-import-with-groovy/
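The linked post uses Groovy; as a rough sketch of the same idea in plain Java against the 2.2 Batch-Inserter-API (the store path, node count and property name are placeholders I picked, not taken from the post):

import java.util.Collections;
import java.util.Map;
import org.neo4j.graphdb.DynamicLabel;
import org.neo4j.graphdb.Label;
import org.neo4j.unsafe.batchinsert.BatchInserter;
import org.neo4j.unsafe.batchinsert.BatchInserters;

public class GenerateNodes {
    public static void main(String[] args) throws Exception {
        Label person = DynamicLabel.label("Person");
        // open the target store directory for batch insertion (single-threaded, no transactions)
        BatchInserter inserter = BatchInserters.inserter("billion.db");
        try {
            for (long id = 0; id < 1_000_000_000L; id++) {
                // one property and one label per node, matching the Cypher examples below
                Map<String, Object> props = Collections.<String, Object>singletonMap("id", id);
                inserter.createNode(props, person);
            }
        } finally {
            inserter.shutdown(); // flushes and closes the store files; required before opening the db
        }
    }
}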
Cypher
If you want to use Cypher, I'd recommend running something like the statements below in the neo4j-shell, started with a bigger heap:

JAVA_OPTS="-Xmx4G -Xms4G" bin/neo4j-shell -path billion.db
Here are the code and timings I took for 10M and 100M nodes on my MacBook, and for 1B nodes on a Linux server:
create a csv file with 1M lines:

ruby -e 'File.open("million.csv","w") { |f| (1..1000000).each{|i| f.write(i.to_s + "\n") } }'
The experiment ran on a MacBook Pro.
Cypher execution is single-threaded.
Estimated size: (15+42) bytes * node count.
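As a rough sanity check, (15+42) bytes * 1,000,000,000 nodes comes to about 57 GB, which is in the same ballpark as the 63 GB the 1B-node store takes on disk below.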
// on my laptop
// 10M nodes, 1 property, 1 label each in 98228 ms (98s) taking 580 MB on disk
using periodic commit 10000
load csv from "file:million.csv" as row
//with row limit 5000
foreach (x in range(0,9) | create (:Person {id:toInt(row[0])*10+x}));
// on my laptop
// 100M nodes, 1 property, 1 label each in 1684411 ms (28 mins) taking 6 GB on disk
using periodic commit 1000
load csv from "file:million.csv" as row
foreach (x in range(0,99) | create (:Person {id:toInt(row[0])*100+x}));
// on my linux server
// 1B nodes, 1 property, 1 label each in 10588883 ms (176 min) taking 63 GB on disk
using periodic commit 1000
load csv from "file:million.csv" as row
foreach (x in range(0,999) | create (:Person {id:toInt(row[0])*1000+x}));
creating indexes
create index on :Person(id);
schema await
// took about 40 mins and increased the database size to 85 GB
Then I can run:
match (:Person {id:8005300}) return count(*);
+----------+
| count(*) |
+----------+
| 1        |
+----------+
1 row
2 ms