I'm new to Neo4j. I'm currently trying to build a dating site as a proof of concept (POC). I have a 4 GB input file in the format shown below.

Each row contains a viewerId (male/female) and a viewedId, which is a comma-separated list of the IDs that user has viewed. Based on this history file, I need to give recommendations when a user comes online.

Input file:

viewerId   viewedId 
12345   123456,23456,987653 
23456   23456,123456,234567 
34567   234567,765678,987653 
:

For this task, I tried the following:

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/home/hadoopuser/Neo-input " AS row
FIELDTERMINATOR '\t'
WITH row, split(row.viewedId, ",") AS viewedIds
UNWIND viewedIds AS viewedId
MERGE (p2:Persons2 {viewerId: row.viewerId})
MERGE (c2:Companies2 {viewedId: viewedId})
MERGE (p2)-[:Friends]->(c2)
MERGE (c2)-[:Sees]->(p2);

And my Cypher query to get the result is:

MATCH (p2:Persons2)-[r*1..3]->(c2: Companies2)
RETURN p2,r, COLLECT(DISTINCT c2) as friends 

Completing this task takes 3 days.

My system config:

Ubuntu: 14.04  
RAM: 24 GB

Neo4j Config:
neo4j.properties:

neostore.nodestore.db.mapped_memory=200M
neostore.propertystore.db.mapped_memory=2300M
neostore.propertystore.db.arrays.mapped_memory=5M
neostore.propertystore.db.strings.mapped_memory=3200M
neostore.relationshipstore.db.mapped_memory=800M

neo4j-wrapper.conf

wrapper.java.initmemory=12000
wrapper.java.maxmemory=12000

To reduce the time, I searched the internet and found the batch importer at the following link: https://github.com/jexp/batch-import

In that project, node.csv and rels.csv files are imported into Neo4j. I don't understand how the node.csv and rels.csv files are created or which scripts are used to create them.

Can anyone give me a sample script to create node.csv and rels.csv files for my data?

Or can you suggest how to make the import and the data retrieval faster?

Thanks in advance.

Karthick S

1 Answer


You don't need the inverse relationship; one direction is enough, because Neo4j can traverse a relationship in either direction.

For the import, set your heap (neo4j-wrapper.conf) to 12G and the page cache (neo4j.properties) to 10G.

Try this, it should be done in a few minutes.

create constraint on (p:Persons2) assert p.viewerId is unique;
create constraint on (p:Companies2) assert p.viewedId is unique;

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/home/hadoopuser/Neo-input " AS row
FIELDTERMINATOR '\t'
MERGE (p2:Persons2 {viewerId: row.viewerId});

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/home/hadoopuser/Neo-input " AS row
FIELDTERMINATOR '\t'
FOREACH (viewedId IN split(row.viewedId, ",") |
  MERGE (c2:Companies2 {viewedId: viewedId}));

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/home/hadoopuser/Neo-input " AS row
FIELDTERMINATOR '\t'
WITH row, split(row.viewedId, ",") AS viewedIds
MATCH (p2:Persons2 {viewerId: row.viewerId})
UNWIND viewedIds AS viewedId
MATCH (c2:Companies2 {viewedId: viewedId})
MERGE (p2)-[:Friends]->(c2);

For the relationship merge, if some companies have hundreds of thousands or even millions of views, you might want to use this instead:

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/home/hadoopuser/Neo-input " AS row
FIELDTERMINATOR '\t'
WITH row, split(row.viewedId, ",") AS viewedIds
MATCH (p2:Persons2 {viewerId: row.viewerId})
UNWIND viewedIds AS viewedId
MATCH (c2:Companies2 {viewedId: viewedId})
WHERE shortestPath((p2)-[:Friends]->(c2)) IS NULL
CREATE (p2)-[:Friends]->(c2);
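
Since the question also asks how node and relationship CSV files can be produced for a batch importer, here is a minimal sketch in Python. It is not taken from the batch-import project: the function name `convert`, the column names (`node`, `label`, `id`, `start`, `end`, `type`) and the sequential node numbering are assumptions made for illustration, so check the header format your importer version expects in its README before using the output. It also assumes each viewerId appears on a single row, so duplicate views are only removed within a row.

```python
import csv
import io


def convert(rows, nodes_out, rels_out):
    """Write one node row per distinct (label, id) and one relationship row per view."""
    nodes = csv.writer(nodes_out, delimiter="\t", lineterminator="\n")
    rels = csv.writer(rels_out, delimiter="\t", lineterminator="\n")
    nodes.writerow(["node", "label", "id"])   # assumed column names
    rels.writerow(["start", "end", "type"])   # assumed column names
    index = {}  # (label, id) -> sequential node number

    def node(label, value):
        # Emit the node the first time it is seen and return its number.
        key = (label, value)
        if key not in index:
            index[key] = len(index)
            nodes.writerow([index[key], label, value])
        return index[key]

    for line in rows:
        parts = line.strip().split(None, 1)  # viewerId, then the id list
        if len(parts) != 2 or parts[0] == "viewerId":
            continue  # skip the header, blank lines and malformed rows
        viewer = node("Persons2", parts[0])
        seen_in_row = set()  # avoid duplicate relationships within one row
        for viewed_id in parts[1].split(","):
            viewed_id = viewed_id.strip()
            if not viewed_id or viewed_id in seen_in_row:
                continue
            seen_in_row.add(viewed_id)
            rels.writerow([viewer, node("Companies2", viewed_id), "Friends"])


# Demo on the three sample rows from the question (in-memory instead of files).
sample = (
    "viewerId\tviewedId\n"
    "12345\t123456,23456,987653\n"
    "23456\t23456,123456,234567\n"
    "34567\t234567,765678,987653\n"
)
nodes_buf, rels_buf = io.StringIO(), io.StringIO()
convert(io.StringIO(sample), nodes_buf, rels_buf)
print(nodes_buf.getvalue())
print(rels_buf.getvalue())
```

For the real 4 GB file, the same function can be called with open file handles instead of the `StringIO` buffers. The input is read line by line, and only the node index is held in memory.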

Regarding your query:

What do you want to achieve by retrieving the cross product of all people and all companies up to 3 levels deep? That could be trillions of paths.

Usually you want to know this for a single person or company.

Update: your query

E.g. for 123456, the persons who viewed this company are 12345 and 23456. The companies these persons viewed are:

12345   123456,23456,987653
23456   23456,123456,234567

So the recommendation for company 123456 is 23456,987653,23456,234567, and the distinct final result is 23456,987653,234567.

match (c:Companies2)<-[:Friends]-(p1:Persons2)-[:Friends]->(c2:Companies2)
where c.viewedId = "123456"  // ids loaded via LOAD CSV and split() are strings
return distinct c2.viewedId;
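
As a cross-check of what this query is meant to compute, here is a small in-memory sketch in plain Python, run on the three sample rows from the question. It is only an illustration of the "viewers of this company also viewed" logic, not a replacement for the Cypher query; the names `views` and `recommend` are made up for this example.

```python
from collections import defaultdict

# Sample rows from the question: viewerId -> list of viewed ids.
views = {
    "12345": ["123456", "23456", "987653"],
    "23456": ["23456", "123456", "234567"],
    "34567": ["234567", "765678", "987653"],
}


def recommend(views, company_id):
    """Return the distinct ids viewed by everyone who viewed company_id."""
    # Invert the mapping: viewed id -> set of viewers.
    viewers_of = defaultdict(set)
    for viewer, viewed in views.items():
        for viewed_id in viewed:
            viewers_of[viewed_id].add(viewer)

    result = set()
    for viewer in viewers_of.get(company_id, ()):
        result.update(views[viewer])
    result.discard(company_id)  # do not recommend the company to itself
    return sorted(result)


print(recommend(views, "123456"))
```

For 123456 this yields 23456, 234567 and 987653, the same distinct set as in the example above.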

For all companies, this might help:

match (c:Companies2)<-[:Friends]-(p1:Persons2)
with p1, collect(c) as companies
match (p1)-[:Friends]->(c2:Companies2)
return c2.viewedId, extract(c in companies | c.viewedId);
Michael Hunger
  • Hi Michael, thanks for your reply. I want to give a recommendation for each company of the form "people who viewed this company also viewed". E.g. for 123456, the persons who viewed this company are 12345 and 23456. The companies these persons viewed are 12345: 123456,23456,987653 and 23456: 23456,123456,234567. So the recommendation for company 123456 is 23456,987653,23456,234567, and the distinct final result is 23456,987653,234567. I need to give recommendations like that for each company. Can you give any suggestions for this? – Karthick S Jun 24 '15 at 06:13
  • Hi Michael, thanks for your update. I tried your updated query to retrieve data. For a single company ID I get a result, but when I try to retrieve the whole data set I get a heap space error. I increased my heap size up to 24 GB in the wrapper.conf file and I still get the same error. How do I get the recommendations for every company ID? Can you help me solve this problem? – Karthick S Jul 02 '15 at 11:03
  • Did you create the index/constraint on :Companies2(viewedId)? – Michael Hunger Jul 03 '15 at 06:49
  • Updated a further query example – Michael Hunger Jul 03 '15 at 06:52
  • Hi Michael, thanks for your updated query. I created the constraint and used everything you gave. I have 15,75,345 IDs in total (person, list of companies viewed). Importing them created 32,23,433 nodes and 17,79,75,432 relationships. For up to 1,00,000 IDs I got recommendation results. Beyond that limit, e.g. 2 lakhs or more, I didn't get any result; the job has been running for more than 4 days and is still running. Can you give me some suggestions for this problem? Do I need to change any Neo4j configuration files? – Karthick S Jul 14 '15 at 04:52
  • Perhaps you can send me a copy of your db (on dropbox) and your queries again to michael at neo4j.com to have a look. – Michael Hunger Jul 16 '15 at 14:37