0

I'm very new to the Neo4j world so please forgive me if this is a trivial question. I have 2 tables I've loaded into the database using LOAD CSV

artists:

artist_name,artist_id
"Bob","abc"
"Jack","def"
"James","ghi"
"Someone","jkl"
"John","mno"

agency_list:

"Agency"
"A"
"B"
"C"
"D"

Finally, I have an intermediary table that has the artist and the agencies that represent them.

artist_agencies:

artist_name,artist_id,agency
"Bob","abc", "A"
"Bob","abc", "B"
"Jack","def", "C"
"James","ghi", "C"
"Someone","jkl","B"
"Someone","jkl", "C"
"John","mno", "D"

Notice some artists can be a part of multiple agencies (which is why I didn't include the agency variable in the Artist table)

I'm trying to get four agency nodes that connect to each artist based on a :REPRESENTS relationship. Basically something like: (agency:Agency) - [:REPRESENTS] -> (artist:Artist)

The code I've tried is:

LOAD CSV WITH HEADERS FROM "file:///agency_list.csv" as agencies
CREATE (agency:Agency {agency: agencies.Agency})

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///artists.csv" as artists
CREATE (artist:Artist {artist: artists.artist_name, artist_id: artists.artist_id})

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///artist_agencies.csv" as line 
CREATE (ag:Agency) - [:REPRESENTS] -> (ar:Artist {track_artist_uri:line.track_artist_uri})

So far I'm getting this, each blue node is a duplicate of an agency name. Rather than just having one single agency node that connects to all artists via the :REPRESENTS relationship. result

I guess my problem is that I don't know how to relate the artists table to the agency_list table via this intermediate artist_agencies table. Is there a better way to do this or am I on the right track?

Thanks! Joey

1 Answers1

0

The artist_agencies.csv query needs to find the appropriate Agency and Artist nodes before creating a relationship between them. For example:

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///artist_agencies.csv" as line
MATCH (ag:Agency) WHERE ag.agency = line.agency
MATCH (ar:Artist) WHERE ar.artist_id = line.artist_id
CREATE (ag)-[:REPRESENTS]->(ar)

Aside: The artist_agencies.csv file does not need the artist_name column.

[UPDATE]

If the artist_agencies.csv data could cause duplicate relationships to be created, replace CREATE with (the more expensive) MERGE to avoid that. And make sure you do not have duplicate Agency or Artist nodes.

cybersam
  • 63,203
  • 6
  • 53
  • 76
  • Thanks! Definitely appreciate the answer and understand what you're suggesting here. I need to not only match up the artists to artists but the agencies to agencies as well. Problem is when I ran this I got 28mil relationships created which is _way_ over what it should be. Is there some sort of deduping I should be doing here? – joeynichols Nov 08 '19 at 19:56
  • See my updated answer. Also, make sure you do not have duplicate `Agency` or `Artist` nodes. – cybersam Nov 08 '19 at 21:49
  • 1
    Also you'll want to make sure you have indexes on :Agency(agency) and :Artist(artist_id) for fast lookup of the nodes. – InverseFalcon Nov 08 '19 at 22:24
  • Noted. Updated using MERGE instead of CREATE and that did away with the duplicates. We're in business now, thank you cybersam for your help and InverseFalcon for the tip! – joeynichols Nov 08 '19 at 23:02