Handling empty array types when using APOC to import a CSV to neo4j

Question

I have a csv file wherein some fields are array-types. Fields are separated with , and array items are separated with ;. For example:

index, name, friends, neighbors
0,Jim,John;Tim;Fred,Susan;Megan;Cheryl
1,Susan,Jim;John,Megan;Cheryl
2,Sean,,,

where Jim has three friends, John, Tim, and Fred, and three neighbors, Susan, Megan, and Cheryl, and Sean has no friends and no neighbors.

However, when I read this into neo4j using apoc.load.csv, I end up with list properties with empty strings inside of them (rather than empty lists). For example:

CALL apoc.periodic.iterate("
CALL apoc.load.csv('file.csv',
    {header:true,sep:',',
    mapping:{
        friends:{array:true},
        neighbors:{array:true}}
    })
YIELD map as row RETURN row
","
CREATE (p:Person) SET p = row
", 
{batchsize:50000, iterateList:true, parallel:true});

Gives me a Person with name Sean but with friends=[ "" ] and neighbors=[ "" ].

What I want is Sean to have friends=[] and neighbors=[].

Thank you!

In your current CSV data, Jim actually has 3 friends. – cybersam Jul 13 '18 at 00:55 — cybersam, Jul 13 '18 at 00:55
Fixed! Thank you @cybersam – Arthur D. Jul 13 '18 at 02:00 — Arthur D., Jul 13 '18 at 02:00

cybersam · Accepted Answer · 2018-07-13T02:32:51.313

Make sure there are no extraneous spaces in your CSV file header (or else some property names will start with a space):
```
index,name,friends,neighbors
0,Jim,John;Tim;Fred,Susan;Megan;Cheryl
1,Susan,Jim;John,Megan;Cheryl
2,Sean,,,
```

Use list comprehension to help eliminate all friends and neighbors elements that are empty strings:

CALL apoc.periodic.iterate(
  "CALL apoc.load.csv(
     'file.csv',
     {
       header:true, sep:',',
       mapping: {
         friends: {array: true},
         neighbors: {array: true}
       }
     }) YIELD map
   RETURN map
  ",
  "CREATE (p:Person)
   SET p = map
   SET p.friends = [f IN p.friends WHERE f <> '']
   SET p.neighbors = [n IN p.neighbors WHERE n <> '']
  ", 
  {batchsize:50000, iterateList:true, parallel:true}
);

With the above changes, this query:

MATCH (person:Person) RETURN person;

returns this result:

╒══════════════════════════════════════════════════════════════════════╕
│"person"                                                              │
╞══════════════════════════════════════════════════════════════════════╡
│{"name":"Jim","index":"0","neighbors":["Susan","Megan","Cheryl"],"frie│
│nds":["John","Tim","Fred"]}                                           │
├──────────────────────────────────────────────────────────────────────┤
│{"name":"Susan","index":"1","neighbors":["Megan","Cheryl"],"friends":[│
│"Jim","John"]}                                                        │
├──────────────────────────────────────────────────────────────────────┤
│{"name":"Sean","index":"2","neighbors":[],"friends":[]}               │
└──────────────────────────────────────────────────────────────────────┘

[UPDATED]

Also, if it is not possible for your CSV file to contain an "empty" friend or neighbor substring (e.g., John;;Fred), then this version of the query that uses CASE instead of list comprehension would be more efficient:

CALL apoc.periodic.iterate(
  "CALL apoc.load.csv(
     'file.csv',
     {
       header:true, sep:',',
       mapping: {
         friends: {array: true},
         neighbors: {array: true, arraySep:';'}
       }
     }) YIELD map
   RETURN map
  ",
  "CREATE (p:Person)
     SET p = map
     SET p.friends = CASE p.friends WHEN [''] THEN [] ELSE p.friends END
     SET p.neighbors = CASE p.neighbors WHEN [''] THEN [] ELSE p.neighbors END
  ", 
  {batchsize:50000, iterateList:true, parallel:true}
);

Works! Thank you, @cybersam. Just to clarify, we don't necessarily need to add `arraySep:';'`, since [that is the default](https://neo4j-contrib.github.io/neo4j-apoc-procedures/#_configuration_options), correct? (I know that ` header:true` and `sep:','` are also the defaults). — Arthur D., Jul 13 '18 at 02:15
One final question: I now want to create separate nodes for each neighbor, and a `NEIGHBORS` relationship for each neighbor pointing to the person in question. How would I add this to the query? @cybersam — Arthur D., Jul 18 '18 at 23:03
To create a person's neighbor nodes (if not already existent) and a `NEIGHBOR_OF` relationship from each neighbor back to that person (if not already existent): `MATCH (p:Person) FOREACH(n IN p.neighbors | MERGE (m:Person {name: n}) MERGE (m)-[:NEIGHBOR_OF]->(p)`. Newly created neighbor nodes will not have an `index` property. — cybersam, Jul 19 '18 at 02:27
Thanks @cybersam! I didn't think of doing this outside the original import script. Two follow-ups: (1) Are you missing a `)` at the end of that query? (2) What does it mean for neighbor nodes to not have an `index` property, and is that something I need to rectify? — Arthur D., Jul 19 '18 at 18:08
Yes, query is missing ending `)`. Your original query's `SET` clause copies the `index` property (from the csv file's `index` column) to every `Person` it created. I was just pointing out that the query in my comment would not add an `index` property to any `Person` nodes that it creates. — cybersam, Jul 19 '18 at 20:07

Handling empty array types when using APOC to import a CSV to neo4j

1 Answers1