I have seen many tutorials on loading CSV data in the Gremlin vertex and edge format into AWS Neptune. For various reasons, I cannot create separate vertex and edge files for loading. Instead, I have only the raw CSV file, where each row is a record (e.g. a person).

How can I create nodes and relationships from each row of the raw CSV in Neptune from the notebook interface?

Kelvin Lawrence
Gene Xu
  • The easiest way is most likely to write a few lines of Python that read the CSV and generate Gremlin or openCypher to create the nodes. Can you provide a sample of the raw CSV? The alternative is to convert the CSV to have the headers the bulk loader would expect. You could even run the CSV-gremlin tool over that. – Kelvin Lawrence Apr 12 '22 at 12:25
  • @Kelvin thanks for the comments. The reason we cannot use the vertices-edges Gremlin bulk loader is that we'd like to do real-time ingestion. Imagine we have new content added every minute on our platform that needs to be converted to graph nodes. We don't think bulk loading would work. Can you give some examples of converting row-by-row records (with a header) using openCypher? – Gene Xu Apr 12 '22 at 14:03
  • Can you share a sample of how the CSV data will look? – Kelvin Lawrence Apr 12 '22 at 16:11
  • The data example is not important. Let's use the Movies database from Neo4j as an example. There are actors.csv and movies.csv. Assuming I can only stream 10 rows at a time from both CSV files, how do I build the graph DB in Neptune? – Gene Xu Apr 12 '22 at 17:51

1 Answer

Given you mentioned wanting to do this in the notebooks, the examples below are all run from inside a Jupyter notebook. I don't have the data sets you mentioned to hand, so let's make a simple one in a notebook cell:

%%bash
echo "code,city,region
AUS,Austin,US-TX
JFK,New York,US-NY" > test.csv

We can then generate the openCypher CREATE steps for the nodes contained in that CSV file using a simple cell such as:

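import csv

# Build one openCypher CREATE clause per CSV row, using the header names as property keys.
with open('test.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile, escapechar="\\")
    query = ""
    for row in reader:
        # Every value is written as a quoted string; a value containing a double quote would need escaping.
        props = ", ".join(f'{k}:"{v}"' for k, v in row.items())
        query += "CREATE (:Airport {" + props + "})\n"
    print(query)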

which yields:

CREATE (:Airport {code:"AUS", city:"Austin", region:"US-TX"})
CREATE (:Airport {code:"JFK", city:"New York", region:"US-NY"})
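
Building the property strings by hand works for the sample data, but a value that itself contains a double quote or a backslash would produce an invalid query. The following is a minimal sketch of one way to escape values first; the helper names `cypher_string` and `create_statement` are made up for illustration, and it assumes the CSV column names are already valid property names.

```python
def cypher_string(value):
    """Quote a value as a double-quoted openCypher string literal."""
    # Escape backslashes first, then double quotes, so the added backslashes are not doubled.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def create_statement(label, row):
    """Build a CREATE clause for one CSV row (a dict of column name -> value)."""
    props = ", ".join(f"{key}:{cypher_string(val)}" for key, val in row.items())
    return f"CREATE (:{label} {{{props}}})"


# Example with a value that contains double quotes.
print(create_statement("Airport", {"code": "ORD", "city": 'Chicago "O\'Hare"'}))
```

For rows without special characters this produces the same CREATE clauses as the cell above.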

Finally, let's have the notebook's %%oc cell magic run that query for us:

ipython = get_ipython()
magic = ipython.run_cell_magic
magic(magic_name = "oc", line='', cell=query)
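
For the streaming case raised in the comments (a few rows at a time), the same idea can be applied per batch. This is only a sketch: `row_batches` and `batch_query` are hypothetical helper names, the in-memory `sample` stands in for whatever source delivers the rows, and values are not escaped here.

```python
import csv
import io
from itertools import islice


def row_batches(reader, size):
    """Yield lists of up to `size` rows from any iterator of dict rows."""
    while True:
        batch = list(islice(reader, size))
        if not batch:
            return
        yield batch


def batch_query(label, rows):
    """Build one openCypher query holding a CREATE clause per row (values are not escaped)."""
    clauses = []
    for row in rows:
        props = ", ".join(f'{k}:"{v}"' for k, v in row.items())
        clauses.append(f"CREATE (:{label} {{{props}}})")
    return "\n".join(clauses)


# Stand-in for a stream of incoming rows; in practice this would be the open CSV file.
sample = io.StringIO(
    "code,city,region\nAUS,Austin,US-TX\nJFK,New York,US-NY\nSEA,Seattle,US-WA\n"
)
queries = [batch_query("Airport", b) for b in row_batches(csv.DictReader(sample), 2)]

# In a notebook, each query could then be sent with the same call as above:
# for q in queries:
#     get_ipython().run_cell_magic("oc", "", q)
```

Each batch becomes one query, so a batch size of 10 would mean one round trip to Neptune per 10 rows.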

To verify that the query worked:

%%oc
MATCH (a:Airport)
RETURN a.code, a.city

which returns:

    a.code     a.city
1   AUS        Austin
2   JFK        New York
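
Relationships can be generated in a similar way once the nodes exist. The sketch below assumes a hypothetical second CSV whose rows have `src` and `dst` columns holding airport codes, and a hypothetical `ROUTE` relationship type; neither is part of the example data above. It builds one query per row, because each row needs its own MATCH before the CREATE.

```python
def route_queries(rows):
    """Return one openCypher query per row, linking two existing Airport nodes."""
    queries = []
    for row in rows:
        src, dst = row["src"], row["dst"]  # assumed column names
        queries.append(
            f'MATCH (a:Airport {{code:"{src}"}}), (b:Airport {{code:"{dst}"}})\n'
            "CREATE (a)-[:ROUTE]->(b)"
        )
    return queries


# Rows as csv.DictReader would yield them from the hypothetical routes file.
for q in route_queries([{"src": "AUS", "dst": "JFK"}]):
    print(q)
```

Each generated query can be run through the %%oc magic like the node query. If the same row might arrive more than once, using MERGE instead of CREATE would avoid duplicate relationships.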

There are many ways you could do this, but this is a simple way if you want to stay inside the notebooks. Given your question does not have a lot of detail or an example of what you have tried so far, hopefully this gives you some pointers.

Kelvin Lawrence