
Could you share a sample code to convert Wikidata dumps to Gremlin format, please?

I would like to load the converted Gremlin CSV data into AWS Neptune.

Kelvin Lawrence
MAK

1 Answer

As discussed in your other question, Amazon Neptune will happily load that RDF format data directly, but you would need to query it using SPARQL. Unless you absolutely need to get the data into property graph format, loading the data as-is and using SPARQL would get you up and running very quickly.

To use Gremlin or openCypher, that data will need to be converted to an equivalent property graph form. You really have a couple of options:

  1. Convert the RDF format data into equivalent CSV file format so that the Neptune bulk loader can load it for you.
  2. Convert the RDF format data into Gremlin addV and addE steps, or openCypher CREATE and MERGE clauses.

If you have a lot of data to load, the CSV files and bulk loader will be the easier route.
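For the second option, a minimal sketch of generating Gremlin text from nodes and edges that have already been extracted might look like the following. The helper names (`gremlin_literal`, `add_vertex_step`, `add_edge_step`) are hypothetical and only for illustration, and the string escaping is deliberately simple. For anything beyond a small experiment, sending parameterized queries through a Gremlin client is likely safer than building query strings by hand.

```python
def gremlin_literal(value):
    """Render a Python value as a Gremlin (Groovy-style) literal."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # Escape backslashes first, then single quotes.
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def add_vertex_step(vertex_id, label, properties=None):
    """Build an addV traversal that sets a user-supplied vertex ID."""
    step = f"g.addV({gremlin_literal(label)}).property(id, {gremlin_literal(vertex_id)})"
    for name, value in (properties or {}).items():
        step += f".property({gremlin_literal(name)}, {gremlin_literal(value)})"
    return step


def add_edge_step(label, from_id, to_id):
    """Build an addE traversal between two existing vertices."""
    return (
        f"g.V({gremlin_literal(from_id)})"
        f".addE({gremlin_literal(label)})"
        f".to(__.V({gremlin_literal(to_id)}))"
    )


if __name__ == "__main__":
    print(add_vertex_step("max", "dog", {"age": 3}))
    print(add_edge_step("likes", "max", "fido"))
```

Each generated line is one small write request, which is part of why the bulk loader tends to be the more practical choice for large volumes.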

Converting from RDF format to property graph format is not hard in principle, but I'm not aware of a tool that will take a TTL file (let's say) and turn it into CSV. The tools I know of go the other way (CSV to RDF).

If you are comfortable writing a little code, a short Python or Ruby script is all you really need, and converting this data is quite straightforward. You just have to convert the triple patterns into nodes and edges (with properties).
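In practice, that decision usually comes down to the object of each triple: an IRI object suggests an edge, and a literal object suggests a property. Here is a rough sketch for N-Triples input, using only the standard library. The `classify` function and the example.org IRIs are hypothetical, the regular expressions cover only simple lines, and escape sequences inside literals are not decoded; a real RDF parser would be the safer choice for full dumps.

```python
import re

# Subject (IRI or blank node), predicate (IRI), object (anything), final dot.
NT_LINE = re.compile(r'^(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(.+?)\s*\.\s*$')
# The quoted part of a literal; any datatype or language tag is ignored.
LITERAL = re.compile(r'^"((?:[^"\\]|\\.)*)"')


def classify(line):
    """Return ('edge' | 'property', subject, predicate, object), or None."""
    match = NT_LINE.match(line.strip())
    if match is None:
        return None  # blank line, comment, or syntax this sketch does not handle
    subject, predicate, obj = match.groups()
    literal = LITERAL.match(obj)
    if literal:
        # Literal object: becomes a property on the subject node.
        return ("property", subject, predicate, literal.group(1))
    # IRI or blank node object: becomes an edge to another node.
    return ("edge", subject, predicate, obj)


if __name__ == "__main__":
    sample = [
        '<http://example.org/max> <http://example.org/likes> <http://example.org/fido> .',
        '<http://example.org/max> <http://example.org/age> "3"^^<http://www.w3.org/2001/XMLSchema#integer> .',
    ]
    for nt_line in sample:
        print(classify(nt_line))
```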

So, imagine in the RDF you have triples that are essentially in this form:

max a dog 
fido a dog 
max age 3 
fido age 6 
max likes fido

You would end up creating two nodes, two properties and an edge.

In CSV form the nodes would look like

~id,~label,age
max,dog,3
fido,dog,6

and the edge would be

~id,~label,~from,~to
e1,likes,max,fido
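As a minimal sketch of that conversion, the script below turns the five simplified triples above into the two CSV layouts. It assumes the triples are already parsed into (subject, predicate, object) tuples, that `a` marks the type, and that an object which is itself a typed subject means an edge. The function name `triples_to_neptune_csv` and the `default_label` fallback are hypothetical, and multi-valued properties are not handled.

```python
import csv
import io

# Simplified triples from the example; real dumps use full IRIs.
TRIPLES = [
    ("max", "a", "dog"),
    ("fido", "a", "dog"),
    ("max", "age", "3"),
    ("fido", "age", "6"),
    ("max", "likes", "fido"),
]


def triples_to_neptune_csv(triples, type_predicate="a", default_label="resource"):
    """Return (nodes_csv, edges_csv) as strings."""
    triples = list(triples)
    # Type triples give each node its label.
    labels = {s: o for s, p, o in triples if p == type_predicate}
    props = {s: {} for s in labels}
    edges = []
    for s, p, o in triples:
        if p == type_predicate:
            continue
        props.setdefault(s, {})
        if o in labels:
            # Object is a known node, so this triple is an edge.
            edges.append((f"e{len(edges) + 1}", p, s, o))
        else:
            # Otherwise treat the object as a property value
            # (a repeated predicate overwrites the earlier value).
            props[s][p] = o

    columns = sorted({name for values in props.values() for name in values})

    node_buf = io.StringIO()
    writer = csv.writer(node_buf, lineterminator="\n")
    writer.writerow(["~id", "~label", *columns])
    for node_id, values in props.items():
        label = labels.get(node_id, default_label)
        writer.writerow([node_id, label, *(values.get(c, "") for c in columns)])

    edge_buf = io.StringIO()
    writer = csv.writer(edge_buf, lineterminator="\n")
    writer.writerow(["~id", "~label", "~from", "~to"])
    writer.writerows(edges)
    return node_buf.getvalue(), edge_buf.getvalue()


if __name__ == "__main__":
    nodes_csv, edges_csv = triples_to_neptune_csv(TRIPLES)
    print(nodes_csv)
    print(edges_csv)
```

Property columns without a type suffix are loaded as strings; a header such as `age:Int` would be needed to load the ages as integers.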

If you plan on converting all the data, and it is just too much for a script-based approach, a big data ETL approach, such as Spark, is likely the way to go. There are many ways to approach this, and it is not a super hard task. I'm just not aware of a tool that will do it for you (there may be one).

Kelvin Lawrence
  • Thank you. I have a feeling that Gremlin is easier than SPARQL for complex queries. This is why I wanted to convert RDF to Property Graph. Is that true? Please advise. – MAK Sep 28 '22 at 07:34
  • That's a bit of an "it depends" question. Over this specific data, SPARQL is likely a good choice as the data itself is a little unusual in the way that it is constructed. The key area where Gremlin makes life easier is path finding. It's much easier in Gremlin to answer the question "how do I get from A to Z and what places did I visit along the way?" Both Gremlin and SPARQL can be used to answer a lot of graph questions. For this specific case, I would think about the type of queries you want to run and decide if the effort to convert the data is worth it. – Kelvin Lawrence Sep 28 '22 at 13:41
  • I would like to run incremental filter queries like this - https://stackoverflow.com/q/73877378/9905102 – MAK Sep 29 '22 at 11:29