I have parsed JSON+LD (structured) data from the tool Screaming Frog. The format this tool exports the data in is unworkable as the parent/child relationship (cross-reference) is not on one row in Excel. Edit: this serialized format is called n-triples. Below is an example output with the index relationships colour-coded (sorry not allowed to post images yet):
https://i.stack.imgur.com/7Zzp3.jpg
Subject Predicate Object
subject27 schema.org/aggregateRating subject28
subject27 schema.org/offers subject29
subject27 schema.org/operatingSystem ANDROID
subject27 type schema.org/SoftwareApplication
subject28 schema.org/ratingCount 15559
subject28 schema.org/ratingValue 3.597853422
subject28 type schema.org/AggregateRating
subject29 schema.org/price 0
subject29 type schema.org/Offer
Below would be an example of the final output required where all nested levels are in it's own column. Each of the nested levels (up to 4 deep) should be mapped into its own column, repeating the parent path information.
Predicate L1 Object L1 Predicate L2 Object L2
type schema.org/SoftwareApplication
schema.org/operatingSystem ANDROID
schema.org/aggregateRating subject28 schema.org/ratingCount 15559
schema.org/aggregateRating subject28 schema.org/ratingValue 3.597853422
schema.org/aggregateRating subject28 type schema.org/AggregateRating
schema.org/offers subject29 schema.org/price 0
schema.org/offers subject29 type schema.org/Offer
I have looked for existing unflatten solutions, but these either use the path information stored into a single column (with each "lowest level value" having its own "row") or don't rebuilt the original data based upon indices.
I am looking to do this with a combination of for loops with SQL JOINS, but I feel there must be a more elegant solution. This could be in Python, PHP, JS or SQL or a combination or even adding each "subject" into a MongoDB document and then applying a merge operation on this?
Edit: Updating the title to optimize SEO for this article. The serialized format of this RDF and JSON+LD data I am working with is called N-triples. Read more here: https://medium.com/wallscope/understanding-linked-data-formats-rdf-xml-vs-turtle-vs-n-triples-eb931dbe9827