I have a problem handling JSON data from different sources. My plan is to use JSON-LD and store the data from each source as RDF so that I can analyze it. But I don't know how to turn regular JSON into JSON-LD correctly. For example, I don't know how to come up with the correct context for the JSON-LD object.
In my project, each source contains information about the infra configs. This information can be extracted in JSON format, but each source has a different structure.
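As a minimal sketch of the plan (not code from my project): each source could get its own context that maps its key names onto the same JSON-LD keywords and vocabulary, so the records themselves stay untouched. The "cloud" source and its keys `resource_id` and `kind` are hypothetical, made up for illustration, as is the helper name `to_jsonld`. The sketch only builds the documents with the standard library; it does not expand or validate them.

```python
import json

# Placeholder IRI, as in the rest of this post; replace with a real URL.
VOCAB = "<unique_iri>/resources/onprem/"

# One context per source. Each aliases that source's own key names to the
# JSON-LD keywords @id and @type. The "cloud" source is hypothetical.
CONTEXTS = {
    "onprem": {"@vocab": VOCAB, "sysid": "@id", "type": "@type"},
    "cloud": {"@vocab": VOCAB, "resource_id": "@id", "kind": "@type"},
}


def to_jsonld(source, records):
    """Wrap plain JSON records from `source` in a JSON-LD document."""
    return {"@context": CONTEXTS[source], "@graph": records}


onprem_doc = to_jsonld("onprem", [{"sysid": "vm_remote", "type": "vm"}])
cloud_doc = to_jsonld("cloud", [{"resource_id": "vm_remote", "kind": "vm"}])

print(json.dumps(onprem_doc, indent=2))
print(json.dumps(cloud_doc, indent=2))
```

The idea is that both documents would describe the same resource after expansion, even though the two sources name their keys differently.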
The following example shows how I try to use "pyld" and "rdflib" to turn a JSON object into a graph, but the output is not what I expect:
Side note: Stack Overflow flagged my question as spam when I used a URL as the IRI, even the example URL that most examples use. So if you want to run this example, you have to replace <unique_iri> with a real URL to make it work.
Example
import json

from pyld import jsonld
from rdflib import Graph

# JSON data
nodes = [
    {
        "sysid": "vm_remote",
        "type": "vm",
        "name": "remote",
        "config": {
            "id": "worker_1",
            "cpu": "2",
        },
        "connect": [
            "db_users"
        ]
    },
    {
        "sysid": "db_users",
        "type": "db",
        "name": "users",
        "config": {
            "id": "database_1",
            "location": "eu_west",
        }
    }
]

# Define the context for the JSON-LD object
context = {
    "@version": 1.1,
    "@base": "<unique_iri>/team_name/",
    "@vocab": "<unique_iri>/resources/onprem/",
    "sysid": "@id",
    "type": "@type",
    "config": {
        "@id": "config",
        "@context": {
            "@base": "<unique_iri>/team_name/config/"
        }
    },
    "connect": {"@id": "relation#connect", "@type": "@id", "@container": "@set"}
}

# Wrap the nodes in a JSON-LD document
doc = {
    "@context": context,
    "@graph": nodes,
    "@id": "graph",
    "@type": "graph"
}
print("\nInput JSON-LD:\n" + json.dumps(doc, indent=2))

# Expand the document with pyld, then load the result into rdflib
expanded_data = jsonld.expand(doc)
print("\nExpanded JSON-LD:\n" + json.dumps(expanded_data, indent=2))
graph = Graph().parse(data=json.dumps(expanded_data), format='json-ld')
print("\nRDF Graph:\n" + graph.serialize(format='json-ld'))
# Find the type of each entry (a resource)
q = """
PREFIX resources: <<unique_iri>/resources/onprem/>
SELECT DISTINCT ?type
WHERE
{
    ?s resources:type ?type .
}
"""
print()
for row in graph.query(q):
    print("Type: %s" % row)
Output
Input JSON-LD:
{
  "@context": {
    "@version": 1.1,
    "@base": "<unique_iri>/team_name/",
    "@vocab": "<unique_iri>/resources/onprem/",
    "sysid": "@id",
    "type": "@type",
    "config": {
      "@id": "config",
      "@context": {
        "@base": "<unique_iri>/team_name/config/"
      }
    },
    "connect": {
      "@id": "relation#connect",
      "@type": "@id",
      "@container": "@set"
    }
  },
  "@graph": [
    {
      "sysid": "vm_remote",
      "type": "vm",
      "name": "remote",
      "config": {
        "id": "worker_1",
        "cpu": "2"
      },
      "connect": [
        "db_users"
      ]
    },
    {
      "sysid": "db_users",
      "type": "db",
      "name": "users",
      "config": {
        "id": "database_1",
        "location": "eu_west"
      }
    }
  ],
  "@id": "graph",
  "@type": "graph"
}
Expanded JSON-LD:
[
  {
    "@graph": [
      {
        "<unique_iri>/resources/onprem/config": [
          {
            "<unique_iri>/resources/onprem/cpu": [
              {
                "@value": "2"
              }
            ],
            "<unique_iri>/resources/onprem/id": [
              {
                "@value": "worker_1"
              }
            ]
          }
        ],
        "<unique_iri>/resources/onprem/relation#connect": [
          {
            "@id": "db_users"
          }
        ],
        "<unique_iri>/resources/onprem/name": [
          {
            "@value": "remote"
          }
        ],
        "@id": "vm_remote",
        "@type": [
          "<unique_iri>/resources/onprem/vm"
        ]
      },
      {
        "<unique_iri>/resources/onprem/config": [
          {
            "<unique_iri>/resources/onprem/id": [
              {
                "@value": "database_1"
              }
            ],
            "<unique_iri>/resources/onprem/location": [
              {
                "@value": "eu_west"
              }
            ]
          }
        ],
        "<unique_iri>/resources/onprem/name": [
          {
            "@value": "users"
          }
        ],
        "@id": "db_users",
        "@type": [
          "<unique_iri>/resources/onprem/db"
        ]
      }
    ],
    "@id": "graph",
    "@type": [
      "<unique_iri>/resources/onprem/graph"
    ]
  }
]
RDF Graph:
[
  {
    "@id": "file:///C:...",
    "@type": [
      "<unique_iri>/resources/onprem/graph"
    ]
  }
]
Type: <unique_iri>/resources/onprem/graph
- The graph is missing the nodes, and I don't understand what I am doing wrong.
- I am also not sure how to deal with the config nodes. They should be nodes with their own unique identifiers, since other data sources will point to them too.
- Also, these Python libraries give me different results from the online JSON-LD Playground.
Can somebody please help me?