I have a problem handling JSON data from different sources. My plan was to use JSON-LD and store the data from each source as RDF, so that I can run some analysis on it. However, I don't know how to turn plain JSON into JSON-LD correctly. For example, I don't know how to write the correct context for the JSON-LD object.
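In JSON-LD the data itself can stay as it is; the context is what maps each key either to a keyword (such as `@id` or `@type`) or to an IRI. A minimal sketch of that wrapping step, assuming each source yields a list of plain dicts and using `https://example.org/` as a stand-in for `<unique_iri>` (the helper `to_jsonld` is hypothetical):

```python
import json

# Assumption: https://example.org/ stands in for the real <unique_iri>.
BASE = "https://example.org/team_name/"
VOCAB = "https://example.org/resources/onprem/"

# The context only describes how to read the keys; the records are not changed.
CONTEXT = {
    "@version": 1.1,
    "@base": BASE,    # relative "@id" values are resolved against this
    "@vocab": VOCAB,  # keys without their own mapping become VOCAB + key
    "sysid": "@id",   # this key holds the node identifier
    "type": "@type",  # this key holds the node's rdf:type
}


def to_jsonld(records, context):
    """Hypothetical helper: wrap plain JSON records in a JSON-LD document.

    With only "@context" and "@graph" at the top level, the records
    describe the default graph.
    """
    return {"@context": context, "@graph": records}


records = [{"sysid": "db_users", "type": "db", "name": "users"}]
doc = to_jsonld(records, CONTEXT)
print(json.dumps(doc, indent=2))
```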

In my project, each source contains information about infrastructure configurations. This information can be extracted as JSON, but each source has a different structure.
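Because every source has its own structure, one option is to leave the records as they are and write one context per source that maps differently named keys onto the same vocabulary IRIs; after expansion, matching records from both sources should then produce the same triples. A sketch under that assumption; the source names and the keys of `source_b` are invented for illustration:

```python
# Assumption: https://example.org/ stands in for the real <unique_iri>.
BASE = "https://example.org/team_name/"
VOCAB = "https://example.org/resources/onprem/"

# One context per source. "source_b" and its key names are hypothetical;
# they show two different JSON shapes mapped onto the same IRIs.
SOURCE_CONTEXTS = {
    "source_a": {
        "@base": BASE,
        "@vocab": VOCAB,
        "sysid": "@id",
        "type": "@type",
    },
    "source_b": {
        "@base": BASE,
        "@vocab": VOCAB,
        "hostname": "@id",
        "kind": "@type",
        "label": VOCAB + "name",  # same property IRI as "name" in source_a
    },
}


def wrap(source, records):
    """Attach the context registered for `source` (KeyError if unknown)."""
    return {"@context": SOURCE_CONTEXTS[source], "@graph": records}


doc_a = wrap("source_a", [{"sysid": "db_users", "type": "db", "name": "users"}])
doc_b = wrap("source_b", [{"hostname": "db_users", "kind": "db", "label": "users"}])
```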

The following example shows how I try to use "pyld" and "rdflib" to turn a JSON object into a Graph, but the output is not what I expected:

Side note: Stack Overflow flagged my question as spam when I used a URL as the IRI, even the example URL used in most examples. So if you want to run this example, you have to replace <unique_iri> with a real URL.

Example

import json
from pyld import jsonld
from rdflib import Graph

# JSON data
nodes = [
    {
        "sysid": "vm_remote",
        "type": "vm",
        "name": "remote",
        "config": {
            "id": "worker_1",
            "cpu": "2",
        },
        "connect": [
            "db_users"
        ]
    },
    {
        "sysid": "db_users",
        "type": "db",
        "name": "users",
        "config": {
            "id": "database_1",
            "location": "eu_west",
        }
    }
]

# Define the context for the JSON-LD object
context = {
    "@version": 1.1,
    "@base": "<unique_iri>/team_name/",
    "@vocab": "<unique_iri>/resources/onprem/",
    "sysid": "@id",
    "type": "@type",
    "config": {
      "@id": "config",
      "@context": {
        "@base": "<unique_iri>/team_name/config/"
      }
    },
    "connect": {"@id": "relation#connect", "@type": "@id", "@container": "@set"}
}

doc = {
    "@context": context,
    "@graph": nodes,
    "@id": "graph",
    "@type": "graph"
}

print("\nInput JSON-LD:\n" + json.dumps(doc, indent=2))

expanded_data = jsonld.expand(doc)
print("\nExpanded JSON-LD:\n" + json.dumps(expanded_data, indent=2))

graph = Graph().parse(data=json.dumps(expanded_data), format='json-ld')
print("\nRDF Graph:\n" + graph.serialize(format='json-ld'))

# Find the type of each entry (a resource)
q = """
    PREFIX resources: <<unique_iri>/resources/onprem/>
    SELECT DISTINCT ?type
    WHERE
    {
        ?s resources:type ?type .
    }
    """
print()
for row in graph.query(q):
    print("Type: %s" % row)

Output

Input JSON-LD:
{
  "@context": {
    "@version": 1.1,
    "@base": "<unique_iri>/team_name/",
    "@vocab": "<unique_iri>/resources/onprem/",
    "sysid": "@id",
    "type": "@type",
    "config": {
      "@id": "config",
      "@context": {
        "@base": "<unique_iri>/team_name/config/"
      }
    },
    "connect": {
      "@id": "relation#connect",
      "@type": "@id",
      "@container": "@set"
    }
  },
  "@graph": [
    {
      "sysid": "vm_remote",
      "type": "vm",
      "name": "remote",
      "config": {
        "id": "worker_1",
        "cpu": "2"
      },
      "connect": [
        "db_users"
      ]
    },
    {
      "sysid": "db_users",
      "type": "db",
      "name": "users",
      "config": {
        "id": "database_1",
        "location": "eu_west"
      }
    }
  ],
  "@id": "graph",
  "@type": "graph"
}

Expanded JSON-LD:
[
  {
    "@graph": [
      {
        "<unique_iri>/resources/onprem/config": [
          {
            "<unique_iri>/resources/onprem/cpu": [
              {
                "@value": "2"
              }
            ],
            "<unique_iri>/resources/onprem/id": [
              {
                "@value": "worker_1"
              }
            ]
          }
        ],
        "<unique_iri>/resources/onprem/relation#connect": [
          {
            "@id": "db_users"
          }
        ],
        "<unique_iri>/resources/onprem/name": [
          {
            "@value": "remote"
          }
        ],
        "@id": "vm_remote",
        "@type": [
          "<unique_iri>/resources/onprem/vm"
        ]
      },
      {
        "<unique_iri>/resources/onprem/config": [
          {
            "<unique_iri>/resources/onprem/id": [
              {
                "@value": "database_1"
              }
            ],
            "<unique_iri>/resources/onprem/location": [
              {
                "@value": "eu_west"
              }
            ]
          }
        ],
        "<unique_iri>/resources/onprem/name": [
          {
            "@value": "users"
          }
        ],
        "@id": "db_users",
        "@type": [
          "<unique_iri>/resources/onprem/db"
        ]
      }
    ],
    "@id": "graph",
    "@type": [
      "<unique_iri>/resources/onprem/graph"
    ]
  }
]

RDF Graph:
[
  {
    "@id": "file:///C:...",
    "@type": [
      "<unique_iri>/resources/onprem/graph"
    ]
  }
]

Type: <unique_iri>/resources/onprem/graph

  • The RDF graph is missing the nodes (vm_remote and db_users), and I don't understand what I am doing wrong.
  • I am also not sure how to handle the config objects. They should be nodes with their own unique identifiers, because other data sources will point to them too.
  • These Python libraries also give me different results than the online JSON-LD Playground.
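The config objects end up as blank nodes because nothing in the context maps their `id` key to `@id`; in the expanded output it is just an ordinary property (`.../resources/onprem/id`). A minimal sketch, assuming JSON-LD 1.1 property-scoped contexts: inside the `config` term's own context, alias `id` to `@id` and set `@base`, so that `worker_1` should become `.../team_name/config/worker_1`, an IRI other sources can point to. JSON-LD resolves relative IRIs per RFC 3986; `urljoin` is used below only to show which IRIs to expect, not as a JSON-LD processor.

```python
from urllib.parse import urljoin

# Assumption: https://example.org/ stands in for the real <unique_iri>.
BASE = "https://example.org/team_name/"
CONFIG_BASE = BASE + "config/"

context = {
    "@version": 1.1,  # scoped contexts require JSON-LD 1.1 processing
    "@base": BASE,
    "@vocab": "https://example.org/resources/onprem/",
    "sysid": "@id",
    "type": "@type",
    "config": {
        "@id": "config",
        # Scoped context: applies only to the objects under "config".
        "@context": {
            "@base": CONFIG_BASE,
            "id": "@id",  # the config's own "id" now names the node
        },
    },
}

# Which IRIs to expect after expansion (same result as RFC 3986 resolution
# for these simple relative references):
print(urljoin(BASE, "vm_remote"))        # https://example.org/team_name/vm_remote
print(urljoin(CONFIG_BASE, "worker_1"))  # https://example.org/team_name/config/worker_1
```

On the Playground difference, one possible cause is how the base IRI is picked up: pyld's `jsonld.expand` accepts a `base` option, so `jsonld.expand(doc, {"base": BASE})` is a cheap way to check whether the relative IDs then resolve the same way as in the Playground.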

Can somebody please help me?

Derk