I have a large RDF dataset (the Geonames dataset, 18GB) in N-Triples (NT) format. I would like to load it into a PostgreSQL relational database using rdflib_sqlalchemy.SQLAlchemy. I know this is possible (running SPARQL queries on RDF data stored in a relational database), but I am not sure how. Could you please provide an example?

My next goal is to write a SPARQL query from Python using RDFLib, which I already know how to do. Thanks in advance for your help.

Beautiful Mind
    Loading the data should probably be done via the RDFLib API, since rdflib-sqlalchemy is just a subproject that stores the triples in a different backend. Reading the RDFLib docs should therefore be a good starting point. – UninformedUser Jan 06 '17 at 21:30

1 Answer

Install these Python libraries:

pip install rdflib
pip install rdflib-sqlalchemy
pip install psycopg2

Run the following Python code:

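from rdflib import plugin
from rdflib.graph import Graph
from rdflib.store import Store
from rdflib_sqlalchemy import registerplugins

# Register the "SQLAlchemy" store plugin with RDFLib.
registerplugins()

# Replace user, password, hostname, port and databasename with your own values.
SQLALCHEMY_URL = "postgresql+psycopg2://user:password@hostname:port/databasename"

# Create the store and a graph backed by it.
store = plugin.get("SQLAlchemy", Store)(identifier="my_store")
graph = Graph(store, identifier="my_graph")

# create=True creates the tables in the database.
graph.open(SQLALCHEMY_URL, create=True)

# Import the N-Triples file into the database.
graph.parse("demo.nt", format="nt")

# Run a SPARQL query against the stored triples.
result = graph.query("select * where {?s ?p ?o} limit 10")

for subject, predicate, object_ in result:
    print(subject, predicate, object_)

graph.close()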

'demo.nt' is the N-Triples file to import. I used this for testing:

<http://example.org/a> <http://example.org/b> <http://example.org/c> .

After a successful import, the database contains five tables (e.g., kb_[some_id]_asserted_statements) populated with the triples, and the console shows at most ten triples.
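
For a file as large as the 18GB dump in the question, it may be more practical to feed the file to the graph in pieces and commit after each piece, which is possible because N-Triples holds one triple per line. The following is only a sketch under assumptions: `iter_nt_chunks` and `load_in_chunks` are hypothetical helper names (not part of RDFLib), the chunk size is arbitrary, and `graph` is assumed to be a graph opened as in the code above. Since each `parse` call runs on its own, blank node labels that recur in different chunks may not end up as the same node.

```python
from itertools import islice


def iter_nt_chunks(lines, chunk_size=100000):
    """Yield N-Triples text in pieces of at most chunk_size lines."""
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield "".join(chunk)


def load_in_chunks(graph, path, chunk_size=100000):
    """Parse the file at path into graph chunk by chunk; return the chunk count."""
    loaded = 0
    with open(path, encoding="utf-8") as handle:
        for text in iter_nt_chunks(handle, chunk_size):
            # Graph.parse accepts raw text through the data argument.
            graph.parse(data=text, format="nt")
            # Commit after every chunk so progress is written incrementally.
            graph.commit()
            loaded += 1
    return loaded
```

With the graph from the code above, this would be called as `load_in_chunks(graph, "demo.nt")` in place of the single `graph.parse` call.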

Tested on Windows 10, PostgreSQL 10.5, and Python 3.5.4 (all 64-bit) with rdflib-4.2.2, rdflib-sqlalchemy-0.3.8, and psycopg2-2.7.5.
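
One practical detail about the connection URL in the code above: if the user name or password contains characters such as `@` or `/`, they need to be percent-encoded before being placed in the URL. A minimal sketch using only the standard library; `build_postgres_url` is a hypothetical helper name and the values in the usage line are placeholders.

```python
from urllib.parse import quote


def build_postgres_url(user, password, host, port, database):
    """Return a postgresql+psycopg2 URL with percent-encoded credentials."""
    return "postgresql+psycopg2://{}:{}@{}:{}/{}".format(
        quote(user, safe=""),      # encode every reserved character
        quote(password, safe=""),
        host,
        port,
        database,
    )


# Placeholder values for illustration only.
SQLALCHEMY_URL = build_postgres_url("user", "p@ss/word", "localhost", 5432, "geonames")
```

The resulting string can be passed to `graph.open` exactly like the hand-written URL.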

F1refly
    I appreciate the complete example. Perhaps note that `create` should be set to `False` when re-opening the database: `graph.open(SQLALCHEMY_URL, create=False)` – Douglas M Oct 31 '22 at 18:23