I'm looking for a recipe for converting Pandas DataFrames to RDF data in Python. I'm aware of the following Python modules (I know how to Google!), but they do not work for me:

- rdfpandas
- pandasrdf

Neither seems mature, and I have problems with both. With rdfpandas, I'm unable to install it, there are no examples, and the documentation is insufficient. With pandasrdf, the example crashes; I can fix that, but the resulting RDF file has zero triples, so the result is useless.

I'd rather not write the data out to an intermediate file that I have to ingest later. Pandas->numpy->RDF would be OK, I guess.

Does anybody have a working example of converting a Pandas DataFrame to RDF in one of the common serialisation formats that does not involve an artisanal black magic package installation?
Given that RDF is made of triples `(subject, predicate, object)`, can't you just convert your dataframe to triples? I don't know what your dataframe looks like, but you'll need a mapping anyway. You have some columns, each of which presumably represents a property (this will be your predicate). You also need an identifier, i.e. a URI, that denotes the entity represented by each row; this will become your subject. All you need is to create a triple for each row `i` and column `j`, formally something like `(s_i, c_j, val_ij)`. For N-Triples this is trivial to convert. – UninformedUser Jan 21 '19 at 20:37
And `rdfpandas` really doesn't work? It's still an active project, and there is at least a small example of how to use it. What else do you need? You can also have a look at the small test class: https://github.com/cadmiumkitty/rdfpandas/blob/master/tests/test_data_frame_to_graph.py – UninformedUser Jan 21 '19 at 20:40
1 Answer
A newer version of RdfPandas is out, so you can try it out and see if it covers your use case: https://rdfpandas.readthedocs.io/en/latest (thanks to Carmoreno for the prompt to fix the link)
The example below is based on https://github.com/cadmiumkitty/capability-models/blob/master/notebooks/investment_management_capabilities.csv:

    import pandas as pd
    import rdfpandas

    # The '@id' column becomes the DataFrame index.
    df = pd.read_csv('investment_management_capabilities.csv', index_col='@id', keep_default_na=True)
    g = rdfpandas.to_graph(df)
    # Let rdflib write the file itself; serialize() returns bytes in older
    # rdflib versions and str in newer ones, so writing its result by hand is fragile.
    g.serialize(destination='investment_management_capabilities.ttl', format='turtle')
The code that does the conversion is pretty minimal (just look at the `to_graph` function in https://github.com/cadmiumkitty/rdfpandas/blob/master/rdfpandas/graph.py), so you can use it directly as inspiration for your own conversion logic.

Eugene Morozov
The documentation link provided in the answer is broken. You can access it on https://rdfpandas.readthedocs.io/en/latest/ – Carmoreno Jul 28 '22 at 10:21