For an RDF-graph-based project I have to do the following:
- parse an RDF graph from .rdf and .ttl files
- build subclusters from it and run network analysis on them
- comment on how to improve clustering techniques in order to improve Semantic Web results
Being relatively new to both the field and to coding, I am facing some issues.
First, I am able to parse the RDF file into an RDF graph using the Python library rdflib:
!pip install rdflib
from rdflib import Graph as RDFGraph
from rdflib.extras.external_graph_libs import rdflib_to_networkx_graph
# RDF graph loading
path = ("any file with rdf extension")
rg = RDFGraph()
rg.parse(path)
print("rdflib Graph loaded successfully with {} triples".format(len(rg)))
I see that the graph has more than 20000 statements, so I wanted to make a subgraph of it. I read that SPARQL can be used to query RDF, so I tried this:
qres = rg.query(
"""SELECT *
LIMIT 10.
""")
for row in qres:
    print(row)
But it threw an error message:
ERROR:root:An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line string', (1, 7))
---------------------------------------------------------------------------
ParseException Traceback (most recent call last)
<ipython-input-13-66859aa59e83> in <module>()
2 """SELECT *
3 LIMIT 10.
----> 4 """)
5 for row in qres:
6 print(row)
4 frames
/usr/local/lib/python3.6/dist-packages/pyparsing.py in parseImpl(self, instring, loc, doActions)
2897 if instring[loc] == self.firstMatchChar:
2898 return loc + 1, self.match
-> 2899 raise ParseException(instring, loc, self.errmsg, self)
2900
2901 _L = Literal
ParseException: Expected {SelectQuery | ConstructQuery | DescribeQuery | AskQuery}, found 'L' (at char 16), (line:2, col:8)
The rationale was to find out which entity types and relations the graph contains, so that I could build a subgraph as follows:
# Subgraph construction (optional)
entity = input("Entity type to build nodes of the subgraph with: ")
relation = input("Relation type to build edges of the subgraph with: ")
# TODO: Use entity and relation as parameters of a CONSTRUCT query
query = """
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
CONSTRUCT {{ ?u a {} . ?u {} ?v }} WHERE {{ ?u a {} . ?u {} ?v }}""".format(entity, relation, entity,
relation)
# print(query)
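subg = rg.query(query)
# A CONSTRUCT query returns a Result object; its .graph attribute holds the constructed rdflib Graph
rg = subg.graph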
When I run the above piece of code, it prompts me for an entity and a relation, but without knowing anything about the contents of the RDF file I do not know what to enter.
Printing out 20000 lines of URIs would only waste time.
My subsequent goal is to convert it to a NetworkX graph and run graph clustering or graph analysis on it.
While the objective is clear, I am still trying to figure out the best way to do the task.
Could anyone with experience working with knowledge graphs, or with ML clustering on knowledge graphs, please help me with this?
Here is a link to one of the RDF files I am to use: https://drive.google.com/file/d/1HSePLT61aqxkY1RARcNML04ms2Dydt9S/view?usp=sharing
In case you can't open that file, here is another one, used in a tutorial:
https://raw.githubusercontent.com/albertmeronyo/lodapi/master/ghostbusters.ttl
Further, I thought that doing the following would work:
!pip install rdfpandas
import rdfpandas as pd
df = pd.to_DataFrame(rg)
df.head()
But this doesn't work either; it throws an error:
AttributeError: module 'rdfpandas' has no attribute 'to_DataFrame'
I would be grateful for any help on how to go about this.