
For an RDF-graph-based project I have to do the following:

  1. parse an RDF graph from .rdf and .ttl files
  2. extract subclusters from it and perform network analysis on them
  3. comment on how clustering techniques could be improved to improve Semantic Web results

Being relatively new to the field, and to coding, I am facing some issues.

First, I am able to parse the RDF file into an RDF graph using the rdflib Python library:

 !pip install rdflib
 from rdflib import Graph as RDFGraph
 from rdflib.extras.external_graph_libs import rdflib_to_networkx_graph  # for the later NetworkX conversion
 # RDF graph loading
 path = "any file with rdf extension"  # placeholder: path to a .rdf or .ttl file
 rg = RDFGraph()
 rg.parse(path)
 print("rdflib Graph loaded successfully with {} triples".format(len(rg)))
 

I see that the graph has more than 20,000 statements, so I wanted to make a subgraph of it. But there is an issue with that. I read that we can use SPARQL to query RDF, so I did this:

qres = rg.query(
"""SELECT *
   LIMIT 10.
   """)
for row in qres:
   print(row)

But it threw an error message:

ERROR:root:An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line string', (1, 7))

---------------------------------------------------------------------------
ParseException                            Traceback (most recent call last)
<ipython-input-13-66859aa59e83> in <module>()
      2     """SELECT *
      3        LIMIT 10.
----> 4        """)
      5 for row in qres:
      6     print(row)

/usr/local/lib/python3.6/dist-packages/pyparsing.py in parseImpl(self, instring, loc, doActions)
   2897         if instring[loc] == self.firstMatchChar:
   2898             return loc + 1, self.match
-> 2899         raise ParseException(instring, loc, self.errmsg, self)
   2900 
   2901 _L = Literal

ParseException: Expected {SelectQuery | ConstructQuery | DescribeQuery | AskQuery}, found 'L'  (at char 16), (line:2, col:8)

My reason for doing this was to find out which entities and relations the graph contains, so that I could build a subgraph as follows:

# Subgraph construction (optional)
entity = input("Entity type to build nodes of the subgraph with: ")
relation = input("Relation type to build edges of the subgraph with: ")

# entity and relation are pasted into the CONSTRUCT query ({{ }} escapes literal braces for str.format)
query = """
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
CONSTRUCT {{ ?u a {} . ?u {} ?v }} WHERE {{ ?u a {} . ?u {} ?v }}""".format(entity, relation, entity, relation)
# print(query)
# For a CONSTRUCT query, the result's .graph attribute holds the new rdflib Graph
subg = rg.query(query).graph

rg = subg

When I run the above code, it asks me to enter an entity type and a relation, but without knowing anything about the contents of the RDF file I do not know what to enter.

Printing out 20,000 lines of URIs would only waste time.

My next goal is to convert it to a NetworkX graph and run graph clustering or other graph analysis on it.

Also, while the objective is clear, I am still trying to figure out the best way to do the task.

Since many people here are experts or have experience with knowledge graphs or ML clustering on knowledge graphs, could anyone please help me with this?

Also, here is a link to one of the RDF files I need to use: https://drive.google.com/file/d/1HSePLT61aqxkY1RARcNML04ms2Dydt9S/view?usp=sharing

In case you can't open that file, here is another file used in a tutorial:

https://raw.githubusercontent.com/albertmeronyo/lodapi/master/ghostbusters.ttl

Next, I thought the following would work:

!pip install rdfpandas
import rdfpandas as pd

df = pd.to_DataFrame(rg)
df.head()

But this doesn't work either; it throws an error:

AttributeError: module 'rdfpandas' has no attribute 'to_DataFrame'

I would be grateful for any help on how to go about this.

K C
  • Well, your query is just wrong: `SELECT * LIMIT 10.` is invalid syntax. SPARQL is about pattern matching, so you have to add triple patterns to it, like `SELECT * WHERE {?s ?p ?o} LIMIT 10`. Even then, you will get bindings instead of a subgraph with triples. You have to use a `CONSTRUCT` query, which returns an RDF graph as its result – UninformedUser Dec 09 '20 at 11:41
  • Thank you for pointing that out. I am indeed able to print the RDF triples now using your code. However, being very new to this subject, I am still looking for ideas and ways to meet the objectives of the study – K C Dec 10 '20 at 12:47
  • Also - can you please tell me how you inserted code within your comments? – K C Dec 10 '20 at 12:47
