
I am trying to upload some data to Dydra from a Sesame triplestore I have on my computer. While the download from Sesame works fine, the triples get mixed up: the s-p-o relationships change, with the object of one triple ending up as the object of another. Can someone please explain why this is happening and how it can be resolved? The code is below:

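# Imports used below
from SPARQLWrapper import SPARQLWrapper, JSON
from rdflib import Graph, URIRef
from bs4 import BeautifulSoup
import pprint
import requests

#Querying the triplestore to retrieve all results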
sesameSparqlEndpoint = 'http://my.ip.ad.here:8080/openrdf-sesame/repositories/rep_name'
sparql = SPARQLWrapper(sesameSparqlEndpoint)
queryStringDownload = 'SELECT * WHERE {?s ?p ?o}'
dataGraph = Graph()

sparql.setQuery(queryStringDownload)
sparql.method = 'GET'
sparql.setReturnFormat(JSON)
output = sparql.query().convert()
print output

for i in range(len(output['results']['bindings'])):
    #The encoding is necessary to parse non-English characters
    output['results']['bindings'][i]['s']['value'].encode('utf-8')
    try:
        subject_extract = output['results']['bindings'][i]['s']['value']
        if 'http' in subject_extract:
            subject = "<" + subject_extract + ">"
            subject_url = URIRef(subject)
            print subject_url

        predicate_extract = output['results']['bindings'][i]['p']['value']
        if 'http' in predicate_extract:
            predicate = "<" + predicate_extract + ">"
            predicate_url = URIRef(predicate)
            print predicate_url

        objec_extract = output['results']['bindings'][i]['o']['value']
        if 'http' in objec_extract:
            objec = "<" + objec_extract + ">"
            objec_url = URIRef(objec)
            print objec_url
        else:
            objec = objec_extract
            objec_wip = '"' + objec + '"'
            objec_url = URIRef(objec_wip)

        # Loading the data on a graph       
        dataGraph.add((subject_url,predicate_url,objec_url))

    except UnicodeError as error: 
        print error

#Print all statements in dataGraph      
for stmt in dataGraph:
    pprint.pprint(stmt)

# Upload to Dydra
URL = 'http://dydra.com/login'
key = 'my_key'

with requests.Session() as s:
    resp = s.get(URL)
    soup = BeautifulSoup(resp.text,"html5lib")
    csrfToken = soup.find('meta',{'name':'csrf-token'}).get('content')
    # print csrf_token
    payload = {
    'account[login]':key,
    'account[password]':'',
    'csrfmiddlewaretoken':csrfToken,
    'next':'/'
    }
    # print payload

    p = s.post(URL,data=payload, headers=dict(Referer=URL))
    # print p.text

    r = s.get('http://dydra.com/username/rep_name/sparql')
    # print r.text

    dydraSparqlEndpoint = 'http://dydra.com/username/rep_name/sparql'
    for stmt in dataGraph:
        queryStringUpload = 'INSERT DATA {%s %s %s}' % stmt
        sparql = SPARQLWrapper(dydraSparqlEndpoint)
        sparql.setCredentials(key,key)
        sparql.setQuery(queryStringUpload)
        sparql.method = 'POST'
        sparql.query()
kurious
  • Wow. You are taking the long way around here. Why are you using a SELECT query to extract all triples (and jumping through all kinds of hoops to reconstruct the actual RDF triples from the query result), rather than a CONSTRUCT query, which gives you the result ready-made as RDF statements? – Jeen Broekstra Dec 22 '15 at 20:05
  • Well, this is embarrassing; I should have thought of this. Just doing this appears to do the trick:

        sesameSparqlEndpoint = 'http://my.ip.ad.here:8080/openrdf-sesame/repositories/rep_name'
        sparql = SPARQLWrapper(sesameSparqlEndpoint)
        queryStringDownload = 'CONSTRUCT {?s ?p ?o} WHERE {?s ?p ?o}'
        dataGraph = Graph()

    – kurious Dec 22 '15 at 20:22
  • I'm having issues iterating over the CONSTRUCT query output. The follow-up question is at http://stackoverflow.com/questions/34425876/how-to-iterate-over-construct-output-from-rdflib. – kurious Dec 22 '15 at 23:04
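
As for why the triples get mixed up, one likely cause is visible in the question's loop: subject_url and predicate_url are only reassigned when the value contains 'http'. For a row whose subject is not an http URI (a blank node, for example), the if is skipped and the previous row's subject_url is silently reused, so statements end up attached to the wrong resource. The SPARQL 1.1 JSON results format already labels every term with a type (uri, literal or bnode; older servers may also send typed-literal), so a more reliable reconstruction reads that field instead of guessing. The Python 3 sketch below is not from the thread; the function names are illustrative, and it emits N-Triples lines rather than rdflib terms only to avoid third-party dependencies.

```python
def escape_literal(text):
    # Escape the characters that N-Triples does not allow raw inside a quoted literal.
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def term_to_ntriples(term):
    """Convert one SPARQL JSON result term, e.g. {"type": "uri", "value": ...},
    to N-Triples syntax, using its declared type instead of guessing."""
    kind, value = term["type"], term["value"]
    if kind == "uri":
        return "<%s>" % value
    if kind == "bnode":
        return "_:%s" % value
    if kind in ("literal", "typed-literal"):
        quoted = '"%s"' % escape_literal(value)
        if "xml:lang" in term:
            return quoted + "@" + term["xml:lang"]
        if "datatype" in term:
            return quoted + "^^<%s>" % term["datatype"]
        return quoted
    raise ValueError("unexpected term type: %r" % kind)


def bindings_to_ntriples(results):
    """Build one N-Triples line per row of a SELECT * WHERE {?s ?p ?o} result."""
    lines = []
    for row in results["results"]["bindings"]:
        s, p, o = (term_to_ntriples(row[var]) for var in ("s", "p", "o"))
        lines.append("%s %s %s ." % (s, p, o))
    return lines


# Usage with a hand-written result in the SPARQL 1.1 JSON format:
example = {"results": {"bindings": [
    {"s": {"type": "bnode", "value": "b0"},
     "p": {"type": "uri", "value": "http://example.org/name"},
     "o": {"type": "literal", "value": 'He said "hi"', "xml:lang": "en"}},
]}}
for line in bindings_to_ntriples(example):
    print(line)  # prints: _:b0 <http://example.org/name> "He said \"hi\""@en .
```

Because each row is converted independently, nothing from a previous row can leak into the next one.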

2 Answers


A far simpler way to copy your data over (apart from using a CONSTRUCT query instead of a SELECT, as I mentioned in the comment) is to have Dydra itself access your Sesame endpoint directly, for example via a SERVICE clause.

Execute the following update on your Dydra database and, after some time (depending on how large your Sesame database is), everything will be copied over:

   INSERT { ?s ?p ?o }
   WHERE { 
      SERVICE <http://my.ip.ad.here:8080/openrdf-sesame/repositories/rep_name> 
      { ?s ?p ?o }
   }
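
If you would rather send that update from a script than paste it into Dydra's query page, the SPARQL 1.1 Protocol defines an update as an HTTP POST carrying the text in a form-encoded update parameter. A minimal Python 3 sketch, assuming Dydra accepts updates at the endpoint URL used in the question and honours HTTP Basic credentials (which is what SPARQLWrapper's setCredentials sends by default); the helper name is hypothetical. Keep in mind that SERVICE only works if Dydra's servers can reach your Sesame endpoint, so that address must be publicly reachable.

```python
import base64
import urllib.parse
import urllib.request

# Placeholder URLs taken from the question.
SESAME_ENDPOINT = "http://my.ip.ad.here:8080/openrdf-sesame/repositories/rep_name"
DYDRA_ENDPOINT = "http://dydra.com/username/rep_name/sparql"


def service_copy_request(target_endpoint, source_endpoint, user=None, password=""):
    """Build (but do not send) a SPARQL 1.1 Protocol update request asking the
    target store to pull every triple from source_endpoint via a SERVICE clause."""
    update = ("INSERT { ?s ?p ?o }\n"
              "WHERE { SERVICE <%s> { ?s ?p ?o } }" % source_endpoint)
    body = urllib.parse.urlencode({"update": update}).encode("utf-8")
    request = urllib.request.Request(target_endpoint, data=body, method="POST")
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    if user is not None:
        # Assumption: the target store accepts HTTP Basic authentication.
        token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
        request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


request = service_copy_request(DYDRA_ENDPOINT, SESAME_ENDPOINT, "my_key", "my_key")
# To actually run the copy: urllib.request.urlopen(request).read()
```

Building the request separately from sending it makes it easy to inspect exactly what will be posted before touching the remote store.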

If the above doesn't work on Dydra, you can instead access the RDF statements in your Sesame store directly via the URI http://my.ip.ad.here:8080/openrdf-sesame/repositories/rep_name/statements. Assuming Dydra has an upload feature where you can provide the URL of an RDF document, you can give it this URI and it should be able to load the data.
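
Fetching that /statements URL yourself is a plain HTTP GET in which the Accept header chooses the RDF serialization, which is handy if you want a local copy to upload. A Python 3 sketch, assuming your Sesame version returns N-Triples for text/plain (application/rdf+xml is another format Sesame serves); the function names and the output file name are illustrative.

```python
import urllib.request


def statements_request(repository_url, accept="text/plain"):
    """Build a GET request for the /statements resource of a Sesame repository.
    text/plain is assumed to select N-Triples."""
    url = repository_url.rstrip("/") + "/statements"
    request = urllib.request.Request(url, method="GET")
    request.add_header("Accept", accept)
    return request


def download_statements(repository_url, path):
    """Save every statement in the repository to a local file (performs the HTTP call)."""
    with urllib.request.urlopen(statements_request(repository_url)) as response:
        with open(path, "wb") as out:
            out.write(response.read())


# Usage (placeholder URL from the question):
# download_statements("http://my.ip.ad.here:8080/openrdf-sesame/repositories/rep_name",
#                     "rep_name.nt")
```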

Jeen Broekstra
  • It seems one needs permission from Dydra to use the SERVICE clause. Otherwise, this is probably the most painless way to port the data. – kurious Dec 22 '15 at 23:04

The code above can work if the following changes are made:

  1. Use a CONSTRUCT query instead of SELECT. Details: How to iterate over CONSTRUCT output from rdflib?
  2. Use the key as the value of both account[login] and account[password].
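
For the first change, the SPARQL 1.1 Protocol sends a query as a GET with a query parameter, and for a CONSTRUCT the Accept header picks an RDF format, so the response is already a set of triples and no reconstruction from SELECT bindings is needed. A Python 3 sketch using only the standard library (the question uses SPARQLWrapper, which wraps the same protocol); the Accept value assumes Sesame answers text/plain with N-Triples, and the helper names are hypothetical.

```python
import urllib.parse
import urllib.request

CONSTRUCT_ALL = "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }"


def construct_request(endpoint, query=CONSTRUCT_ALL, accept="text/plain"):
    """Build a SPARQL 1.1 Protocol GET request for a CONSTRUCT query."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(url, method="GET")
    request.add_header("Accept", accept)
    return request


def fetch_triples(endpoint):
    """Run the query and return the non-empty N-Triples lines (performs the HTTP call)."""
    with urllib.request.urlopen(construct_request(endpoint)) as response:
        text = response.read().decode("utf-8")
    return [line for line in text.splitlines() if line.strip()]


# Usage (placeholder URL from the question):
# triples = fetch_triples("http://my.ip.ad.here:8080/openrdf-sesame/repositories/rep_name")
```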

However, this is probably not the most efficient approach. Mainly, issuing a separate INSERT for every triple is not a good idea: Dydra doesn't record all statements this way (only about 30% of my triples were inserted). By contrast, using the http://my.ip.ad.here:8080/openrdf-sesame/repositories/rep_name/statements method suggested by Jeen let me port all the data successfully.
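
If you do want to keep pushing data through SPARQL Update, a middle ground between one INSERT per triple and a single huge request is to group statements into batches. A Python 3 sketch that turns N-Triples lines (for example, from the /statements download) into INSERT DATA updates; the batch size of 1000 is an arbitrary assumption, not a known Dydra limit, and the function name is illustrative. Each string would be sent as the update parameter of a form-encoded POST. One caveat: blank nodes in each INSERT DATA request are created fresh, so a blank node whose statements fall into different batches becomes several distinct nodes.

```python
def insert_data_batches(ntriples_lines, batch_size=1000):
    """Yield SPARQL INSERT DATA updates holding up to batch_size statements each.
    N-Triples statements are valid SPARQL triple syntax, so lines are copied as-is."""
    batch = []
    for line in ntriples_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and N-Triples comment lines
        batch.append(line)
        if len(batch) == batch_size:
            yield "INSERT DATA {\n%s\n}" % "\n".join(batch)
            batch = []
    if batch:  # whatever is left over forms the final, smaller batch
        yield "INSERT DATA {\n%s\n}" % "\n".join(batch)


lines = ['<http://example.org/s%d> <http://example.org/p> "v" .' % i for i in range(5)]
updates = list(insert_data_batches(lines, batch_size=2))
print(len(updates))  # 3 updates: 2 + 2 + 1 statements
```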

kurious