
We have some Python scripts that execute both SPARQL queries and SPARQL "updates" (insert/delete). Here is most of the relevant code (I think):

server = "localhost"
repo = "test"
query_endpoint  = "http://%s:8080/openrdf-sesame/repositories/%s" % (server,repo)
update_endpoint = "http://%s:8080/openrdf-sesame/repositories/%s/statements" % (server,repo)


def execute_query(query):
  params = { 'query': query }
  headers = {
    'content-type': 'application/x-www-form-urlencoded',
    'accept': 'application/sparql-results+json'
  }
  (response, content) = httplib2.Http().request(query_endpoint, 'POST', urllib.urlencode(params), headers=headers)
  return (response,ast.literal_eval(content))

def execute_update(query):
  params = { 'update': query }
  headers = {
    'content-type': 'application/x-www-form-urlencoded',
    'accept': 'application/sparql-results+json'
  }
  (response, content) = httplib2.Http().request(update_endpoint, 'POST', urllib.urlencode(params),headers=headers)
  return True

All of our calls to execute_query are very fast, completing in under a second. However, the first call to execute_update takes a really long time (about 16 seconds); every call after the first one completes in under a second. We're running Sesame version 2.7.12 (we thought upgrading from version 2.7.3 might help, but it didn't help much). We only have two or three thousand triples. This is all running from CGI scripts, so we can't really keep a Python session alive between update calls (and anyway, isn't that the Workbench's job?). Any ideas on what is taking so long on that first call to the update_endpoint? Are other people having the same issue? Any suggested resolutions?

Thanks!

EDIT: I followed RobV's advice, but I'm still having the same problem. Log output from tshark:

 22.577578   10.10.2.43 -> 10.10.2.43   HTTP POST /openrdf-sesame/repositories/test HTTP/1.1 
 22.578261   10.10.2.43 -> 10.10.2.43   HTTP Continuation or non-HTTP traffic
 22.583422   10.10.2.43 -> 10.10.2.43   HTTP HTTP/1.1 200 OK  (application/sparql-results+json)
 22.583857   10.10.2.43 -> 10.10.2.43   HTTP Continuation or non-HTTP traffic
 22.591122   10.10.2.43 -> 10.10.2.43   HTTP POST /openrdf-sesame/repositories/test/statements HTTP/1.1 
 22.591388   10.10.2.43 -> 10.10.2.43   HTTP Continuation or non-HTTP traffic
 35.020398   10.10.2.43 -> 10.10.2.43   HTTP HTTP/1.1 204 No Content 
 35.025605   10.10.2.43 -> 10.10.2.43   HTTP POST /openrdf-sesame/repositories/test/statements HTTP/1.1 
 35.025911   10.10.2.43 -> 10.10.2.43   HTTP Continuation or non-HTTP traffic
 35.040606   10.10.2.43 -> 10.10.2.43   HTTP HTTP/1.1 204 No Content 
 35.045937   10.10.2.43 -> 10.10.2.43   HTTP POST /openrdf-sesame/repositories/test/statements HTTP/1.1 
 35.046080   10.10.2.43 -> 10.10.2.43   HTTP Continuation or non-HTTP traffic
 35.049359   10.10.2.43 -> 10.10.2.43   HTTP HTTP/1.1 204 No Content 
 35.053776   10.10.2.43 -> 10.10.2.43   HTTP POST /openrdf-sesame/repositories/test/statements HTTP/1.1 
 35.053875   10.10.2.43 -> 10.10.2.43   HTTP Continuation or non-HTTP traffic
 35.056937   10.10.2.43 -> 10.10.2.43   HTTP HTTP/1.1 204 No Content 

You can see the large gap on the first call to the /statements endpoint.
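
For reference, a quick way to reproduce the timing asymmetry from Python, reusing the execute_query/execute_update helpers above (the query and update strings here are just illustrative placeholders, not our real data):

import time

start = time.time()
execute_query("SELECT * WHERE { ?s ?p ?o } LIMIT 1")
print "query   : %.2f s" % (time.time() - start)

start = time.time()
execute_update("INSERT DATA { <http://example.org/s> <http://example.org/p> <http://example.org/o> }")
print "update 1: %.2f s" % (time.time() - start)

start = time.time()
execute_update("DELETE DATA { <http://example.org/s> <http://example.org/p> <http://example.org/o> }")
print "update 2: %.2f s" % (time.time() - start)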


2 Answers

3

When we created the repository, we created it as an "In Memory Store" repository. I created a new repository of the "Native Java Store" type, and now my first call is fast (as are all subsequent calls).
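
(As a side note, not part of the original answer: if you want to confirm the new repository is visible to the server, the repository list can be fetched over the same HTTP protocol. A minimal sketch, assuming the same server and httplib2 setup as in the question:)

import httplib2

# GET /repositories returns the list of available repositories as a SPARQL result set.
list_endpoint = "http://localhost:8080/openrdf-sesame/repositories"
headers = { 'accept': 'application/sparql-results+json' }
(response, content) = httplib2.Http().request(list_endpoint, 'GET', headers=headers)
print content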

    That would probably explain it then - the slow first call is caused by the in-memory store being initialized (reading its data from disk). The native store does not have this penalty (although it does have a similar penalty because it needs to warm up some caches - it should be nowhere near as heavy a penalty though). – Jeen Broekstra Jul 03 '14 at 06:48
  • Any ideas on why we don't get this start up penalty when we run a query against an in-memory store? We only saw it on insert/delete. – Michael Richey Jul 03 '14 at 13:19
  • Not sure. It would be the first operation after a restart of the server that gets this penalty, normally. Whether that is a query or update should not matter. – Jeen Broekstra Jul 03 '14 at 16:09
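
One workaround that follows from this comment thread (our suggestion, not something stated in the answers): pay the initialization cost once, right after the server (re)starts, by running a throwaway update from a warm-up script rather than letting the first real CGI request absorb it. A minimal sketch, assuming the same endpoint and Python 2 / httplib2 setup as in the question; the dummy triple is a placeholder:

# warm_up.py - hypothetical one-shot script to run after restarting the server,
# so that the first real CGI request does not pay the store initialization penalty.
import httplib2
import urllib

update_endpoint = "http://localhost:8080/openrdf-sesame/repositories/test/statements"
headers = { 'content-type': 'application/x-www-form-urlencoded' }

# Insert a dummy triple (this is the request that pays the one-off initialization cost) ...
params = { 'update': "INSERT DATA { <http://example.org/warmup#s> <http://example.org/warmup#p> <http://example.org/warmup#o> }" }
httplib2.Http().request(update_endpoint, 'POST', urllib.urlencode(params), headers=headers)

# ... and remove it again so the store is left unchanged.
params = { 'update': "DELETE DATA { <http://example.org/warmup#s> <http://example.org/warmup#p> <http://example.org/warmup#o> }" }
httplib2.Http().request(update_endpoint, 'POST', urllib.urlencode(params), headers=headers)
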
2

The Sesame workbench and the server are two different applications running in separate application contexts within your web application container.

Your CGI code directs queries directly to the Sesame server but directs updates to the Sesame workbench.

The Sesame Workbench is really just a UI for the Sesame server: it essentially proxies your requests on to the underlying server. The first time you make an update, the Workbench has to establish a connection to the server, which I believe involves making various additional requests to the Sesame server for metadata. After this the connection is cached by the Workbench, which is why subsequent updates run very fast.

Updates can be directed at the Sesame server directly by changing your update endpoint to use the server's /statements endpoint instead, as detailed in the Sesame HTTP Protocol documentation, e.g.

update_endpoint = "http://%s:8080/openrdf-sesame/repositories/%s/statements" % (server,repo)

By going directly against the Sesame server you should eliminate the long delay on the first update.
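
Put together, a minimal end-to-end sketch of such a direct update call, using the same Python 2 / httplib2 approach as the question (the INSERT DATA statement is just a placeholder):

import httplib2
import urllib

server, repo = "localhost", "test"
update_endpoint = "http://%s:8080/openrdf-sesame/repositories/%s/statements" % (server, repo)

params  = { 'update': "INSERT DATA { <http://example.org/s> <http://example.org/p> <http://example.org/o> }" }
headers = { 'content-type': 'application/x-www-form-urlencoded' }
(response, content) = httplib2.Http().request(update_endpoint, 'POST', urllib.urlencode(params), headers=headers)
print response.status   # the server answers 204 No Content on success (cf. the tshark log above)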

RobV
  • I just tried this. Changed it over to use openrdf-sesame application and the /statements endpoint, but the first call to it took 13 seconds. Additional calls were all less than 1 second. – Michael Richey Jul 02 '14 at 15:09
  • @MichaelRichey I have edited your question to show the correct update endpoint URL instead of the Workbench one. The reason I did this is that I want to avoid other users finding this code and re-using it with the incorrect URL. – Jeen Broekstra Jul 05 '14 at 09:55