
I am trying to load several large biomedical ontologies into a GraphDB OWL-Horst-optimized repository, along with tens of millions of triples that use terms from those ontologies. I can load these data into an RDFS-Plus-optimized repo in less than an hour, but I can't even load one of the ontologies (chebi_lite) if I let it go overnight. That's using loadrdf on a 64-core, 256 GB AWS server.

My earlier question, "Can GraphDB load 10 million statements with OWL reasoning?", led to the suggestion that I use the preload command, followed by re-inferring. The preload indeed went very quickly, but when it comes to re-inferring, only one core is used. I haven't been able to let it run for more than an hour yet. Is re-inferring using just one core a consequence of using the free version? Might loadrdf work better if I just configured it better?

When I use loadrdf, all cores go close to 100%, but the memory usage never goes over 10% or so. I have tinkered with the JVM memory settings a little, but I don't really know what I'm doing. For example:

-Xmx80g -Dpool.buffer.size=2000000 
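For reference, this is roughly how I'm passing those settings to loadrdf. I'm assuming the GDB_JAVA_OPTS environment variable is the right mechanism (it's how the GraphDB startup scripts seem to pick up JVM options), and the repository ID and file name below are just placeholders from my setup:

```shell
# Sketch only: values are what I tried, not tuned recommendations.
# GDB_JAVA_OPTS is assumed to be read by the loadrdf script;
# "my_repo" and the input file are placeholders.
export GDB_JAVA_OPTS="-Xmx80g -Dpool.buffer.size=2000000"
./loadrdf -f -i my_repo -m parallel chebi_lite.owl
```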
Mark Miller
