I am trying to load several large biomedical ontologies into a GraphDB OWL-Horst-optimized repository, along with tens of millions of triples using terms from those ontologies. I can load these data into an RDFS-Plus-optimized repo in less than 1 hour, but I can't even load one of the ontologies (chebi_lite) if I let it go overnight. That's using loadrdf on a 64-core, 256 GB AWS server.
My earlier question Can GraphDB load 10 million statements with OWL reasoning? led to the suggestion that I use the preload command, followed by re-inferring. The preload indeed went very quickly, but when it comes to re-inferring, only one core is used. I haven't been able to let it run for more than an hour yet. Is re-inferring on just one core a consequence of using the free version? Or might loadrdf work better if I just configured it properly?
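For context, I believe the re-inference step is triggered with a SPARQL update against the special system predicate described in the GraphDB docs; what I'm running is along these lines (predicate name as I understand it from the documentation):

```sparql
# Re-materialize inferred statements after a preload (preload skips inference).
# Assumes the sys:reinfer pseudo-predicate from the GraphDB documentation.
PREFIX sys: <http://www.ontotext.com/owlim/system#>
INSERT DATA { [] sys:reinfer [] }
```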
When I use loadrdf, all cores run close to 100%, but memory usage never goes above 10% or so. I have tinkered with the JVM memory settings a little, but I don't really know what I'm doing. For example:
-Xmx80g -Dpool.buffer.size=2000000
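Given the 256 GB of RAM, I wondered whether something more like the following would be appropriate before invoking loadrdf (the specific values here are pure guesswork on my part, not validated):

```shell
# Guesses for a 256 GB machine -- values not validated:
#  - a much larger heap than the 80g I tried
#  - a bigger statement buffer for the parallel loader
export GDB_JAVA_OPTS="-Xmx200g -Dpool.buffer.size=5000000"
```

But I don't know which of these settings (if any) actually governs the inference phase, which is where things stall.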