I'm trying to reproduce the BioGrakn example from the white paper "Text Mined Knowledge Graphs", with the aim of later building a text-mined knowledge graph from my own (non-biomedical) document collection. To do so, I built a Maven project from the classes and data of the textmining use case in the biograkn repo. My pom.xml looks like this:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>TextMining-BioGrakn</groupId>
  <artifactId>TextMining-BioGrakn</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <name>TextMining-BioGrakn</name>
  <repositories>
    <repository>
      <id>repo.grakn.ai</id>
      <url>https://repo.grakn.ai/repository/maven/</url>
    </repository>
  </repositories>
  <dependencies>
    <dependency>
      <groupId>io.grakn.client</groupId>
      <artifactId>api</artifactId>
      <version>1.5.2</version>
    </dependency>
    <dependency>
      <groupId>io.grakn.core</groupId>
      <artifactId>concept</artifactId>
      <version>1.5.3</version>
    </dependency>
    <dependency>
      <groupId>io.graql</groupId>
      <artifactId>lang</artifactId>
      <version>1.0.1</version>
    </dependency>
    <dependency>
      <groupId>edu.stanford.nlp</groupId>
      <artifactId>stanford-corenlp</artifactId>
      <version>3.9.2</version>
    </dependency>
    <dependency>
      <groupId>edu.stanford.nlp</groupId>
      <artifactId>stanford-corenlp</artifactId>
      <version>3.9.2</version>
      <classifier>models</classifier>
    </dependency>
  </dependencies>
</project>
Migrating the schema, inserting the PubMed articles, and training the model all work perfectly, but then I get a java.lang.OutOfMemoryError: GC overhead limit exceeded, which is thrown in the mineText() method of the CoreNLP class. This is what the main method in the Migrator class looks like:
public class Migrator {
    public static void main(String[] args) {
        GraknClient graknClient = new GraknClient("localhost:48555");
        GraknClient.Session session = graknClient.session("text_mining");
        try {
            loadSchema("schema/text-mining-schema.gql", session);
            PubmedArticle.migrate(session);
            // The OutOfMemoryError is thrown inside this call, in CoreNLP.mineText()
            CoreNLP.migrate(session);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Close the session exactly once, whether or not an exception occurred
            session.close();
            graknClient.close();
        }
    }
}
Do you have any idea what could cause this error? Am I missing something fundamental here? Any help is highly appreciated.