How to parse a big rdf file in rdf4j

Question

I want to parse a huge file in RDF4J using the following code, but I get an exception caused by a parser limit:

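import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

import org.eclipse.rdf4j.model.Model;
import org.eclipse.rdf4j.model.Statement;
import org.eclipse.rdf4j.model.impl.LinkedHashModel;
import org.eclipse.rdf4j.rio.RDFFormat;
import org.eclipse.rdf4j.rio.RDFHandlerException;
import org.eclipse.rdf4j.rio.RDFParseException;
import org.eclipse.rdf4j.rio.RDFParser;
import org.eclipse.rdf4j.rio.RDFWriter;
import org.eclipse.rdf4j.rio.Rio;
import org.eclipse.rdf4j.rio.helpers.StatementCollector;

public class ConvertOntology {

    public static void main(String[] args) throws RDFParseException, RDFHandlerException, IOException {

        String file = "swetodblp_april_2008.rdf";
        File initialFile = new File(file);
        Model model = new LinkedHashModel();

        // Parse the RDF/XML input and collect every statement in memory.
        try (InputStream input = new FileInputStream(initialFile)) {
            RDFParser parser = Rio.createParser(RDFFormat.RDFXML);
            parser.setPreserveBNodeIDs(true);
            parser.setRDFHandler(new StatementCollector(model));
            parser.parse(input, initialFile.getAbsolutePath());
        }

        // Write the collected statements back out.
        // Note: the writer format is Turtle although the file is named ".nt".
        try (FileOutputStream out = new FileOutputStream("swetodblp_april_2008.nt")) {
            RDFWriter writer = Rio.createWriter(RDFFormat.TURTLE, out);
            writer.startRDF();
            for (Statement st : model) {
                writer.handleStatement(st);
            }
            writer.endRDF();
        }
    }
}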

The parser has encountered more than "100,000" entity expansions in this document; this is the limit imposed by the application.

I run my code as follows, setting the two parameters as suggested on the RDF4J web site:

mvn -Djdk.xml.totalEntitySizeLimit=0 -DentityExpansionLimit=0 exec:java

Any help, please?

`-DentityExpansionLimit` is the legacy system property, the new one is `-Djdk.xml.entityExpansionLimit` - but it should still work. I'm not very familiar with the maven exec plugin - are you sure it passes these properties along to the java process? — Jeen Broekstra, Jan 25 '20 at 00:26
@JeenBroekstra With your suggested property I get the following exception: Unknown lifecycle phase "Djdk.xml.entityExpansionLimit=0". You must specify a valid lifecycle phase or a goal in the format <plugin-prefix>:<goal> or <plugin-group-id>:<plugin-artifact-id>[:<plugin-version>]:<goal>. — bib, Jan 25 '20 at 01:32
That looks to me like you made a typo, perhaps forgetting a `-` in front or something. — Jeen Broekstra, Jan 25 '20 at 04:23
I still have the same problem with `mvn -Djdk.xml.totalEntitySizeLimit=0 -Djdk.xml.entityExpansionLimit exec:java`: InvocationTargetException: The parser has encountered more than "100,000" entity expansions in this document; this is the limit imposed by the application. — bib, Feb 01 '20 at 00:25
This was also asked (and answered) at https://github.com/eclipse/rdf4j/issues/1875 — Jeen Broekstra, Feb 02 '20 at 00:10
@JeenBroekstra Sorry, but the answer at the provided link does not solve it. — bib, Feb 02 '20 at 00:20
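
The comments above ask whether the `-D` properties actually reach the Java process. A minimal sketch of one way to check this, assuming you can call it at the start of `main`: print the three property names mentioned in this thread as seen by the running JVM. The class name `PrintXmlLimits` and the helper `readLimits` are hypothetical, chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PrintXmlLimits {

    // Property names taken from the question and the comments above.
    static final String[] NAMES = {
        "jdk.xml.totalEntitySizeLimit",
        "jdk.xml.entityExpansionLimit",
        "entityExpansionLimit"
    };

    // Reads each property from the running JVM; a null value means it was not set.
    static Map<String, String> readLimits() {
        Map<String, String> values = new LinkedHashMap<>();
        for (String name : NAMES) {
            values.put(name, System.getProperty(name));
        }
        return values;
    }

    public static void main(String[] args) {
        readLimits().forEach((name, value) ->
            System.out.println(name + " = " + (value == null ? "(not set)" : value)));
    }
}
```

If a property shows as not set, the value never reached the JVM. Note that this only reports what the JVM sees; whether the XML parser in use honours these properties is a separate question.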

score 0 · Accepted Answer · answered Feb 02 '20 at 05:09

The error comes from the Apache Xerces XML parser, which was being used instead of the default JDK XML parser. Deleting the Xerces folder from my .m2 repository fixed it, and the code now works fine.
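
To see whether Apache Xerces is on the classpath at all, a small diagnostic along the following lines may help. It is only a sketch: it reports the SAX parser factory that JAXP selects and tries to load the Apache Xerces class `org.apache.xerces.parsers.SAXParser`, printing where it was loaded from. The class name `WhichXmlParser` and its helper methods are hypothetical, and the check does not prove which parser RDF4J itself ends up using.

```java
import java.security.CodeSource;
import javax.xml.parsers.SAXParserFactory;

public class WhichXmlParser {

    // Class name of the SAX parser factory that JAXP selects at runtime.
    static String jaxpFactoryClass() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    // Location of Apache Xerces on the classpath, or null if it cannot be loaded.
    static String apacheXercesLocation() {
        try {
            Class<?> c = Class.forName("org.apache.xerces.parsers.SAXParser");
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "(unknown location)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("JAXP SAX factory: " + jaxpFactoryClass());
        String location = apacheXercesLocation();
        System.out.println("Apache Xerces:    " + (location == null ? "not on classpath" : location));
    }
}
```

If the second line prints a jar path, some dependency is pulling Xerces in; `mvn dependency:tree` shows which one, and excluding it in the POM may be a less drastic option than deleting the folder from the local repository.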

bib
