I need to use the Schema option of the Perl module XML::LibXML::Reader to validate a large (>1GB) XML file as it is parsed.
Previously I used the xmllint command to validate XML files against a given schema (XSD) file. However, I now have some large XML files to validate, and xmllint runs out of memory (8GB) while performing the validation.
I have read on the XML::LibXML::Reader documentation page that there is a Schema option. However, when I use it (see code below), the script exits as soon as the first invalid element in the XML file is found.
use strict;
use warnings;
use XML::LibXML::Reader;
my $SchemaFile    = 'schema.xsd';
my $FileToAnalyse = '/tmp/file.xml';
my $reader = XML::LibXML::Reader->new(location => $FileToAnalyse, Schema => $SchemaFile)
    or die "cannot read file '$FileToAnalyse': $!\n";
while($reader->read) {
    # Process the file node by node here, even if it is not valid against the schema
    # (keeps memory usage low for large files)
}
I need to collect the invalid entries and continue parsing rather than exiting at the first one. Is this possible?