I'm new to Perl's XML::SAX and have run into a problem with the characters event. I'm trying to parse a very large XML file using Perl.

My goal is to get the content of each tag. I do not know the tag names in advance: given any XML file, I should be able to work out the record pattern and return every record with its tag and data, in the form Tag:Data.

With small files, everything works. But on a large file, the characters event delivers only part of the content. There is no obvious pattern to where the text is cut. Sometimes I get the first few characters of the data, sometimes the last few, and sometimes just a single letter.

The SAX parser is set up like this:

$myhandler = MyFilter->new();
$parser = XML::SAX::ParserFactory->parser(Handler => $myhandler);
$parser->parse_file($filename);

I have written my own handler called MyFilter, which overrides the characters method:

sub characters {
    my ($self, $element) = @_;
    $globalvar = $element->{Data};
    print "content is: $globalvar \n";
}

Even this print statement shows only partial values at times. I also tried setting the parser package before parsing:

$XML::SAX::ParserPackage = "XML::SAX::ExpatXS";

It still doesn't work. Could anyone help me out here? Thanks in advance!

GPN

1 Answer

Sounds like you need XML::Filter::BufferText.

http://search.cpan.org/dist/XML-Filter-BufferText/BufferText.pm

From the description: "One common cause of grief (and programmer error) is that XML parsers aren't required to provide character events in one chunk. They can, but are not forced to, and most don't. This filter does the trivial but oft-repeated task of putting all characters into a single event."
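A minimal sketch of wiring the filter between the parser and a handler, assuming XML::SAX and XML::Filter::BufferText are installed. The RecordPrinter package, its stack and records fields, and the inline sample document are made up for illustration; in the question, the existing MyFilter handler and the real file would take their place.

```perl
use strict;
use warnings;
use XML::SAX::ParserFactory;
use XML::Filter::BufferText;

{
    # Hypothetical handler: tracks open elements and records "Tag: Data".
    package RecordPrinter;
    use base 'XML::SAX::Base';

    sub start_element {
        my ($self, $el) = @_;
        push @{ $self->{stack} }, $el->{Name};
    }

    sub characters {
        my ($self, $chars) = @_;
        # With the filter upstream, Data should hold a whole text run.
        return unless $chars->{Data} =~ /\S/;    # skip whitespace-only text
        push @{ $self->{records} }, "$self->{stack}[-1]: $chars->{Data}";
    }

    sub end_element {
        my ($self) = @_;
        pop @{ $self->{stack} };
    }
}

# Chain: parser -> BufferText filter -> handler.
my $handler = RecordPrinter->new;
my $filter  = XML::Filter::BufferText->new(Handler => $handler);
my $parser  = XML::SAX::ParserFactory->parser(Handler => $filter);

# Inline sample document (an assumption; the question parses a large file).
$parser->parse_string('<rec><name>Alice</name><city>Paris</city></rec>');

print "$_\n" for @{ $handler->{records} || [] };
```

The only change to the setup in the question is that the parser's Handler becomes the filter, and the filter's Handler becomes the original handler.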

It's very easy to use once you have it installed and will solve your partial character data problem.
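If installing another module is not an option, roughly the same effect can be had by buffering by hand: append each chunk in characters and only use the text in end_element. This is a sketch under assumptions, not what the filter does internally. BufferingHandler and its text and records fields are hypothetical names, and resetting the buffer in start_element assumes record-style data where text sits in leaf elements (mixed content would lose text).

```perl
use strict;
use warnings;
use XML::SAX::ParserFactory;

{
    # Hypothetical handler that joins character chunks itself.
    package BufferingHandler;
    use base 'XML::SAX::Base';

    sub start_element {
        my ($self, $el) = @_;
        $self->{text} = '';    # start a fresh buffer for each element
    }

    sub characters {
        my ($self, $chars) = @_;
        # May be called several times for one run of text, so append.
        $self->{text} .= $chars->{Data};
    }

    sub end_element {
        my ($self, $el) = @_;
        # The buffer is complete only once the element closes.
        push @{ $self->{records} }, "$el->{Name}: $self->{text}"
            if $self->{text} =~ /\S/;
        $self->{text} = '';
    }
}

my $handler = BufferingHandler->new;
my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);

# Inline sample document (an assumption; substitute the real input).
$parser->parse_string('<rec><name>Alice</name><city>Paris</city></rec>');

print "$_\n" for @{ $handler->{records} || [] };
```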

rdw