
This question is analogous to "HTML::PullParser splits up text element randomly". I'm running XML::Parser, and when it hands text back to my handler, it breaks a single run of text into multiple pieces (to speed things up, I suppose). What can I do to prevent this behavior? I can't find anything about it in the documentation for that module or for XML::Parser::Expat.

Zhang18

2 Answers


I don't know this parser in particular, but it's a common feature of streaming parsers that the spec allows them to split text nodes wherever they like. In many cases they take advantage of this to split the text at entity boundaries (avoiding a string copy operation), but they can also do it at I/O buffer boundaries, for example. You either have to live with it, assembling the text yourself at application level, or use a higher-level interface for XML processing, such as XSLT or XQuery.

Michael Kay
  • Yeah, technically I can do that at application level. But for some (long, hard to explain) reason that would not be ideal. Still looking for a switch. There should be one, right? (Especially given that HTML::Parser has one.) – Zhang18 Aug 26 '11 at 17:39
When you get text,

- Append the text to a buffer.

When you get something other than text,

- If the buffer contains text,
  - Process the text in the buffer.
  - Empty the buffer.

- Process what you just got.
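
The steps above can be sketched roughly as follows. This is an illustration in Python rather than Perl, using the standard library's `xml.parsers.expat` binding to the Expat library; the function name `parse_with_coalesced_text` and the `(kind, value)` event tuples are made up for the example. It assumes only start and end tags count as "something other than text"; a real application would flush the buffer in its other handlers (comments, processing instructions) as well. With Perl's `XML::Parser`, the same logic would presumably live in the `Char`, `Start`, and `End` handlers.

```python
import xml.parsers.expat


def parse_with_coalesced_text(xml_text):
    """Return a list of (kind, value) events with adjacent text chunks merged."""
    events = []   # what the application finally sees
    buffer = []   # text chunks received since the last non-text event

    def flush():
        # If the buffer contains text, process it as one string, then empty it.
        if buffer:
            events.append(("text", "".join(buffer)))
            buffer.clear()

    def on_text(data):
        # Text event: only append; the parser may still deliver more chunks.
        buffer.append(data)

    def on_start(name, attrs):
        # Non-text event: handle pending text first, then the event itself.
        flush()
        events.append(("start", name))

    def on_end(name):
        flush()
        events.append(("end", name))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text
    parser.Parse(xml_text, True)  # True marks this as the final chunk of input
    flush()  # harmless if nothing is pending
    return events


events = parse_with_coalesced_text("<p>fish &amp; chips</p>")
print(events)
```

However the parser chooses to chunk `fish &amp; chips`, the application sees it as the single string `fish & chips` between the start and end events for `p`.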
ikegami
  • See my comment on the other answer: this is easy to do in principle, but my specific app has a quirky feature that prevents me from doing it (I can't elaborate, for proprietary reasons). The real purpose of this question is to find out whether `XML::Parser` has a **switch** for this, just as `HTML::PullParser` has one. If someone knows definitively that it doesn't have that feature, please let me know. – Zhang18 Aug 29 '11 at 14:30