I'm parsing some XML using Python's Expat (by calling parser = xml.parsers.expat.ParserCreate() and then setting the relevant callbacks to my methods).

It seems that when Expat calls read(nbytes) to fetch new data, nbytes is always 2,048. I have quite a lot of XML to process, and I suspect that these small read()s are slowing the overall process down. As a point of reference, I'm seeing throughput of around 9 MB/s on a 2.67 GHz Intel Xeon X5550 running Windows 7.

I've tried setting parser.buffer_text = True and parser.buffer_size = 65536, but Expat is still calling the read() method with an argument of just 2,048.
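For anyone who wants to reproduce the observation, here is a minimal sketch assuming Python 3 and a file-like object that returns bytes. RecordingReader and observed_read_sizes are hypothetical helper names made up for this illustration; only ParserCreate, buffer_text, buffer_size and ParseFile come from xml.parsers.expat. It records every size that ParseFile passes to read(), with buffer_text and buffer_size set as described above.

```python
import io
import xml.parsers.expat


class RecordingReader:
    """Hypothetical wrapper: forwards read() and records each requested size."""

    def __init__(self, raw):
        self._raw = raw
        self.sizes = []

    def read(self, nbytes):
        self.sizes.append(nbytes)
        return self._raw.read(nbytes)


def observed_read_sizes(xml_bytes, buffer_size=65536):
    """Parse xml_bytes with ParseFile and return the set of read() sizes seen."""
    parser = xml.parsers.expat.ParserCreate()
    # These two attributes control buffering of character data handed to
    # CharacterDataHandler; they are set here only to mirror the question.
    parser.buffer_text = True
    parser.buffer_size = buffer_size
    reader = RecordingReader(io.BytesIO(xml_bytes))
    parser.ParseFile(reader)
    return set(reader.sizes)


sample = b"<root>" + b"<item>x</item>" * 1000 + b"</root>"
print(observed_read_sizes(sample))
```

The printed set shows which read sizes the parser actually requested on your build, independent of the buffer_size setting.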

Is it possible to increase this?

skaffman
unwind

1 Answer

You're talking about the xmlparser.ParseFile method, right?

Unfortunately, no, that value is hardcoded as BUF_SIZE = 2048 in pyexpat.c.
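One possible way around the fixed read size, sketched here under the assumption of Python 3 and a binary file object, is to skip ParseFile, read larger chunks yourself, and feed them to the parser's Parse(data, isfinal) method. parse_in_chunks and count_elements are hypothetical helper names, and the 1 MiB default chunk size is an arbitrary choice. Whether this actually improves throughput has not been measured here and would need to be benchmarked on your data.

```python
import io
import xml.parsers.expat


def parse_in_chunks(parser, fileobj, chunk_size=1 << 20):
    """Feed fileobj to an Expat parser in chunks of chunk_size bytes.

    Hypothetical helper: it replaces parser.ParseFile(fileobj) so that the
    caller, not pyexpat, decides how much to read at a time.
    """
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        parser.Parse(chunk, False)  # more data may follow
    parser.Parse(b"", True)  # signal end of document


def count_elements(xml_bytes, chunk_size=1 << 16):
    """Usage example: count start tags while parsing in chunks."""
    count = 0

    def on_start(name, attrs):
        nonlocal count
        count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parse_in_chunks(parser, io.BytesIO(xml_bytes), chunk_size)
    return count


document = b"<root>" + b"<item/>" * 1000 + b"</root>"
print(count_elements(document))
```

Chunk boundaries may fall in the middle of a tag. Expat keeps the incomplete token and finishes it when the next chunk arrives, so the callbacks fire the same way as with ParseFile.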

Cito