I'm using RapidJSON to parse large GeoJSON files. Most of the content in these files (and hence memory after parsing) is giant coordinate arrays. For my application, I'm not interested in these. I'd prefer to skip them (not allocate memory for them) while parsing. Based on some testing using the SAX API, I expect this to roughly double the speed of parsing.

My initial thought was to write a custom Handler. However, I'd then have to build up my own Value object using my own stack, which would duplicate work that the GenericDocument class already does.

My next thought was to subclass GenericDocument. Its ParseStream and Handler methods aren't virtual, however, so I can't get it to use my own. I could implement my own ParseStream, but the stack_ field is private, so even a subclass can't access it.

Is a custom Handler with its own Stack the right way to go here? Has anyone done something like this before?

danvk
  • FWIW, here's a [version](https://gist.github.com/danvk/90beec9f0387788eca9c1ba8402dbee1) that subclasses `GenericValue` with its own stack and Handler implementation. It works and is fast but leaves a lot to be desired in terms of code duplication. I'd love to get some feedback on whether there's a better approach. – danvk Apr 21 '16 at 17:08

1 Answer

While implementing the JSON Schema validator feature in RapidJSON, I added a new API, Document::Populate(generator), which uses a generator to fill the contents of the document. It should be suitable for this.

It is currently shown here:

// Parse JSON from reader, validate the SAX events, and store in d.
Document d;
SchemaValidatingReader<kParseDefaultFlags, FileReadStream, UTF8<> > reader(is, schema);
d.Populate(reader);

d.Populate(generator) calls generator(d); the generator then produces SAX events and sends them to d.

Therefore, it should be possible to write a custom SAX handler that filters out some SAX events and forwards the rest to the document.
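A rough sketch of such a filtering handler follows. It is an illustration under stated assumptions, not code from RapidJSON: `SkipKeyFilter`, `EventLog` and `DemoFilteredEvents` are hypothetical names, and `SizeType` is a local stand-in for `rapidjson::SizeType` so that the snippet compiles without the library. The filter uses the method names of RapidJSON's Handler concept, so in a real program the downstream handler `Out` would be the Document that Populate passes to the generator, and the events would come from a Reader rather than being replayed by hand.

```cpp
#include <cassert>
#include <cstdint>
#include <stack>
#include <string>

typedef unsigned SizeType;  // stand-in for rapidjson::SizeType (assumption)

// Forwards SAX events to `out`, dropping every object member named `key`
// together with its entire value (scalar, array, or object).
template <typename Out>
class SkipKeyFilter {
public:
    SkipKeyFilter(Out& out, const std::string& key) : out_(out), key_(key), depth_(0) {}

    bool Null()                  { return depth_ ? EndValue() : out_.Null(); }
    bool Bool(bool b)            { return depth_ ? EndValue() : out_.Bool(b); }
    bool Int(int i)              { return depth_ ? EndValue() : out_.Int(i); }
    bool Uint(unsigned u)        { return depth_ ? EndValue() : out_.Uint(u); }
    bool Int64(std::int64_t i)   { return depth_ ? EndValue() : out_.Int64(i); }
    bool Uint64(std::uint64_t u) { return depth_ ? EndValue() : out_.Uint64(u); }
    bool Double(double d)        { return depth_ ? EndValue() : out_.Double(d); }
    bool RawNumber(const char* s, SizeType n, bool copy) {
        return depth_ ? EndValue() : out_.RawNumber(s, n, copy);
    }
    bool String(const char* s, SizeType n, bool copy) {
        return depth_ ? EndValue() : out_.String(s, n, copy);
    }
    bool StartObject() {
        if (depth_) { ++depth_; return true; }
        kept_.push(0);  // count the members actually forwarded for this object
        return out_.StartObject();
    }
    bool Key(const char* s, SizeType n, bool copy) {
        if (depth_) return true;
        if (key_.compare(0, key_.size(), s, n) == 0) { depth_ = 1; return true; }
        ++kept_.top();
        return out_.Key(s, n, copy);
    }
    bool EndObject(SizeType) {
        if (depth_) { --depth_; return EndValue(); }
        assert(!kept_.empty());
        SizeType n = kept_.top();
        kept_.pop();
        return out_.EndObject(n);  // report the filtered member count, not the parser's
    }
    bool StartArray() {
        if (depth_) { ++depth_; return true; }
        return out_.StartArray();
    }
    bool EndArray(SizeType n) {
        if (depth_) { --depth_; return EndValue(); }
        return out_.EndArray(n);
    }

private:
    // depth_ == 1 means "the skipped value has just ended": resume forwarding.
    bool EndValue() { if (depth_ == 1) depth_ = 0; return true; }

    Out& out_;
    std::string key_;
    unsigned depth_;             // 0 = forwarding, >0 = inside a skipped value
    std::stack<SizeType> kept_;  // forwarded member count per open object
};

// Hypothetical downstream handler that records the events it receives.
// It defines only the methods the demo below triggers.
struct EventLog {
    std::string text;
    bool Double(double)                          { text += "Double "; return true; }
    bool String(const char* s, SizeType n, bool) { text += "String(" + std::string(s, n) + ") "; return true; }
    bool StartObject()                           { text += "{ "; return true; }
    bool Key(const char* s, SizeType n, bool)    { text += "Key(" + std::string(s, n) + ") "; return true; }
    bool EndObject(SizeType n)                   { text += "}/" + std::to_string(n) + " "; return true; }
    bool StartArray()                            { text += "[ "; return true; }
    bool EndArray(SizeType n)                    { text += "]/" + std::to_string(n) + " "; return true; }
};

// Replays the events a parser would emit for
// {"type":"Point","coordinates":[1.5,2.5]} through the filter.
inline std::string DemoFilteredEvents() {
    EventLog log;
    SkipKeyFilter<EventLog> filter(log, "coordinates");
    filter.StartObject();
    filter.Key("type", 4, true);
    filter.String("Point", 5, true);
    filter.Key("coordinates", 11, true);
    filter.StartArray();
    filter.Double(1.5);
    filter.Double(2.5);
    filter.EndArray(2);
    filter.EndObject(2);
    return log.text;
}
```

The detail that is easy to miss is the member count: the parser reports two members for the object, but only one was forwarded, so the filter passes its own count to `EndObject`. A DOM builder that relies on that count would otherwise see an inconsistent event stream.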

An example would probably make this clearer. Feel free to open an issue.


Update: Two examples have been added.

  • filterkey: A command line tool to remove all values with a user-specified key.
  • filterkeydom: Same tool as above, but it demonstrates how to use a generator to populate a Document.
Milo Yip
  • Thanks for the suggestion. I filed [an issue](https://github.com/miloyip/rapidjson/issues/613). – danvk Apr 22 '16 at 15:24