I know that the JSON streaming parser https://github.com/salsify/jsonstreamingparser has already been mentioned. But as I have recently(ish) added a new listener to it to try and make it easier to use out of the box I thought I would (for a change) put some information out about what it does...
There is a very good write-up about the basic parser at https://www.salsify.com/blog/engineering/json-streaming-parser-for-php, but the issue I had with the standard setup was that you always had to write a listener to process a file. This is not always a simple task and can also take a certain amount of maintenance if/when the JSON changes. So I wrote the RegexListener.
The basic principle is to allow you to say which elements you are interested in (via a regex) and give it a callback to say what to do when it finds the data. Whilst reading the JSON, it keeps track of the path to each component, similar to a directory structure. So /name/forename, or for arrays /items/item/2/partid; this is what the regex matches against.
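To make the path idea concrete, here is a minimal sketch that builds a path string for every value in a small, already-decoded JSON document and then filters those paths with a regex. It is only an illustration: the function name listPaths is hypothetical, the walk is done in memory rather than while streaming, and the '#' delimiter with ^...$ anchoring is an assumption of this sketch, not necessarily what RegexListener does internally.

```php
<?php
/**
 * Collect a path string for every value in decoded JSON data.
 * Illustration only: this walks an in-memory array, whereas the
 * streaming parser builds its paths while it reads the file.
 */
function listPaths(array $data, string $prefix = ''): array
{
    $paths = [];
    foreach ($data as $key => $value) {
        $path = $prefix.'/'.$key;
        $paths[] = $path;
        if (is_array($value)) {
            // Recurse into nested objects and arrays.
            $paths = array_merge($paths, listPaths($value, $path));
        }
    }
    return $paths;
}

$json = '[{"name": "first"}, {"name": "second", "nested array": [1, 2]}]';
$paths = listPaths(json_decode($json, true));

// Assumption of this sketch: '#' as delimiter and a whole-path match.
$matches = array_values(preg_grep('#^/\d+/name$#', $paths));
print_r($matches);
```

Array indexes simply become numeric path segments, which is why a pattern such as /1/name can single out one particular record.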
An example is (from the source on github)...
$filename = __DIR__.'/../tests/data/example.json';
$listener = new RegexListener([
    '/1/name' => function ($data): void {
        echo PHP_EOL."Extract the second 'name' element...".PHP_EOL;
        echo '/1/name='.print_r($data, true).PHP_EOL;
    },
    '(/\d*)' => function ($data, $path): void {
        echo PHP_EOL."Extract each base element and print 'name'...".PHP_EOL;
        echo $path.'='.$data['name'].PHP_EOL;
    },
    '(/.*/nested array)' => function ($data, $path): void {
        echo PHP_EOL."Extract 'nested array' element...".PHP_EOL;
        echo $path.'='.print_r($data, true).PHP_EOL;
    },
]);
$parser = new Parser(fopen($filename, 'r'), $listener);
$parser->parse();
Just a couple of explanations...
'/1/name' => function ($data)
So the /1 is the second element in an array (0-based), which allows access to particular instances of elements. /name is the name element. The value is then passed to the closure as $data.
"(/\d*)" => function ($data, $path )
This will select each element of an array and pass them one at a time. As it's using a capture group, the matched path will be passed as $path. This means that when a set of records is present in a file, you can process each item one at a time and also know which element it is without having to keep track yourself.
The last one
'(/.*/nested array)' => function ($data, $path):
effectively scans for any elements called nested array and passes each one along with where it is in the document.
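As a rough sketch of how the patterns and callbacks described above could fit together, the following self-contained snippet dispatches a path and its data to every matching callback, passing the first capture group as the second argument. The dispatch function is hypothetical and is not the library's code; in particular, matching the pattern against the whole path is an assumption.

```php
<?php
/**
 * Hypothetical dispatcher: call every callback whose pattern matches $path.
 * Returns the number of callbacks that were called.
 */
function dispatch(array $handlers, string $path, $data): int
{
    $called = 0;
    foreach ($handlers as $pattern => $callback) {
        // Assumption: the pattern has to match the whole path.
        if (preg_match('#^'.$pattern.'$#', $path, $m) === 1) {
            // $m[1] holds the first capture group, if the pattern has one.
            $callback($data, $m[1] ?? null);
            $called++;
        }
    }
    return $called;
}

$seen = [];
$handlers = [
    '/1/name' => function ($data) use (&$seen): void {
        $seen[] = 'name='.$data;
    },
    '(/\d*)' => function ($data, $path) use (&$seen): void {
        $seen[] = $path.'='.$data['name'];
    },
];

dispatch($handlers, '/1/name', 'second');        // only '/1/name' matches
dispatch($handlers, '/1', ['name' => 'second']); // only '(/\d*)' matches
dispatch($handlers, '/2/other', 'ignored');      // nothing matches
print_r($seen);
```

Without a capture group the callback only needs $data; with one, the captured part of the path tells the callback which element it has been given.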
Another useful feature I found is that if, in a large JSON file, you just want the summary details at the top, you can grab those bits and then just stop...
$filename = __DIR__.'/../tests/data/ratherBig.json';
$listener = new RegexListener();
$parser = new Parser(fopen($filename, 'rb'), $listener);
$listener->setMatch(["/total_rows" => function ($data) use ($parser) {
    echo "/total_rows=".$data.PHP_EOL;
    // Stop reading as soon as the value we want has been seen.
    $parser->stop();
}]);
// Start parsing; the callback above ends it early.
$parser->parse();
This saves time when you are not interested in the remaining content.
One thing to note is that the callbacks react to the content: each one is triggered when the end of its matching content is found, so they may fire in various orders. Also, the parser only keeps track of the content you are interested in and discards anything else.
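The ordering point can be illustrated with a small sketch that records each path at the moment its value is complete, that is, after all of its children have been seen. This is an in-memory stand-in for what happens during streaming, and completionOrder is a hypothetical helper, not part of the library.

```php
<?php
/**
 * Record each path once its value is complete, i.e. after all of
 * its children have been visited (post-order).
 */
function completionOrder($value, string $path, array &$order): void
{
    if (is_array($value)) {
        foreach ($value as $key => $child) {
            completionOrder($child, $path.'/'.$key, $order);
        }
    }
    if ($path !== '') {
        // The root itself has no path, so it is not recorded.
        $order[] = $path;
    }
}

$order = [];
$data = json_decode('[{"name": "first", "tags": ["a", "b"]}]', true);
completionOrder($data, '', $order);
print_r($order);
```

Here /0/name is recorded before /0, so a callback for an inner element would run before a callback for the record that contains it.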
If you find any interesting features (sometimes horribly known as bugs), please let me know or report an issue on the GitHub page.