
I have a very large array in PHP (5.6), generated dynamically, which I want to convert to JSON. The problem is that the array is so large that it doesn't fit in memory: I get a fatal error (memory exhausted) when I try to process it. So I figured that using generators would make the memory problem disappear.

This is the code I've tried so far (this reduced example obviously doesn't produce the memory error):

<?php 
function arrayGenerator()// new way using generators
{
    for ($i = 0; $i < 100; $i++) {
        yield $i;
    }
}

function getArray()// old way, generating and returning the full array
{
    $array = [];
    for ($i = 0; $i < 100; $i++) {
        $array[] = $i;
    }
    return $array;
}

$object = [
    'id' => 'foo',
    'type' => 'blah',
    'data' => getArray(),
    'gen'  => arrayGenerator(),
];

echo json_encode($object);

But PHP doesn't seem to JSON-encode the values from the generator. This is the output I get from the previous script:

{
    "id": "foo",
    "type": "blah",
    "data": [// old way - OK
        0,
        1,
        2,
        3,
        //...
    ],
    "gen": {}// using generator - empty object!
}

Is it possible to JSON-encode an array produced by a generator without generating the full sequence before calling json_encode?

Iván Pérez
  • The only way to encode the entire sequence is to generate the entire sequence. In the background that will need to happen. If you want to make the generator a usable array, you could use `iterator_to_array(arrayGenerator())` – apokryfos Apr 06 '16 at 09:13
  • Using that function I get the same problem again - memory got exhausted. The only thing I could do at the moment is to split the array or increase the memory limit (not the solution I was looking for...). – Iván Pérez Apr 06 '16 at 09:22
  • I'm afraid your problem cannot be solved any other way unless you create your own streaming JSON encoder, which is probably going to offer less benefit than the time it will take to make it work. – apokryfos Apr 06 '16 at 09:26
  • The only way to really generate JSON data which doesn't fit into memory is to *stream* it. For this you'll a) need a streaming JSON generator (which PHP doesn't have built in) and b) stream the result somewhere immediately, e.g. to stdout, or to a file, or to a web server from where it is downloaded. Concatenating the result into a string in memory and storing it in a variable will have the same memory issue. – deceze Apr 06 '16 at 09:32
  • @deceze, apokryfos: Thanks for your suggestions. I've found some libraries to create a JSON stream (https://github.com/rayward/json-stream), it looks really promising. I'll try it. – Iván Pérez Apr 06 '16 at 09:52
  • Actually, this maybe does what you want: [Streaming parser for JSON collections](https://github.com/MAXakaWIZARD/JsonCollectionParser). – Ryan Vincent Apr 06 '16 at 11:06
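A minimal sketch reproducing the behaviour from the question: json_encode() serialises a Generator like any other object that does not implement JsonSerializable, using its public properties, and a Generator has none, hence the `{}`. Wrapping it in iterator_to_array(), as suggested in the first comment, produces the expected list, but only because the whole sequence is materialised first, so it helps only when the data fits in memory. `smallGenerator` is a hypothetical stand-in for the question's `arrayGenerator`.

```php
<?php
// Hypothetical stand-in for arrayGenerator() from the question, kept small.
function smallGenerator()
{
    for ($i = 0; $i < 3; $i++) {
        yield $i;
    }
}

// A Generator is encoded like an object without public properties.
echo json_encode(smallGenerator()), "\n"; // prints {}

// iterator_to_array() builds the full array first; json_encode() then sees a plain list.
echo json_encode(iterator_to_array(smallGenerator())), "\n"; // prints [0,1,2]
```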

2 Answers


Unfortunately, json_encode cannot generate a result from a generator function. Using iterator_to_array will still try to create the whole array, which will still cause memory issues.

You will need to create your own function that generates the JSON string from the generator. Here's an example of how that could look:

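// Encodes each value produced by $generator and joins them into one JSON array string.
// A Generator object is Traversable, not callable, so the parameter is type-hinted accordingly.
// Note: the complete JSON text is still built up in memory as a single string.
function json_encode_generator(Traversable $generator) {
    $result = '[';

    foreach ($generator as $value) {
        $result .= json_encode($value) . ',';
    }

    return rtrim($result, ',') . ']';
}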

Instead of encoding the whole array at once, it encodes only one value at a time and concatenates the results into one string.

The above example only takes care of encoding an array, but it can easily be extended to recursively encode whole objects.
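One possible shape for that extension, as a rough sketch rather than the answerer's own code: a hypothetical `json_chunks()` generator walks the value recursively and yields small JSON fragments, so nested generators (like the `'gen'` entry in the question) are consumed lazily and the caller decides whether to concatenate the fragments or write them out as they arrive. Assumptions: every Traversable, including generators, is emitted as a JSON list with its keys discarded; a PHP array counts as a list only if its keys are exactly 0..n-1, otherwise it becomes a JSON object; anything else is passed to json_encode() as a whole. It avoids `yield from` so it also runs on PHP 5.6.

```php
<?php
// Hypothetical helper: yields the JSON encoding of $value in small string chunks.
// Assumption: any Traversable (e.g. a generator) is emitted as a JSON list; its keys are dropped.
function json_chunks($value)
{
    if ($value instanceof Traversable) {
        yield '[';
        $first = true;
        foreach ($value as $item) {
            if (!$first) {
                yield ',';
            }
            $first = false;
            // Re-yield the nested chunks (no "yield from", to stay PHP 5.6 compatible).
            foreach (json_chunks($item) as $chunk) {
                yield $chunk;
            }
        }
        yield ']';
    } elseif (is_array($value)) {
        // Keys 0..n-1 in order become a JSON list, anything else a JSON object.
        $isList = $value === [] || array_keys($value) === range(0, count($value) - 1);
        yield ($isList ? '[' : '{');
        $first = true;
        foreach ($value as $key => $item) {
            if (!$first) {
                yield ',';
            }
            $first = false;
            if (!$isList) {
                yield (json_encode((string) $key) . ':');
            }
            foreach (json_chunks($item) as $chunk) {
                yield $chunk;
            }
        }
        yield ($isList ? ']' : '}');
    } else {
        // Scalars, null and plain objects are encoded in one go.
        yield json_encode($value);
    }
}

// Usage with the structure from the question (numberGenerator is a hypothetical stand-in).
function numberGenerator()
{
    for ($i = 0; $i < 3; $i++) {
        yield $i;
    }
}

$object = [
    'id'   => 'foo',
    'type' => 'blah',
    'data' => [0, 1, 2],
    'gen'  => numberGenerator(),
];

// Output each chunk as soon as it is produced instead of concatenating them.
foreach (json_chunks($object) as $chunk) {
    echo $chunk;
}
// prints {"id":"foo","type":"blah","data":[0,1,2],"gen":[0,1,2]}
```

Because the fragments are produced on demand, echoing them (or passing them to fwrite()) keeps only one fragment and the current recursion path in memory; concatenating them into a string would bring back the memory concern raised in the comments.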

If the created string is still too big to fit in memory, then your only remaining option is to write directly to an output stream. Here's how that could look:

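// Writes each value to $outputStream as soon as it is produced, so neither the array
// nor the JSON text has to be held in memory at once.
function json_encode_generator(Traversable $generator, $outputStream) {
    fwrite($outputStream, '[');
    $first = true;

    // Track the first element explicitly instead of relying on $key == 0,
    // since a generator may yield arbitrary keys.
    foreach ($generator as $value) {
        if (!$first) {
            fwrite($outputStream, ',');
        }
        $first = false;

        fwrite($outputStream, json_encode($value));
    }

    fwrite($outputStream, ']');
}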

As you can see, the main difference is that we now use fwrite to write to the passed-in stream instead of concatenating strings, and we also need to handle the separating commas differently, since a comma that has already been written to the stream can't be trimmed off afterwards.

Kuba Birecki
  • Of course this still generates an enormous amount of JSON *in memory*, which might be even larger than the original data... – deceze Apr 06 '16 at 09:30
  • Well, strings are more memory efficient than arrays in PHP, so it might be that the above solution is sufficient. Otherwise, you would have to use an output stream directly instead of temporarily storing it inside a string. Whether it's a string or a stream, the logic remains the same. – Kuba Birecki Apr 06 '16 at 09:34
  • Btw. composing the resulting string via `$result .= ...` will need huge amounts of memory _(instead of, say, appending it to an array list and then imploding it)_, because for each iteration it will create a completely new (and increasingly longer) string. – Smuuf Nov 16 '22 at 11:05

What is a generator function?

A generator function is effectively a more compact and efficient way to write an Iterator. It allows you to define a function that will calculate and return values while you are looping over it.

Also, as per the documentation at http://php.net/manual/en/language.generators.overview.php:

Generators provide an easy way to implement simple iterators without the overhead or complexity of implementing a class that implements the Iterator interface.

A generator allows you to write code that uses foreach to iterate over a set of data without needing to build an array in memory, which may cause you to exceed a memory limit, or require a considerable amount of processing time to generate. Instead, you can write a generator function, which is the same as a normal function, except that instead of returning once, a generator can yield as many times as it needs to in order to provide the values to be iterated over.

What is yield?

The yield keyword returns data from a generator function:

The heart of a generator function is the yield keyword. In its simplest form, a yield statement looks much like a return statement, except that instead of stopping execution of the function and returning, yield instead provides a value to the code looping over the generator and pauses execution of the generator function.
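A minimal sketch of the pausing behaviour described in the quote, using a hypothetical `countTo()` generator that logs each step: the function body only runs when foreach asks for the next value, so producing and consuming interleave instead of a full array being built first.

```php
<?php
// Hypothetical generator that reports when it produces each value.
function countTo($n)
{
    for ($i = 1; $i <= $n; $i++) {
        echo "producing $i\n";
        yield $i; // execution pauses here until the loop requests the next value
    }
}

foreach (countTo(3) as $value) {
    echo "consumed $value\n";
}
// prints:
// producing 1
// consumed 1
// producing 2
// consumed 2
// producing 3
// consumed 3
```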

So, to get the expected output in your case, you need to iterate over the output of arrayGenerator() with a foreach loop or an iterator function (such as iterator_to_array(), as suggested by @apokryfos) before encoding it to JSON.

Chetan Ameta
  • While searching for memory issues with arrays, I found http://php.net/manual/en/class.splfixedarray.php, http://www.php.net/manual/en/intro.judy.php and http://stackoverflow.com/questions/21333474/need-an-array-like-structure-in-php-with-minimal-memory-usage. Hope this helps. – Chetan Ameta Apr 06 '16 at 09:38