I use curl to get a big chunk of data from a web service. I find myself exhausting the memory limit when I use json_decode() on that data. I know I could increase the limit, but that is not a good solution since the data keeps increasing.

The real problem is that I only need a small portion of the JSON that I am fetching. To simplify things a bit, my data looks something like this:

{ "array": [
  {           // object 1
  "field1": "xxx",
  "field2": "yyy",
  .
  .
  "field30": "zzz"
  },
  .
  .
  .           // object 15,000
]
}

Right now there are about 15,000 objects in array[] and each has 30 fields. I expect the number of objects to grow to around 50,000 in the coming months.

Since I need all the objects but only fields 1 and 6, I am wondering if I can somehow change the above to something more like this:

{ "array": [
  {             // object 1
  "field1": "xxx",
  "field6": "aaa"
  },
  .
  .
  .            // object 15,000
]
}

I imagine that would reduce the memory usage substantially. Any ideas?

Andri
  • Can't you limit the result set from the web service? – JimL Mar 02 '17 at 20:16
  • No, I am afraid that is not possible. – Andri Mar 02 '17 at 20:21
  • Can you filter the number of rows of data? You could try to get it in small chunks. – F.Igor Mar 02 '17 at 20:40
  • Take a look at following library (never used it myself): https://github.com/salsify/jsonstreamingparser – Xymanek Mar 02 '17 at 20:42
  • Check out this: https://github.com/salsify/jsonstreamingparser – roman Mar 02 '17 at 20:43
  • Thanks guys! Looks like this might be what I'm looking for. Will have to try it later today. – Andri Mar 02 '17 at 20:56
  • I can't get the streaming parser to work. Keep getting errors, even if I use the examples on github. – Andri Mar 02 '17 at 21:53
  • @Igor I used your idea. Noticed that I could add a startdate (in milliseconds) and a count to the api. One of the fields for each object was a creation date in milliseconds, so I just set the count to 1000 and took the creation date from object number 1000 to set as my next startdate. Thanks! – Andri Mar 03 '17 at 18:58

1 Answer

This solved my problem, although it might not be the solution in every case. There are some ideas in the comments that might work for someone with more PHP knowledge than I have.

I used @Igor's idea (in the comments). I noticed that I could add a startdate (in milliseconds) and a count to the API call. One of the fields for each object was a creation date in milliseconds, so I just set the count to 1000 and took the creation date from object number 1000 to set as my next startdate.

Something like this:

$lastTime = 0; // Use this as the startdate for my first API call
$batchSize = 1000; // Number of objects requested per API call
do {
  $api = 'domain.com/api?startdate=' . $lastTime . '&count=' . $batchSize;
  // curl code here, returns $result
  $received = 0; // Count the objects in this batch to know when to stop the do-while loop
  foreach ($result as $obj) {
    $received++;
    // do whatever
    $lastTime = $obj->dateCreated; // After the loop this holds the creation date of the last object in the batch, which I will use for my next API call.
  }
} while ($received >= $batchSize); // A batch smaller than the requested count means there is nothing left to fetch
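For anyone who wants something closer to runnable, here is a minimal sketch of the same batching idea that also keeps only field1 and field6 from each object, so no more than one batch is ever held in full. The names fetchBatch() and collectFields() are hypothetical helpers of mine, and the top-level "array" key, the dateCreated field and the URL layout are assumptions taken from the sketches above. It also assumes the API returns objects created strictly after startdate; if the boundary is inclusive, the last object of each batch would show up again at the start of the next one and would need to be skipped.

```php
<?php
// Hypothetical helper: fetches one batch with curl and returns the decoded objects.
// The URL layout and the top-level "array" key are assumptions, not a documented API.
function fetchBatch(int $startDate, int $count): array
{
    $url = 'domain.com/api?startdate=' . $startDate . '&count=' . $count;
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('API request failed: ' . curl_error($ch));
    }
    $decoded = json_decode($body); // only one batch is decoded at a time
    return $decoded->array ?? [];
}

// Walks through the API in batches and keeps only the two fields that are needed.
// $fetch is any callable taking (startdate, count) and returning an array of objects.
function collectFields(callable $fetch, int $batchSize = 1000): array
{
    $kept = [];
    $lastTime = 0; // startdate for the first call
    do {
        $batch = $fetch($lastTime, $batchSize);
        foreach ($batch as $obj) {
            // Keep the small part of each object; the rest is discarded with the batch.
            $kept[] = ['field1' => $obj->field1, 'field6' => $obj->field6];
            $lastTime = $obj->dateCreated; // ends up as the creation date of the last object
        }
    } while (count($batch) >= $batchSize); // a short batch means nothing is left
    return $kept;
}
```

Calling collectFields('fetchBatch') would run it against the real service. Passing the fetcher in as a callable is just a design choice that makes it possible to try the loop with fake data before pointing it at the API.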
Andri