
I have a SilverStripe site that deals with very large datasets. I made an API that returns a very large dump, and I call that API from the front end with an AJAX GET request.

When the AJAX call hits the API, it takes about 10 minutes for the data to return (it's very long JSON data, and the customer has accepted that).

While they are waiting for the data to return, they open the same site in another tab to do other things, but the site is very slow until the previous AJAX request has finished.

Is there anything I can do to avoid everything becoming unresponsive while waiting for the big JSON data?

Here's the code and an explanation of what it does:

I created a method named geteverything on the web server, shown below. It accesses another server (the data server) to get the data via a streaming API that sits on the data server. There's a lot of data, and the data server is slow; my customer doesn't mind the request taking a long time, they mind how slow everything else becomes. Sessions are used to determine the particulars of the request.

protected function geteverything($http, $id) {
    if (($System = DataObject::get_by_id('ESM_System', $id))) {
        if (isset($_GET['AAA']) && isset($_GET['BBB']) && isset($_GET['CCC']) && isset($_GET['DDD'])) {
            // some condition checks and data formatting for AAA, BBB, CCC and DDD go here

            $request = "http://dataserver/streaming?method=xxx";
            set_time_limit(120);

            // fetch the entire upstream response into memory, then send it to the browser
            $jsonstring = file_get_contents($request);
            echo($jsonstring);
        }
    }
}

How can I fix this, or what else would you need to know in order to help?

Phuong Le
  • You'll need to provide some code before anyone can provide any help. SilverStripe's ORM can be pretty heavy/slow when accessing large datasets, depending on how you are doing it. Again, without any code, no one can help you. – Stephen Jul 23 '15 at 02:33
  • Thanks Stephen, I added some code as suggested. The API is very simple; it just relays from the other server before ending up at the client browser. It's just that the data itself is very big. Thanks – Phuong Le Jul 23 '15 at 03:37
  • How big is `$jsonstring` once `file_get_contents()` is finished? It might be worth reading that in smaller (say, 512 character) chunks, writing to the browser as you go, so you're never storing the _entire_ dump in memory. Also, what's the character set of the data? Do you have to contend with multi-byte chars? – Tim Post Jul 23 '15 at 03:41
  • Thanks for fixing my English Tim :) – Phuong Le Jul 23 '15 at 03:53
  • The jsonstring looks like this: [ { "224": "2.1302e+17", "Timestamp": "2014-08-26 00:00:10" }, { "224": "2.1302e+17", "Timestamp": "2014-08-26 01:00:10" }, { "224": "2.1302e+17", "Timestamp": "2014-08-26 02:00:10" } ] and it can go on for a whole year of logs taken every 10 seconds... – Phuong Le Jul 23 '15 at 03:53
  • Right, but how _big_ is it? E.g. `strlen($jsonstring)` - I'm curious how much RAM `$jsonstring` occupies once `file_get_contents()` finishes. It might be worth using `fsockopen()`, reading only 1k at a time, and writing it to the browser in a loop as you go (a sketch of this approach appears after these comments). Currently, you store _all_ of it in memory, then write it to the browser, which might be a source of your slowdown. – Tim Post Jul 23 '15 at 03:55
  • Thanks for the pointer Tim. That's the best idea I've got so far; I will check it out and comment tomorrow. Thanks very much everyone – Phuong Le Jul 23 '15 at 04:05
  • How often does the data change? Could you fetch it daily and cache it? Or, between say 8am and 5pm, have a process that fetches it every 15 minutes and keeps itself up to date? Maybe limit how much you are returning: get the latest 50 lines, and if they want more, get another 50 lines, etc. – Wizzard Jul 23 '15 at 04:37
  • > "I made an API that returns a very large dump" Is that fetching the data from somewhere else? – Wizzard Jul 23 '15 at 04:39
  • @Wizzard, see the code above. Yes, it's fetching a large stream from another service and returning it. One could surely link directly to that stream, unless the stream isn't reachable from outside or this program must act as a proxy. – wmk Jul 23 '15 at 06:16
  • @wmk - it wasn't clear, because it sounds like he's dumping some data objects, but then the code has a file_get_contents... – Wizzard Jul 23 '15 at 10:40
  • Hi Wizzard and wmk, a data object is collected every 10 seconds, and each one can contain 100 data points, depending on the system. It's been collecting 24/7 and has been running for over 5 years. Now I'm making a graphing engine that can graph their historical data over a specific period for analysis purposes, so I'm pulling data out of the data server at a very fine granularity. You can imagine the data can't be cached and can be very large, so dumping it into memory as I did wasn't a good solution. I haven't had time to try chunking the data yet... – Phuong Le Jul 24 '15 at 06:09
  • Yep, you are right wmk, this program acts as a proxy and the data server can't be reached from outside. Thanks – Phuong Le Jul 24 '15 at 06:11
  • The request string is actually very long; it's an API that wasn't developed by me. My code just gets some parameters from my server, formats them, and passes them into the request string so the API (residing on the data server) can return the huge data for me, and I dump it back to the client browser to put on a JavaScript graph. – Phuong Le Jul 24 '15 at 06:17
  • @PhuongLe How often does the data change? Can it be cached for any period of time? Could you use a cron to update it once a day, then use the cache the rest of the time? – Wizzard Jul 24 '15 at 11:33
  • @PhuongLe ^ see above – Wizzard Jul 25 '15 at 23:47
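
As a rough sketch of the chunked relay Tim Post describes in the comments above (my illustration, not code from the thread; it assumes `allow_url_fopen` is enabled, and the URL and 1 KB chunk size are placeholders):

<?php

    // Sketch only: open the upstream streaming API as a read stream and
    // relay it to the browser 1 KB at a time, so the full dump is never
    // held in memory on the web server.
    $upstream = fopen("http://dataserver/streaming?method=xxx", 'rb');

    if ($upstream === false) {
        http_response_code(502); // upstream unreachable
        exit;
    }

    header('Content-Type: application/json');

    while (!feof($upstream)) {
        echo fread($upstream, 1024); // read and relay one small chunk
        flush();                     // push it to the client straight away
    }

    fclose($upstream);

This keeps memory use flat regardless of how large the dump is, at the cost of holding the connection open for the duration of the transfer.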

1 Answer


The reason it's taking so long is that you're downloading the entirety of the JSON to your server and THEN sending it all to the user. There's no need to wait until you have the whole file before you start sending it.

Rather than using file_get_contents(), make the connection with cURL and write the output directly to php://output.

For example, this script will copy http://example.com/ exactly as is:

<?php

    // Initialise cURL. You can specify the URL in curl_setopt instead if you prefer
    $ch = curl_init("http://example.com/");

    // Open a file handler to PHP's output stream
    $fp = fopen('php://output', 'w');    

    // Turn off headers, we don't care about them
    curl_setopt($ch, CURLOPT_HEADER, 0);

    // Tell curl to write the response to the stream
    curl_setopt($ch, CURLOPT_FILE, $fp);

    // Make the request
    curl_exec($ch);

    // close resources
    curl_close($ch);
    fclose($fp);
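
If output buffering keeps the response from reaching the browser as it arrives, a variant of the same idea (my sketch, not part of the original answer) uses CURLOPT_WRITEFUNCTION so each chunk can be echoed and flushed explicitly:

<?php

    // Sketch: curl hands each chunk of the response body to this callback
    // as it arrives, so we can echo and flush it immediately rather than
    // letting curl write to a file handle.
    $ch = curl_init("http://example.com/");

    curl_setopt($ch, CURLOPT_HEADER, 0);

    curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, $chunk) {
        echo $chunk;

        if (ob_get_level() > 0) {
            ob_flush();        // flush PHP's own output buffer, if one is active
        }
        flush();               // push the data out to the client

        return strlen($chunk); // curl requires the number of bytes handled
    });

    curl_exec($ch);
    curl_close($ch);

Either way, the server only ever holds one small chunk in memory, which addresses the RAM concern raised in the comments on the question.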
DanielM