0

We are developing an API with LUMEN. Today we had a confused problem with getting the collection of our "TimeLog"-model. We just wanted to get all time logs with additional informationen from the board model and task model. In one row of time log we had a board_id and a task_id. It is a 1:1 relation on both.

This was our first code for getting the whole data. This took a lot of time and sometimes we got a timeout: BillingController.php

public function byYear() {

       $timeLog = TimeLog::get(); 

        $resp = array(); 

        foreach($timeLog->toArray() as $key => $value) {  

            if(($timeLog[$key]->board_id && $timeLog[$key]->task_id) > 0 ) {      

                 array_push($resp, array(
                    'board_title' => isset($timeLog[$key]->board->title) ? $timeLog[$key]->board->title : null,
                    'task_title' => isset($timeLog[$key]->task->title) ? $timeLog[$key]->task->title : null,
                    'id' => $timeLog[$key]->id
                )); 
            }
        }


        return response()->json($resp);
    }   

The TimeLog.php where the relation has been made.

public function board()
        {
            return $this->belongsTo('App\Board', 'board_id',  'id');
        }

        public function task()
        {
            return $this->belongsTo('App\Task', 'task_id',  'id');
        }

Our new way is like this: BillingController.php

 public function byYear() {



            $timeLog = TimeLog::
join('oc_boards', 'oc_boards.id', '=', 'oc_time_logs.board_id')
                            ->join('oc_tasks', 'oc_tasks.id', '=', 'oc_time_logs.task_id')
                            ->join('oc_users', 'oc_users.id', '=', 'oc_time_logs.user_id')
                            ->select('oc_boards.title AS board_title', 'oc_tasks.title AS task_title','oc_time_logs.id','oc_time_logs.time_used_sec','oc_users.id AS user_id')
                            ->getQuery()
                            ->get(); 

            return response()->json($timeLog);
        }   

We deleted the relation in TimeLog.php, cause we don't need it anymore. Now we have a load time about 1 sec, which is fine! There are about 20k entries in the time log table.

My questions are:

  1. Why is the first method out of range (what causes the timeout?)
  2. What does getQuery(); exactly do?

If you need more information just ask me.

yfain
  • 509
  • 7
  • 23
  • I say: use https://github.com/barryvdh/laravel-debugbar or https://github.com/recca0120/laravel-tracy (or any other tool that can show you similar info) -- they will show you what SQL commands were executed. I bet that in first version you will have hundreds if not thousands of SQL requests while in 2nd version it might be only one. As to way so many queries -- see https://laravel.com/docs/5.4/eloquent-relationships#eager-loading – LazyOne Jul 26 '17 at 10:30
  • Thanks! - We got it now. Do you know what getQuery(); is exactly doing? – yfain Jul 26 '17 at 10:56
  • 1
    Possibly nothing in your case ("Get the underlying query builder instance" -- which may work even if you omit it) .. or simply "Get the underlying query for the relation". In short -- I cannot correctly answer that (plus, for complex queries I prefer raw SQL -- easier to edit/debug in SQL editor etc). – LazyOne Jul 26 '17 at 11:31

2 Answers2

2

--First Question--

One of the issues you might be facing is having all those huge amount of data in memory, i.e:

$timeLog = TimeLog::get();

This is already enormous. Then when you are trying to convert the collection to array:

  1. There is a loop through the collection.
  2. Using the $timeLog->toArray() while initializing the loop based on my understanding is not efficient (I might not be entirely correct about this though)
  3. Thousands of queries are made to retrieve the related models

So what I would propose are five methods (one which saves you from hundreds of query), and the last which is efficient in returning the result as customized:

  1. Since you have many data, then chunk the result ref: Laravel chunk so you have this instead:

    $timeLog = TimeLog::chunk(1000, function($logs){
        foreach ($logs as $log) {
        // Do the stuff here
        }
    }); 
    
  2. Other way is using cursor (runs only one query where the conditions match) the internal operation of cursor as understood is using Generators.

    foreach (TimeLog::where([['board_id','>',0],['task_id', '>', 0]])->cursor() as $timelog) {
      //do the other stuffs here
    }
    
  3. This looks like the first but instead you have already narrowed your query down to what you need:

    TimeLog::where([['board_id','>',0],['task_id', '>', 0]])->get()
    
  4. Eager Loading would already present the relationship you need on the fly but might lead to more data in memory too. So possibly the chunk method would make things more easier to manage (even though you eagerload related models)

    TimeLog::with(['board','task'],  function ($query) {
        $query->where([['board_id','>',0],['task_id', '>', 0]]);
    }])->get();
    
  5. You can simply use Transformer

    • With transformer, you can load related model, in elegant, clean and more controlled methods even if the size is huge, and one greater benefit is you can transform the result without having to worry about how to loop round it You can simply refer to this answer in order to perform a simple use of it. However incase you don't need to transform your response then you can take other options.

Although this might not entirely solve the problem, but because the main issues you face is based on memory management, so the above methods should be useful.

--Second question--

Based on Laravel API here You could see that:

enter image description here

It simply returns the underlying query builder instance. To my observation, it is not needed based on your example.

UPDATE

For question 1, since it seems you want to simply return the result as response, truthfully, its more efficient to paginate this result. Laravel offers pagination The easiest of which is SimplePaginate which is good. The only thing is that it makes some few more queries on the database, but keeps a check on the last index; I guess it uses cursor as well but not sure. I guess finally this might be more ideal, having:

return TimeLog::paginate(1000);
1

I have faced a similar problem. The main issue here is that Elloquent is really slow doing massive task cause it fetch all the results at the same time so the short answer would be to fetch it row by row using PDO fetch.

Short example:

$db = DB::connection()->getPdo();

$query_sql = TimeLog::join('oc_boards', 'oc_boards.id', '=', 'oc_time_logs.board_id')
                            ->join('oc_tasks', 'oc_tasks.id', '=', 'oc_time_logs.task_id')
                            ->join('oc_users', 'oc_users.id', '=', 'oc_time_logs.user_id')
                            ->select('oc_boards.title AS board_title', 'oc_tasks.title AS task_title','oc_time_logs.id','oc_time_logs.time_used_sec','oc_users.id AS user_id')
                            ->toSql();

$query = $db->prepare($query->sql);
$query->execute();
$logs = array();
 while ($log = $query->fetch()) {
   $log_filled = new TimeLog();
   //fill your model and push it into an array to parse it to json in future
   array_push($logs,$log_filled);
}
return response()->json($logs);
aaron0207
  • 2,293
  • 1
  • 17
  • 24
  • Thanks for your reply. Thanks also for the detailed information. Do you read our solution for it? - Is it the best way, or do you think your example is much better? – yfain Jul 26 '17 at 10:23
  • You are welcome. Yes, I readed it. In our scenario we were fetching ~400-500k rows from a big table and retrieving them as json too. While we were trying it with elloquent a simply `$objects = TableName::all(); return response()->json($objects)` bring down the server but using PDO fetch it serves the whole data in less than 2s – aaron0207 Jul 26 '17 at 10:29
  • @aaron0207 Why don't you use chunk(), is it causing more overhead? – Duke Apr 11 '19 at 10:39
  • You've got the right idea with doing the joins up front. However it is misleading to say the Eloquent way was slow because it fetches the data all up front, while this way is one at a time. The entire result set is always stored in memory (either in php-land or on the database). The reason your method works is because it DOES pull everything at once into php memory as opposed to the OPs query which pulled one table into memory then made a separate request for every join for every member. You also did not mention that the OP was unnecessarily tripling the array (yours only doubles) – sfscs May 19 '21 at 23:09