
I am working with a very large data set (786,432 rows, to be precise).

So, to avoid hitting memory limits, I want to loop over the data set in batches of 50,000 rows. To test this out, I thought I would try:

function test(){
    $start = 0;
    $end = 50000;

    $q = $this->db->select('uuid')->from('userRegionLink')->limit($end, $start)->get();
    $i = 0;
    while($q->num_rows() != 0){
        echo 'Round: '.++$i.'<br />';
        echo 'Rows: '.$q->num_rows().'<br />';
        echo 'Start: '.$start.'<br />';
        echo 'End: '.$end.'<hr />';

        $start = $end;
        $end = $end+50000;
        $q = $this->db->select('uuid')->from('userRegionLink')->limit($end, $start)->get();
    }
}

But my results are very strange; look at round 9 onwards.

What is causing this?

Round: 1
Rows: 50000
Start: 0
End: 50000
Round: 2
Rows: 100000
Start: 50000
End: 100000
Round: 3
Rows: 150000
Start: 100000
End: 150000
Round: 4
Rows: 200000
Start: 150000
End: 200000
Round: 5
Rows: 250000
Start: 200000
End: 250000
Round: 6
Rows: 300000
Start: 250000
End: 300000
Round: 7
Rows: 350000
Start: 300000
End: 350000
Round: 8
Rows: 400000
Start: 350000
End: 400000
Round: 9
Rows: 386432
Start: 400000
End: 450000
Round: 10
Rows: 336432
Start: 450000
End: 500000
Round: 11
Rows: 286432
Start: 500000
End: 550000
Round: 12
Rows: 236432
Start: 550000
End: 600000
Round: 13
Rows: 186432
Start: 600000
End: 650000
Round: 14
Rows: 136432
Start: 650000
End: 700000
Round: 15
Rows: 86432
Start: 700000
End: 750000
Round: 16
Rows: 36432
Start: 750000
End: 800000
Hailwood
  • Garbage collection? Do you **really** want PHP keeping 800k rows in memory? – AJ. Jun 01 '11 at 02:02
  • Actually, that's a point. The whole thing is screwy: Rows should be 50,000 on every loop except the last. What's happening here? – Hailwood Jun 01 '11 at 02:05
  • Rows is ***increasing*** by 50k on each iteration, tracking the total rows in the array used by `$q`. At some point, PHP is deciding to remove some of the array values though. I'd be interested in whether it's removing them sequentially from the "front" (i.e. FIFO) or if it's doing it in a less deterministic way. I'm betting the latter :) – AJ. Jun 01 '11 at 02:09
  • :D Logic error on my part. For some reason I had decided that I also needed to increase the limit as well as the offset :P – Hailwood Jun 01 '11 at 02:13

1 Answer


It looks like $end is not a global end position: in CodeIgniter's limit(), the first argument is just the number of records to fetch, and the second is the offset to start from. Keep $end fixed at 50,000 and change only $start.
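For example, a minimal sketch of the same loop with the batch size held constant (untested; table and column names are taken from the question, and free_result() is CodeIgniter's result method for releasing a result set, included here only to keep memory usage down):

function test(){
    $limit  = 50000; // rows per batch; stays constant
    $offset = 0;     // advances by $limit each round

    $q = $this->db->select('uuid')->from('userRegionLink')->limit($limit, $offset)->get();
    $i = 0;
    while($q->num_rows() != 0){
        echo 'Round: '.++$i.'<br />';
        echo 'Rows: '.$q->num_rows().'<br />';
        echo 'Offset: '.$offset.'<hr />';

        $q->free_result();  // release the current batch before fetching the next one
        $offset += $limit;  // move only the offset forward; the limit never changes
        $q = $this->db->select('uuid')->from('userRegionLink')->limit($limit, $offset)->get();
    }
}

With 786,432 rows this should report 50,000 rows for each round, with only the final round returning the 36,432-row remainder.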

Arek Jablonski