
I have a CSV file with about 500,000 rows. What I need to do is take the first 100 rows in the first loop and process them (say, send the first 100 IDs to an API and save the response). In the second loop, skip the first 100 rows (already processed), take the next 100 rows, and send them to the web service. Similarly, in the third loop, skip the first 200 rows, take another 100 rows, send them to the web service, and so on...

I can read the rows one at a time with the code below (tested: it works great).

if (($handle = fopen($filename, "r")) !== FALSE) {
    $line = fgetcsv($handle); // skip the header row

    // fetch data from each row
    // (fgetcsv's second argument is the maximum line length, so pass 0 for "no limit" before the delimiter)
    while (($data = fgetcsv($handle, 0, ",")) !== FALSE) {
        $hotel_id = $data[0];
        // call the web service for this hotel id
        $hotelDetailRequest = (new \Services\Hotel\Hotel)->getHotelStaticData($hotel_id);
        // do stuff to response
    }
}

Similarly, I can skip some initial rows, just as I skipped the first row, by adding:

$line = fgetcsv($handle);
$line = fgetcsv($handle);
$line = fgetcsv($handle);

But this is not my expected result, as explained above. I am using PHP (Laravel). I googled, but could not find anything that matches my criteria. Has anyone faced the same problem?
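
Roughly, the shape I am after is something like this (an untested sketch; `processBatch()` is just a placeholder for the API call and saving the response):

// rough sketch of what I am trying to achieve (untested);
// processBatch() is just a placeholder for the API call + saving the response
$batch_size = 100;
$offset     = 0;

do {
    $batch = [];

    if (($handle = fopen($filename, "r")) !== FALSE) {
        fgetcsv($handle); // skip the header row

        // skip the rows already handled in earlier loops
        for ($i = 0; $i < $offset; $i++) {
            fgetcsv($handle);
        }

        // collect the next 100 hotel IDs
        while (count($batch) < $batch_size && ($data = fgetcsv($handle)) !== FALSE) {
            $batch[] = $data[0];
        }

        fclose($handle);
    }

    if (!empty($batch)) {
        processBatch($batch); // placeholder
        $offset += count($batch);
    }
} while (count($batch) === $batch_size);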

Any help would be appreciated. Thank You

VijayRana
  • Why do you need to do this? Are you concerned with the frequency at which the requests are sent, or is it a performance issue, something memory related? Can you elaborate on why you need to do this so we can better understand the underlying problem and perhaps provide a solution. – Ohgodwhy May 25 '16 at 06:44
  • Yes, I am concerned with the frequency at which the requests are sent, so I am running the query after midnight and before morning. I am saving static data to my database, as documented by the web service I am using. – VijayRana May 25 '16 at 06:47

2 Answers


Here is a solution for you:

<?php
$filename   = "the_data.csv";
$chunk_size = 200;

// parse the csv file into an array
$csv_data = array_map('str_getcsv', file($filename));

// split the data array into chunks of $chunk_size rows
$chunked_data = array_chunk($csv_data, $chunk_size);

foreach ($chunked_data as $chunk) {

    // here you have up to $chunk_size rows
    foreach ($chunk as $row) {
        $hotel_id = $row[0];

        // send request to web service
        // do stuff to response
    }

    sleep(1); // pause between batches
}
?>
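
Note that `file($filename)` reads the entire 500,000-row file into memory before anything is chunked, which can blow past PHP's memory_limit. If that is a problem, a minimal streaming sketch along the same lines (same assumptions about the CSV layout) builds each chunk with `fgetcsv()` instead:

<?php
$filename   = "the_data.csv";
$chunk_size = 200;

if (($handle = fopen($filename, "r")) !== FALSE) {
    $chunk = [];

    while (($data = fgetcsv($handle)) !== FALSE) {
        $chunk[] = $data;

        // once a full chunk has been collected, process it and release the memory
        if (count($chunk) === $chunk_size) {
            foreach ($chunk as $row) {
                $hotel_id = $row[0];
                // send request to web service
                // do stuff to response
            }
            $chunk = [];
            sleep(1);
        }
    }

    // process whatever is left in the final, partial chunk
    foreach ($chunk as $row) {
        $hotel_id = $row[0];
        // send request to web service
        // do stuff to response
    }

    fclose($handle);
}
?>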
talhasch
  • code could not run after this line: `$csv_data = array_map('str_getcsv', file($filename));` – VijayRana May 25 '16 at 07:07
  • It says : `The mydomain.loc page isn’t working mydomain.loc is currently unable to handle this request. HTTP ERROR 500`. – VijayRana May 25 '16 at 07:22
  • You'd be better off running this code from the command line (PHP CLI); running this kind of script through a web server is inefficient. – talhasch May 25 '16 at 07:28

You can use SplFileObject combined with ->seek($start). It works much like the limit/offset pair you would use with a normal MySQL database; here's an example:

$file = new SplFileObject("myfile.csv");
$file->setFlags(SplFileObject::READ_CSV); // so each row comes back as an array of fields

$rules = DB::query("select * from file_rules where id = 1");

if ($rules->limit) {
    $file->seek($rules->offset); // jump straight to the first unprocessed line
}

At this point, loop over the rows and stop once the line index passes the end of the current offset/limit window. (Iterate with `while` rather than `foreach`: `foreach` calls `rewind()` on the SplFileObject, which would undo the `seek()`.)

$end = $rules->offset + $rules->limit;

while ($file->valid() && $file->key() < $end) {
    $row = $file->current(); // parsed CSV row, thanks to READ_CSV
    // perform your calls to do your API request
    $file->next();
}

When you're done, simply update your DB records:

DB::query('update file_rules set offset=' . ($rules->offset + $rules->limit) . ' where id = 1');

This is the gist of it. Replace my DB pseudo-code with your own. This will allow you to run it as a CRON job and use the database as the interaction point for your limit/offset, while the seek() keeps memory usage down.
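
Since the question mentions Laravel, one way to drive this as a CRON job is the framework's scheduler. A minimal sketch, assuming the loop above is wrapped in an Artisan command (the `hotels:import-batch` name is hypothetical):

// app/Console/Kernel.php (fragment)
protected function schedule(Schedule $schedule)
{
    // hotels:import-batch is a hypothetical Artisan command wrapping the
    // SplFileObject loop above; each run processes one limit/offset batch
    // and then updates the offset in file_rules
    $schedule->command('hotels:import-batch')
             ->cron('*/5 0-6 * * *'); // every 5 minutes between midnight and 6 a.m.
}

The scheduler itself only needs the usual single cron entry running `php artisan schedule:run` every minute, so each invocation picks up the next offset from `file_rules` until the file is exhausted.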

Ohgodwhy