-1

I have data as below in a CSV file. The green lines are what I'd like to keep.

Basically if someone has a single line, I want to keep it.

If they have multiple lines, I want to remove any where the 3rd column is A - Fully Fit.

After running through the whole file, I then want to save it over the original.

I tried writing the code but not sure if it's a Friday but the logic is escaping me at the moment.

There could be just a single line or there could be dozens. So if an ID has 1000 rows, I would just want to delete the rows for that ID where the 4th column is "A - Fully Fit"

UPDATE

This code works but not sure if it's the most optimal

enter image description here

// DECLARE ARRAYS
$a = Array();
$b = Array();
$c = Array();

// LOOP THROUGH FILE AND COLLECT DUPLICATES
if (($handle = fopen($filename, "r")) !== FALSE) {      

    while (($line = fgetcsv($handle, 4096, ",")) !== FALSE) {
        $a[$line[1]][] = 'SHAKKA';
    }

    foreach ($a as $k=>$v) {
        if (count($v)>1) {
            $b[] = $k;
        }
    }
}

// IF A DUPLICATE THEN REMOVE ROWS THAT ARE 'A - Fully Fit'
if (($handle = fopen($filename, "r")) !== FALSE) {      
    while (($golly = fgetcsv($handle, 4096, ",")) !== FALSE) {
        if (in_array($golly[1],$b) && $golly[3] == 'A - Fully Fit')  {
        }
        else {
            $c[] = $golly;
        }
    }
}
fclose($handle);

// WRITE THE FILE
$fp = fopen($filename, 'wa+');
foreach ($c as $fields) {
    fputcsv($fp, $fields);
}
fclose($fp);
pee2pee
  • 3,619
  • 7
  • 52
  • 133
  • The green lines have 3rd column as `A - Fully Fit` but you want to keep green lines, yet remove rows that have 3rd column as `A - Fully Fit`? – nice_dev Oct 11 '19 at 10:09
  • Sorry - my bad - have amended the image – pee2pee Oct 11 '19 at 10:11
  • 1
    Ok, so the question is every row needs to have unique ID(2nd column) and unique value in 4th column? – nice_dev Oct 11 '19 at 10:17
  • No - if an ID based off column 2 has multiple records (2 or more), then the record with the 4th column having "A - Fully Fit" needs to be deleted – pee2pee Oct 11 '19 at 10:19
  • 1
    "Basically if someone has a single line, I want to keep it.". I thought the combination of column 6 and 7 had to be unique. – Gerard Oct 11 '19 at 10:22
  • @pee2pee So the combination of 2nd column(ID) and 4th column needs to be unique? Like from `[16 - A - Fully Fit,16 - A - Fully Fit,18 - A - Fully Fit]`, only these `[16 - A - Fully Fit,18 - A - Fully Fit]` needs to survive? – nice_dev Oct 11 '19 at 10:27
  • The first line in the screenshot - that person only has one line, so I want to keep it. The next person has two lines. Based off the fact one of those has "A - Fully Fit" in there, I'll delete the second row and keep the third. Same for the 4th and 5th. They have two rows but since one has A - Fully Fit, I'll delete that and keep the other(s) – pee2pee Oct 11 '19 at 10:28
  • @pee2pee So, only unique values of 4th column needs to be present? If a row has a value in the 4th column already visited, delete them. Right? – nice_dev Oct 11 '19 at 10:30
  • No - sorry not sure how to explain it anymore but I'll give it a go. If your ID has 100 records in there, I just want to delete any record where A - Fully Fit is in the 4th column. If you have a single row, I would just leave it. – pee2pee Oct 11 '19 at 10:40
  • is your csv file always sorted by ID? – Casimir et Hippolyte Oct 11 '19 at 11:24

1 Answers1

1
<?php


if (($handle = fopen($filename, "r")) !== FALSE) {      
    $new_rows = [];
    $hash_ids = [];

    while (($line = fgetcsv($handle, 4096, ",")) !== FALSE) {
        if($line[3] === 'A - Fully Fit'){
            if(isset($hash_ids[$line[1]])) continue;
            $hash_ids[$line[1]] = true;
        }
        $new_rows[] = $line;
    }

    fclose($handle);

    // WRITE TO THE FILE
    $fp = fopen($filename, 'wa+');
    foreach ($new_rows as $current_row) {
        fputcsv($fp,$current_row);
    }
    fclose($fp);
}

In the above code, we maintain a hash(associative array) for each ID. If we have already come across with an ID with A - Fully Fit, then we skip the row, else we add it to the list.

nice_dev
  • 17,053
  • 2
  • 21
  • 35
  • Once the flag is set, it never resets so wouldn't that mean anyone else who is fully fit and has one row, they will be excluded? – pee2pee Oct 11 '19 at 11:22
  • @pee2pee So, only one occurrence of `A - Fully Fit` per ID? Is this the question? – nice_dev Oct 11 '19 at 11:23
  • No. Sorry, I can't explain it anymore simply than I have already done and with examples and I've now done the code myself. Thanks for trying to help. – pee2pee Oct 11 '19 at 11:31
  • If an ID has 1,000,000 rows, I would just want to delete the rows for that ID where the 4th column is "A - Fully Fit". If an ID had just one single row where the 4th column was "A - Fully Fit" I would want to keep it – pee2pee Oct 11 '19 at 11:35
  • @pee2pee I have edited my answer. Should work according to your needs. – nice_dev Oct 11 '19 at 12:11