2

How would I sort the following CSV file using PHP? I would like to sort by last name. Do I use regex somehow to get the first letter in the last name? Any help is appreciated

Here is an excerpt of my CSV file - with a ";" delimiter between names and addresses

John C. Buckley, M.D.;123 Main Street, Bethesda MD 20816
Steven P. Wood;345 Elm Street, Rockville, MD 20808
Richard E. Barr-Fernandez;234 Pine Street, Takoma Park MD 20820
Charles Andrew Powell; 678 Oak Street, Gaithersburg MD 20800
Mark David Horowitz, III; 987 Wall Street, Silver Spring MD 20856
Thomas
  • 1,571
  • 3
  • 12
  • 12
  • WTF .. it isn't letting my answer.... – Tyler Carter Aug 13 '09 at 15:03
  • It varies - some are over 2000 – Thomas Aug 13 '09 at 15:03
  • 2
    Is that a genuine excerpt with genuine names and addresses? You really shouldn't be posting other people's details on the internet... – Waggers Aug 13 '09 at 15:15
  • Waggers - they are fake. I agree with you. I wouldn't post real details. – Thomas Aug 13 '09 at 15:37
  • Thomas, I don't think you can extract the last name with 100% accuracy, unless you're willing to do a *lot* of work. You'll probably have to decide what "good enough" is for you. See http://stackoverflow.com/questions/103422/simple-way-to-parse-a-persons-name-into-its-component-parts – bmb Aug 13 '09 at 15:38

3 Answers3

3

Here is my attempt. I'm not sure how robust the regex is to extract the surname though.

<?php
$handle = fopen('c:/csv.txt', 'r') or die('cannot read file');
$surnames = array();
$rows = array();

//build array of surnames, and array of rows
while (false != ( $row = fgetcsv($handle, 0, ';') )) {
    //extract surname from the first column
    //this should match the last word before the comma, if there is a comma
    preg_match('~([^\s]+)(?:,.*)?$~', $row[0], $m);
    $surnames[] = $m[1];
    $rows[] = $row;
}

fclose($handle);

//sort array of rows by surname using our array of surnames
array_multisort($surnames, $rows);

print_r($rows);
//you could write $rows back to a file here if you wanted.

Edit

I just realised that you don't really need to strip off the peoples' suffixes, because this probably won't really affect sorting. You could just split the first column on space and take the last one (as karim79 suggested). This might break though if the person has more suffixes with spaces, so I'll leave my answer intact.

Tom Haigh
  • 57,217
  • 21
  • 114
  • 142
  • +1 @Tom Haigh, I was coding something like that, maybe just split at the space ' ' and take the last offset of the resulting array? As in $name = split(" ", $name);$lastName = $name[count($name) - 1]; – karim79 Aug 13 '09 at 15:21
  • Tom, this works great. Real dumb question - how can I simply print or echo the rows out to the page. Thanks! – Thomas Aug 13 '09 at 15:39
  • print_r() should already do that. You could print to a table if you wanted, it should be a simple case of using foreach to go over the array and printing the value of each 'sub' array. – Tom Haigh Aug 13 '09 at 15:49
2

Well, because it is a CSV

$lines = file($file);
foreach($lines as $line)
{
    $parts = explode(";", $line);
    $last = explode(" ", $parts[0]);
    $last = end($last);
    $array[$last] = $parts;
}

ksort($array);

// ..... write it back to file
Tyler Carter
  • 60,743
  • 20
  • 130
  • 150
  • I think the csv data is actually separated by semi-colons, it looks like commas are used to split address lines and also to add name suffixes – Tom Haigh Aug 13 '09 at 15:21
  • This looks promising. I changed the separator in you first explode function to a semi-colon. However, the array still gets confused when it picks up the last part of the exploded name because some names end with an "MD" or "III" or "Jr." – Thomas Aug 13 '09 at 15:29
  • In the example the M.D was after the semi-colon, but yeah, not exactly sure how you would fix that. I could just use the 3rd value, but if someone has multiple middle names or a space in their first or middle name it would throw it off. – Tyler Carter Aug 13 '09 at 15:33
  • This is harder than it seems. See http://stackoverflow.com/questions/103422/simple-way-to-parse-a-persons-name-into-its-component-parts – bmb Aug 13 '09 at 15:40
  • Slight tweak on first line in foreach loop: I replaced `$parts = explode(";", $line)` with `$parts = str_getcsv($line, ',', '"')` because my csv had commas in quotes that I wanted to ignore. See https://stackoverflow.com/a/5132572/4504073. – cuka Jun 30 '21 at 14:01
1

The PHP function array_multisort should do what you want. You'll need to read the CSV into a multi-dimensional array - see the third example on the linked page.

DisgruntledGoat
  • 70,219
  • 68
  • 205
  • 290
  • How about finding the last name? – bmb Aug 13 '09 at 15:11
  • Guess I missed that part, my bad. You've already accepted an answer but the way I would do it is split the name on a space, take the last element and add it to the "row" as a new element. Then sort on that field. – DisgruntledGoat Aug 14 '09 at 10:22