
Good day,

I have a local CSV file called DailyValues.csv whose values change daily.
I need to extract the value field for category2 and category4,
then combine the extracted values, sort them, and remove any duplicates,
and finally save the result to a new local file, NewValues.txt.

Here is an example of the DailyValues.csv file:

category,date,value  
category1,2010-05-18,value01  
category1,2010-05-18,value02  
category1,2010-05-18,value03  
category1,2010-05-18,value04  
category1,2010-05-18,value05  
category1,2010-05-18,value06  
category1,2010-05-18,value07  
category2,2010-05-18,value08  
category2,2010-05-18,value09  
category2,2010-05-18,value10  
category2,2010-05-18,value11  
category2,2010-05-18,value12  
category2,2010-05-18,value13  
category2,2010-05-18,value14  
category2,2010-05-18,value30  
category3,2010-05-18,value16  
category3,2010-05-18,value17  
category3,2010-05-18,value18  
category3,2010-05-18,value19  
category3,2010-05-18,value20  
category3,2010-05-18,value21  
category3,2010-05-18,value22  
category3,2010-05-18,value23  
category3,2010-05-18,value24  
category4,2010-05-18,value25  
category4,2010-05-18,value26  
category4,2010-05-18,value10  
category4,2010-05-18,value28  
category4,2010-05-18,value11  
category4,2010-05-18,value30  
category2,2010-05-18,value31  
category2,2010-05-18,value32  
category2,2010-05-18,value33  
category2,2010-05-18,value34  
category2,2010-05-18,value35  
category2,2010-05-18,value07

I've found some helpful parsing examples at http://www.php.net/manual/en/function.fgetcsv.php and managed to extract all the values of the value column, but I don't know how to restrict the extraction to category2 and category4, then sort the results and remove duplicates.

The solution needs to be in PHP, Perl, or shell script.

Any help would be much appreciated.
Thank you in advance.


1 Answer


Here's a shell script solution.

egrep 'category4|category2' input.file | cut -d"," -f1,3 | sort -u > output.file

I used the cut command just to show that you can extract only certain columns: the -d switch sets the field delimiter (a comma here), and the -f switch selects which fields (columns) you want.

The -u switch for sort removes duplicate lines from the output.
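
If you want only the value column itself in NewValues.txt (combined across the two categories and deduplicated, as the question asks), a minimal variant of the same pipeline would be the following; it uses the file names from the question and keeps only field 3, so a value like value10 that appears under both categories collapses to a single line. This is a sketch against the sample layout above, not tested on the real file:

egrep 'category2|category4' DailyValues.csv | cut -d"," -f3 | sort -u > NewValues.txt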

Edit: It's important that you use egrep and not plain grep here, since grep's basic regular expressions treat | as a literal character, while egrep's extended regular expressions support alternation with |.
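
If your grep supports the -E option (POSIX extended regular expressions), that should give you the same alternation without a separate egrep binary; a sketch under that assumption, using the same input.file/output.file placeholders:

grep -E 'category4|category2' input.file | cut -d"," -f1,3 | sort -u > output.file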

Edit (for people who only have grep available):

grep 'category2' input.file > temp.file && grep 'category4' input.file >> temp.file && cut -d"," -f1,3 temp.file | sort -u > output.file && rm temp.file

It adds a bit of overhead, but it still works.
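
As an aside, most grep implementations also accept multiple -e patterns in a single call, which would avoid the temporary file entirely; a sketch under that assumption:

grep -e 'category2' -e 'category4' input.file | cut -d"," -f1,3 | sort -u > output.file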

  • Thank you dare2be, much appreciated. The `cut` portion works great on its own (new to me), but when I use the full command with egrep to do the restriction, it produces an empty file. – Yallaa May 19 '10 at 04:44
  • now that's weird. See, to check whether I copied it properly from terminal to SO, I copy-and-pasted it to terminal and it worked... Are you sure you have `egrep` installed? Check with `which egrep` –  May 19 '10 at 04:50
  • It is installed: `which egrep` gives `/bin/egrep`, and `ls -l /bin/egrep` shows `lrwxrwxrwx 1 root root 4 Mar 1 2008 /bin/egrep -> grep`. I tried both grep and egrep and got the same thing, no output. – Yallaa May 19 '10 at 05:01
  • Haha you see, `egrep` is linked to `grep`, so you actually DON'T have `egrep` and the regular expression I posted won't work in `grep`. Try the non-egrep version I just posted. –  May 19 '10 at 05:04
  • Is egrep linked to grep, or the other way around? I actually have both files, /bin/egrep and /bin/grep. I tried your new example and it works, thanks a million. But then I went back and ran the first one with egrep using the full path (/bin/egrep) and it worked perfectly this time, so it is an issue with the environment PATH. – Yallaa May 19 '10 at 05:12