1

I'm automating some data that gets generated in .csv format and I'm stuck when it comes to formatting the fields with data values. Kindly help me with this.

I have the following data in a .csv file:

April 3 2016    FL  03112017    0
April 4 2016    CA  04022016    0 
April 5 2016    TX  04302016    0 
April 6 2016    OH  05292016    0 
April 7 2016    AZ  06092016    0 
April 8 2016    MA  06252016    0

I would like to create the file as follow:

2016-04-03  FL  03112017    0
2016-04-04  CA  04022016    0
2016-04-05  TX  04302016    0
2016-04-06  OH  05292016    0
2016-04-07  AZ  06092016    0
2016-04-08  MA  06252016    0

I have tried it using awk, with something like this:

awk -F "\t" '{print "date -d "$1" +'%Y-%m-%d'",$2,$3}' file.csv > file1.txt

That doesn't work. I'm not able to assign a resultant/calculated value to the $1 that has to be printed.

Jonathan
  • 144
  • 4
  • 13
  • Possible duplicate of [Converting dates in AWK](http://stackoverflow.com/questions/2121896/converting-dates-in-awk) – bufh Apr 11 '16 at 13:29
  • No, it's not a dup of that since this sample input contains the year. Also, the answer selected in that question isn't great since the first script in it unnecessarily spawns a subshell and calls a shell tool for every line in the file when all it had to do was figure out the year once and then print it when shuffling the text on every line and the 2nd answer uses `getline` incorrectly (no test/protection for failure). – Ed Morton Apr 11 '16 at 13:57

2 Answers2

2

No date functions required, just shuffling of the text:

$ awk '
     BEGIN{m="JanFebMarAprMayJunJulAugSepOctNovDec"}
     {printf "%04d-%02d-%02d %s %s %s\n",$3,(match(m,substr($1,1,3))+2)/3,$2,$4,$5,$6}
  ' file
2016-04-03 FL 03112017 0
2016-04-04 CA 04022016 0
2016-04-05 TX 04302016 0
2016-04-06 OH 05292016 0
2016-04-07 AZ 06092016 0
2016-04-08 MA 06252016 0
Ed Morton
  • 188,023
  • 17
  • 78
  • 185
  • 1
    Thanks @Ed Morton. This works. Can you please let me know if I can use date formatting(using the date command like in the code i had mentioned) within 'awk' command? This would help me with other operations that i need to do. – Jonathan Apr 11 '16 at 14:03
  • You're welcome. Yes you can but if you have GNU awk then you don't have to since [gawk has it's own time functions](http://www.gnu.org/software/gawk/manual/gawk.html#Time-Functions). Judging by use of the `-d` option you're using GNU date so you should be using GNU awk too. – Ed Morton Apr 11 '16 at 14:04
  • Hey, this code is useful, but i'm having some difficulties with the delimiter while using this. The input file is tab delimited and the output file has to be tab delimited too. I tried different variations of the code to generate a tab delimited file, but i'm not able to. Can you please help me on this. – Jonathan Apr 11 '16 at 15:29
  • To modify the output spacing just change the blank chars in the printf formatting string to `\t`s: `printf "%04d-%02d-%02d\t%s\t%s\t%s\n"`. The fact that your input is tab separated is irrelevant since you don't have empty fields or blank chars within fields in your input. – Ed Morton Apr 11 '16 at 15:32
  • @ Jonathan You can use `paste` and the `date` command with process substitution, this is probably a better route than `awk` if you have a variety of `date` formats to consider: `paste <(cat d.txt | cut -d ' ' -f -3 | xargs -n3 -I {} date -d"{}" +%Y-%m-%d) <(cat d.txt | cut -d' ' -f 7-)` –  Apr 11 '16 at 23:09
  • @A.Danischewski you should post that as an answer so it can receive the appropriate attention. – Ed Morton Apr 11 '16 at 23:48
2

Here, I'm using date to generate the month names: not strictly necessary.

awk -v monthNames="$(for m in $(seq 1 12); do date -d "$m/1" +%B; done | paste -sd,)" '
    BEGIN {
        split(monthNames, mn, ",")
        for (i=1; i<=12; i++) m[mn[i]] = i
        delete mn
        FS = OFS = "\t"
    }
    {
        split($1, d, " ")
        $1 = sprintf("%d-%02d-%02d", d[3], m[d[1]], d[2])
        print
    }
' file.csv
glenn jackman
  • 238,783
  • 38
  • 220
  • 352