I have a CSV file like this:

"Player","VPIP"
"$GIU$37","21.01"
"$VaSko$.017","14.11"
"*Lampiaao*","16.15"
"111bellbird","30.30"
"1221Vik1221","21.97"
"16266+626","20.83"
"17victor","16.09"
"1980locky","11.49"
"19dem","22.81"
"1lllllllllll","20.99"
......

and I would like to use (g)AWK to write lines like the following to an output file, filled in with the information extracted from between the quotes:

<note player="player ID comes here" label="a number from 0 to 7, chosen by the player's VPIP value" update="this one is irrelevant, one number can be used for all the lines"></note>

So a printed line would look like this:

<note player="17victor" label="2" update="1435260533"></note>

Obviously, I'd like to skip the first line of the CSV file when reading it, because it contains only the header. The label is chosen by these criteria:

0: VPIP > 37.5

1: VPIP < 10

2: VPIP between 10 - 16.5

7: The rest.

Any idea about how it can be done?

sasieightynine

1 Answer

Script

Try this script:

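BEGIN {
    FS = ","              # fields are separated by commas
    update = 34513135     # same placeholder value for every line
}
NR != 1 {                 # skip the header line
    vpip = $2
    gsub(/"/, "", vpip)   # strip the surrounding quotes
    vpip += 0             # force a number; otherwise awk compares strings

    if (vpip > 37.5)
        label = 0
    else if (vpip < 10)
        label = 1
    else if (vpip < 16.5)
        label = 2
    else
        label = 7
    # $1 still carries its quotes; label and update are printed unquoted
    printf "<note player=%s label=%s update=%s></note>\n", $1, label, update
}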

Explanation

It's quite simple really:

  1. First we set the field separator to a comma, so each line (i.e. each record) of the input file has two fields.
  2. We also set the update variable in the BEGIN block (this gets executed before the file is parsed).
  3. Then we execute the following block for every line except the first one: NR != 1.
  4. Here we copy the second field into vpip and remove the quotes, so the value can be compared with the numeric thresholds.
  5. Now we take the VPIP and map it to the correct label, which we store in label.
  6. Finally we print the line in the desired format.
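
One detail behind step 4 is worth seeing in isolation: a value that has been through gsub is a string, and awk compares a string with a numeric constant as text rather than by value. Adding 0 forces a number. The sketch below is only an illustration; the player name and the 9.50 value are made up, since no row in the sample is below 10.

```shell
# Compare a quote-stripped field first as a string, then as a number.
# "somebody" and 9.50 are hypothetical values, not taken from the sample data.
result=$(printf '"somebody","9.50"\n' | awk -F, '{
    v = $2
    gsub(/"/, "", v)                  # v now holds the string 9.50
    s = (v < 10)     ? "yes" : "no"   # string comparison: "9.50" against "10"
    n = (v + 0 < 10) ? "yes" : "no"   # numeric comparison: 9.5 against 10
    print "string:", s, "numeric:", n
}')
echo "$result"
```

With gawk or mawk this should print `string: no numeric: yes`: as text, "9.50" sorts after "10", so the plain test misses a VPIP that is really below 10.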

Usage

To use this code, run awk -f script.awk file, where script.awk is the name of the script and file is the path to the input file.
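
Since the question asks for the result in an output file, the usual route is shell redirection. A minimal sketch, assuming made-up file names (sample.csv, notes.txt) and a one-line stand-in for the real script:

```shell
# Hypothetical two-line input; sample.csv and notes.txt are made-up names.
printf '%s\n' '"Player","VPIP"' '"17victor","16.09"' > sample.csv

# Stand-in for script.awk: skip the header and echo the player field.
# The shell's '>' sends awk's standard output to notes.txt,
# replacing whatever the file held before.
awk -F, 'NR != 1 { print "<note player=" $1 "></note>" }' sample.csv > notes.txt

cat notes.txt
```

The same redirection works unchanged with awk -f script.awk file > notes.txt.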

Example usage:

$ cat file
"Player","VPIP"
"$GIU$37","21.01"
"$VaSko$.017","14.11"
"*Lampiaao*","16.15"
"111bellbird","30.30"
"1221Vik1221","21.97"
"16266+626","20.83"
"17victor","16.09"
"1980locky","11.49"
"19dem","22.81"
"1lllllllllll","20.99"
$ awk -f script.awk file 
<note player="$GIU$37" label=7 update=34513135></note>
<note player="$VaSko$.017" label=2 update=34513135></note>
<note player="*Lampiaao*" label=2 update=34513135></note>
<note player="111bellbird" label=7 update=34513135></note>
<note player="1221Vik1221" label=7 update=34513135></note>
<note player="16266+626" label=7 update=34513135></note>
<note player="17victor" label=2 update=34513135></note>
<note player="1980locky" label=2 update=34513135></note>
<note player="19dem" label=7 update=34513135></note>
<note player="1lllllllllll" label=7 update=34513135></note>
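
The format in the question has quotes around every attribute value. A variant that matches it could look like the sketch below. It is an illustration under stated assumptions, not a drop-in replacement: notes.awk and sample.csv are made-up file names, the update value is simply the one from the question's example line, and the input is assumed to have exactly two quoted fields per line with no double quote inside a player name. Splitting on the three characters `","` rather than on a bare comma also keeps a name that contains a comma in one piece.

```shell
# Write the variant script to a hypothetical file name.
cat > notes.awk <<'EOF'
BEGIN {
    FS = "\",\""             # split on the "," between the two quoted fields
    update = 1435260533      # placeholder taken from the question's example
}
NR > 1 {                     # skip the header line
    player = $1
    vpip = $2
    sub(/^"/, "", player)    # drop the quote left at the start of the line
    sub(/"$/, "", vpip)      # drop the quote left at the end of the line
    vpip += 0                # force numeric comparison

    if (vpip > 37.5)
        label = 0
    else if (vpip < 10)
        label = 1
    else if (vpip < 16.5)
        label = 2
    else
        label = 7

    printf "<note player=\"%s\" label=\"%d\" update=\"%d\"></note>\n", player, label, update
}
EOF

# Small input built from two rows of the sample data.
printf '%s\n' '"Player","VPIP"' '"17victor","16.09"' '"19dem","22.81"' > sample.csv

awk -f notes.awk sample.csv
```

For the two rows used here, 16.09 falls in the 10 to 16.5 band (label 2) and 22.81 falls under "the rest" (label 7).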

Notes

If you have further questions, leave a comment and I will elaborate.

ShellFish
  • I needed to change if ($2 > 37.5) to if (vpip > 37.5) and do this with the rest of the code, now it's working like a charm, thank you very much for your help, this might be very very useful for me in the future :) – sasieightynine Jul 03 '15 at 03:24
  • @sasieightynine Of course, good catch, shows you understand the code :) Don't forget to accept the answer if it solved your questions and upvote if you deem it a quality answer. Also `man awk` could be helpful to you! – ShellFish Jul 03 '15 at 10:33