
I have a CSV file, which is of course comma-separated, and I want to delete all the columns that have a specific title, for example:

voltage, current, power, voltage, current, power
      2,       3,     6,      12,      12,   144  
      3,       5,    15,      10,      10,   100  

should be

voltage, power, voltage, power  
      2,     6,      12,   144  
      3,    15,      10,   100  
FE RY

4 Answers


Contents of script.awk:

BEGIN {
    FS = " , "      # fields are separated by space-comma-space
}

NR==1 {
    # header line: remember which columns to drop, print the rest
    for (i=1; i<=NF; i++) {
        if ($i == "current") {
            array[i]++
        }
        else {
            printf "%s", (i != NF) ? $i ", " : $i
        }
    }
}

NR>=2 {
    # data lines: print only the columns not marked for removal
    for (j=1; j<=NF; j++) {
        if (!(j in array)) {
            printf "%s", (j != NF) ? $j ", " : $j
        }
    }
}

{
    printf "\n"
}

Contents of file.txt:

voltage , current , power , voltage , current , power
2 , 3 , 6 , 12 , 12 , 144
3 , 5 , 15 , 10 , 10 , 100

Run like:

awk -f script.awk file.txt

Results:

voltage, power, voltage, power
2, 6, 12, 144
3, 15, 10, 100
Steve
  • I don't know what's wrong, but this gives me the same output as the input file. – FE RY Sep 13 '12 at 13:24
  • @FERY: Did you test this on the input file I describe? What version of `awk` are you using? – Steve Sep 13 '12 at 13:26
  • @FERY: I should say that your delimiter is inconsistent within the input file you describe. I have assumed that your input is separated by: `" , "`. If necessary, change the value of `FS` in the `BEGIN` block. – Steve Sep 13 '12 at 13:33
  • I am actually using Gawk, which is awk for Windows. I run the whole thing in a Windows prompt window, and the problem is not the delimiter, because it's a CSV file and CSV files are comma-separated. – FE RY Sep 13 '12 at 13:42
  • @FERY: `gawk` is GNU AWK, which is available for Windows but comes from the Linux world. I wouldn't call it "AWK for Windows". You should know that the idea of a CSV file is very nebulous. The example data you provided includes spaces around the commas on the data lines, but only *after* the commas in the header line. Steve added spaces before the commas in his version of the header line and then took the delimiter literally at face value as space-comma-space. These are important details. – Dennis Williamson Sep 13 '12 at 14:16
  • Thank you guys, it worked; I was messing with the spaces in the file, that's why it didn't work. But if I want to include all the headlines in one condition, how would the code be? E.g. when the headline is CURRENT or VOLTAGE, delete those columns. – FE RY Sep 13 '12 at 15:33
  • @FERY: Try changing `if ($i == "current")` to `if ($i == "current" || $i == "voltage")`. HTH. – Steve Sep 13 '12 at 22:29
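For reference, a self-contained sketch of that last suggestion applied to the full script (same assumed `" , "` delimiter; `script2.awk` and the sample `file.txt` are created inline for illustration):

```shell
# script2.awk: Steve's script with the condition widened to drop
# both "current" and "voltage" columns
cat > script2.awk <<'EOF'
BEGIN { FS = " , " }
NR==1 {
    for (i = 1; i <= NF; i++) {
        if ($i == "current" || $i == "voltage") {
            array[i]++          # remember columns to drop
        } else {
            printf "%s", (i != NF) ? $i ", " : $i
        }
    }
}
NR>=2 {
    for (j = 1; j <= NF; j++) {
        if (!(j in array)) {
            printf "%s", (j != NF) ? $j ", " : $j
        }
    }
}
{ printf "\n" }
EOF

cat > file.txt <<'EOF'
voltage , current , power , voltage , current , power
2 , 3 , 6 , 12 , 12 , 144
3 , 5 , 15 , 10 , 10 , 100
EOF

awk -f script2.awk file.txt
# power, power
# 6, 144
# 15, 100
```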

If you want to remove all columns titled "current":

kent$  awk -F',' 'NR==1{for(x=1;x<=NF;x++)if($x!="current")l[x]++;}{for(i=1;i<=NF;i++)if(i in l)printf (i==NF)?$i"":$i", ";printf "\n"}' test.txt 

voltage, power, voltage, power
2, 6, 12, 144
3, 15, 10, 100

Note: I removed the spaces in `test.txt` above.

  • If you have different titles that need to be removed, you could try a regex: replace the `$x!="current"` test with a regex match.

  • If you know a rule for which columns to remove, for example the 2nd, 5th, 8th, ... (+3), it could be handled more easily, by looping for example.
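A sketch of the regex variant from the first bullet, under the same no-spaces assumption; the pattern `^(current|voltage)$` is just an example, and an accumulator is used to avoid a trailing separator when the last column is removed:

```shell
# sample input without spaces around the commas
printf '%s\n' \
  'voltage,current,power,voltage,current,power' \
  '2,3,6,12,12,144' \
  '3,5,15,10,10,100' > test.txt

# keep[x] marks columns whose header does NOT match the regex
awk -F',' 'NR==1 { for (x = 1; x <= NF; x++) if ($x !~ /^(current|voltage)$/) keep[x]++ }
{
    out = ""
    for (i = 1; i <= NF; i++)
        if (i in keep)
            out = out (out == "" ? "" : ", ") $i
    print out
}' test.txt
# power, power
# 6, 144
# 15, 100
```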

Kent
  • `l` (letter "l") is a poor choice for a variable name since it's hard to distinguish from `1` (numeral one). Also, some spaces would enhance readability. – Dennis Williamson Sep 13 '12 at 14:08

Assuming the input file name is `input.txt`:

awk  -F ',' '{print $1 "," $3 "," $5 "," $6 }' input.txt
user1203650
  • The thing is, the number of columns is too high (2500), so I need a loop to go through all the columns. – FE RY Sep 13 '12 at 09:25

Note that for general CSV file handling, a proper library should be used. If the data are very simple, i.e. no embedded commas, newlines, etc., simpler tools can be used.
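For completeness, a sketch of the library approach, driving Python's `csv` module from the shell (the sample `a.csv`, the target header name `current`, and the whitespace-stripping are illustrative assumptions; `csv.reader` handles quoted fields and embedded commas that would break the plain-text tools):

```shell
# sample input, same data as above but without spaces
printf '%s\n' \
  'voltage,current,power,voltage,current,power' \
  '2,3,6,12,12,144' \
  '3,5,15,10,10,100' > a.csv

# drop every column whose (stripped) header cell equals the given name
python3 - a.csv current <<'PYEOF'
import csv, sys

path, name = sys.argv[1], sys.argv[2]
with open(path, newline='') as f:
    rows = list(csv.reader(f))

# indices of columns to keep, decided from the header row
keep = [i for i, h in enumerate(rows[0]) if h.strip() != name]

w = csv.writer(sys.stdout, lineterminator='\n')
for row in rows:
    w.writerow([row[i] for i in keep])
PYEOF
# voltage,power,voltage,power
# 2,6,12,144
# 3,15,10,100
```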

You have a good awk solution from Steve, so I'll add an answer based on coreutils and grep:

# find columns to remove
pattern=current
cols=$(head -n1 a.csv | tr ',' '\n' | grep -n "$pattern" | cut -d: -f1 | paste -s -d,)

# remove all columns that matched
cut --complement -d, -f$cols a.csv

Output:

voltage, power, voltage, power
2 , 6 , 12 , 144
3 , 15 , 10 , 100

Note that the `--complement` option is a GNU `cut` extension. To generate `$cols` for other `cut` implementations, something like this should do (tested in zsh on FreeBSD):

# number of columns
file=a.csv
pattern=current
n=$(head -n1 "$file" | tr ',' '\n' | wc -l)

# generate complementary list
cols=$(jot $n \
| grep -xvFf <(head -n1 "$file" | tr ',' '\n' | grep -n "$pattern" | cut -d: -f1) \
| paste -s -d, -)

# remove columns
cut -d, -f$cols "$file"
Thor