
I have a CSV file, which is of course comma-separated, and I want to delete all the columns that have a specific title, for example:

voltage, current, power, voltage, current, power
      2,       3,     6,      12,      12,   144  
      3,       5,    15,      10,      10,   100  

should be

voltage, power, voltage, power  
      2,     6,      12,   144  
      3,    15,      10,   100  
FE RY

4 Answers


Contents of script.awk:

BEGIN {
    FS = " , "      # fields are separated by space-comma-space
}

NR==1 {
    # header line: remember which columns to drop, print the rest
    for (i=1; i<=NF; i++) {
        if ($i == "current") {
            array[i]++
        }
        else {
            printf "%s", (i != NF) ? $i ", " : $i
        }
    }
}

NR>=2 {
    # data lines: print only the columns not marked for removal
    for (j=1; j<=NF; j++) {
        if (!(j in array)) {
            printf "%s", (j != NF) ? $j ", " : $j
        }
    }
}

{
    printf "\n"
}

Contents of file.txt:

voltage , current , power , voltage , current , power
2 , 3 , 6 , 12 , 12 , 144
3 , 5 , 15 , 10 , 10 , 100

Run like:

awk -f script.awk file.txt

Results:

voltage, power, voltage, power
2, 6, 12, 144
3, 15, 10, 100
Steve
  • I don't know what's wrong, but this gives me the same output as the input file. – FE RY Sep 13 '12 at 13:24
  • @FERY: Did you test this on the input file I describe? What version of `awk` are you using? – Steve Sep 13 '12 at 13:26
  • @FERY: I should say that your delimiter is inconsistent within the input file you describe. I have assumed that your input is separated by: `" , "`. If necessary, change the value of `FS` in the `BEGIN` block. – Steve Sep 13 '12 at 13:33
  • I am actually using Gawk, which is awk for Windows. I run the whole thing in a Windows prompt window, and the problem is not the delimiter, because it's a CSV file and CSV files are comma-separated. – FE RY Sep 13 '12 at 13:42
  • @FERY: `gawk` is GNU AWK, which is available for Windows but comes from the Linux world. I wouldn't call it "AWK for Windows". You should know that the idea of a CSV file is very nebulous. The example data you provided includes spaces around the commas on the data lines, but only *after* the commas in the header line. Steve added spaces before the commas in his version of the header line and then took the delimiter literally at face value as space-comma-space. These are important details. – Dennis Williamson Sep 13 '12 at 14:16
  • Thank you guys, it worked; I was messing with the spaces in the file, that's why it didn't work. But if I want to include all the headlines in one condition, how would the code be? E.g. when the headline is CURRENT or VOLTAGE, delete those columns. – FE RY Sep 13 '12 at 15:33
  • @FERY: Try changing `if ($i == "current")` to `if ($i == "current" || $i == "voltage")`. HTH. – Steve Sep 13 '12 at 22:29
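For reference, a self-contained sketch of that last suggestion applied to the full script (same assumed `" , "` delimiter; `script2.awk` and the sample `file.txt` are created inline for illustration):

```shell
# script2.awk: Steve's script with the condition widened to drop
# both "current" and "voltage" columns
cat > script2.awk <<'EOF'
BEGIN { FS = " , " }
NR==1 {
    for (i = 1; i <= NF; i++) {
        if ($i == "current" || $i == "voltage") {
            array[i]++          # remember columns to drop
        } else {
            printf "%s", (i != NF) ? $i ", " : $i
        }
    }
}
NR>=2 {
    for (j = 1; j <= NF; j++) {
        if (!(j in array)) {
            printf "%s", (j != NF) ? $j ", " : $j
        }
    }
}
{ printf "\n" }
EOF

cat > file.txt <<'EOF'
voltage , current , power , voltage , current , power
2 , 3 , 6 , 12 , 12 , 144
3 , 5 , 15 , 10 , 10 , 100
EOF

awk -f script2.awk file.txt
# power, power
# 6, 144
# 15, 100
```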

If you want to remove all columns titled "current":

kent$  awk -F',' 'NR==1{for(x=1;x<=NF;x++)if($x!="current")l[x]++;}{for(i=1;i<=NF;i++)if(i in l)printf (i==NF)?$i"":$i", ";printf "\n"}' test.txt 

voltage, power, voltage, power
2, 6, 12, 144
3, 15, 10, 100

Note: I removed the spaces in `test.txt` above.

  • If you have different titles that need to be removed, you could try a regex: replace the `$x!="current"` test with a regex match.

  • If you know a rule for which columns to remove, for example the 2nd, 5th, 8th, ... (+3), it could be handled more easily, by looping for example.
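A sketch of the regex variant from the first bullet, under the same no-spaces assumption; the pattern `^(current|voltage)$` is just an example, and an accumulator is used to avoid a trailing separator when the last column is removed:

```shell
# sample input without spaces around the commas
printf '%s\n' \
  'voltage,current,power,voltage,current,power' \
  '2,3,6,12,12,144' \
  '3,5,15,10,10,100' > test.txt

# keep[x] marks columns whose header does NOT match the regex
awk -F',' 'NR==1 { for (x = 1; x <= NF; x++) if ($x !~ /^(current|voltage)$/) keep[x]++ }
{
    out = ""
    for (i = 1; i <= NF; i++)
        if (i in keep)
            out = out (out == "" ? "" : ", ") $i
    print out
}' test.txt
# power, power
# 6, 144
# 15, 100
```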

Kent
  • `l` (letter "l") is a poor choice for a variable name since it's hard to distinguish from `1` (numeral one). Also, some spaces would enhance readability. – Dennis Williamson Sep 13 '12 at 14:08

Assuming the input file name is `input.txt`:

awk  -F ',' '{print $1 "," $3 "," $5 "," $6 }' input.txt
user1203650
  • The thing is, the number of columns is too high (2500), so I need a loop to go through all the columns. – FE RY Sep 13 '12 at 09:25

Note that for general CSV file handling, a proper library should be used. If the data are very simple, i.e. no embedded commas, newlines, etc., simpler tools can be used.
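For completeness, a sketch of the library approach, driving Python's `csv` module from the shell (the sample `a.csv`, the target header name `current`, and the whitespace-stripping are illustrative assumptions; `csv.reader` handles quoted fields and embedded commas that would break the plain-text tools):

```shell
# sample input, same data as above but without spaces
printf '%s\n' \
  'voltage,current,power,voltage,current,power' \
  '2,3,6,12,12,144' \
  '3,5,15,10,10,100' > a.csv

# drop every column whose (stripped) header cell equals the given name
python3 - a.csv current <<'PYEOF'
import csv, sys

path, name = sys.argv[1], sys.argv[2]
with open(path, newline='') as f:
    rows = list(csv.reader(f))

# indices of columns to keep, decided from the header row
keep = [i for i, h in enumerate(rows[0]) if h.strip() != name]

w = csv.writer(sys.stdout, lineterminator='\n')
for row in rows:
    w.writerow([row[i] for i in keep])
PYEOF
# voltage,power,voltage,power
# 2,6,12,144
# 3,15,10,100
```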

You have a good awk solution from Steve, so I'll add an answer based on coreutils and grep:

# find columns to remove
pattern=current
cols=$(head -n1 a.csv | tr ',' '\n' | grep -n "$pattern" | cut -d: -f1 | paste -s -d,)

# remove all columns that matched
cut --complement -d, -f$cols a.csv

Output:

voltage, power, voltage, power
2 , 6 , 12 , 144
3 , 15 , 10 , 100

Note that the `--complement` option is a GNU `cut` extension. To generate `$cols` for other `cut` implementations, something like this should do (tested in zsh on FreeBSD):

# number of columns
file=a.csv
pattern=current
n=$(head -n1 "$file" | tr ',' '\n' | wc -l)

# generate complementary list
cols=$(jot $n \
| grep -xvFf <(head -n1 "$file" | tr ',' '\n' | grep -n "$pattern" | cut -d: -f1) \
| paste -s -d, -)

# remove columns
cut -d, -f$cols "$file"
Thor