2

I have a input file, which is tab delimited, but I want to remove all empty columns. Empty columns : $13=$14=$15=$84=$85=$86=$87=$88=$89=$91=$94

INPUT: tsv file with more than 90 columns

a b   d e   g...  
a b   d e   g...

OUTPUT: tsv file without empty columns

a b d e g....
a b d e g...

Thank you

Vonton
  • 2,872
  • 4
  • 20
  • 27
  • Not clear, could you please be more clear in your question. Do you mean you have multiple columns to nullify? and you don't want to hardcode them? – RavinderSingh13 Jun 14 '18 at 13:04
  • I have some empty columns in file. And I do not want them in input. So delete empty (blank) columns with awk. – Vonton Jun 14 '18 at 13:05
  • it means you need to nullify the columns in Input_file and keep the changes in Input_file itself correct? – RavinderSingh13 Jun 14 '18 at 13:07
  • 2
    Please [edit] you question to show a [mcve] with concise, testable sample input and expected output so we can help you. You do not need input with 90+ columns to demonstrate your issue - 5 or 6 would work just fine. – Ed Morton Jun 14 '18 at 13:10
  • I am not sure if I got the question right... would `sed -E 's/\t+/\t/g'` solve it? – Sundeep Jun 14 '18 at 14:43
  • 1
    @Sundeep no as that would remove fields that just happened to be empty before we tried to delete other specific fields. It also wouldn't remove the leading tab if the field deleted was the first on the line. – Ed Morton Jun 14 '18 at 16:03
  • oh ok.. the first column can be treated separately.. OP mentions `want to remove all empty columns`.. – Sundeep Jun 14 '18 at 16:10
  • Yeah but you can see in the code segment that he's emptying specific ones himself. idk, some more details would've been nice... – Ed Morton Jun 14 '18 at 16:34

2 Answers2

6

This might be what you want:

$ printf 'a\tb\tc\td\te\n'
a       b       c       d       e

$ printf 'a\tb\tc\td\te\n' | awk 'BEGIN{FS=OFS="\t"} {$2=$4=""} 1'
a               c               e

$ printf 'a\tb\tc\td\te\n' | awk 'BEGIN{FS=OFS="\t"} {$2=$4=RS; gsub("(^|"FS")"RS,"")} 1'
a       c       e

Note that the above doesn't remove all empty columns as some potential solutions might do, it only removes exactly the column numbers you want removed:

$ printf 'a\tb\t\td\te\n'
a       b               d       e

$ printf 'a\tb\t\td\te\n' | awk 'BEGIN{FS=OFS="\t"} {$2=$4=RS; gsub("(^|"FS")"RS,"")} 1'
a               e
Ed Morton
  • 188,023
  • 17
  • 78
  • 185
  • 2
    Whaaat?! Nice!! – James Brown Jun 14 '18 at 13:23
  • 1
    Great Ed..Works perfect..Thank you – Vonton Jun 14 '18 at 13:26
  • @Ed Morton, apologies for disturbing here, could you please explain `"(^|"FS")"RS` regex, will be grateful to you. – RavinderSingh13 Jun 14 '18 at 13:29
  • 2
    I'm replacing each target fields contents with RS so I can then just remove every RS plus the FS before it and leave the rest of the fields + separators untouched. If the target field was the first field, though, then there wouldn't be an FS before it so I accommodate that by removing `^ RS` as well as `FS RS`. – Ed Morton Jun 14 '18 at 13:36
3

remove ALL empty columns:

If you have a tab-delimited file, with empty columns and you want to remove all empty columns, it implies that you have multiple consecutive tabs. Hence you could just replace those with a single tab and delete then the first starting tab if you also removed the first column:

sed 's/\t\+/\t/g;s/^\t//' <file>

remove SOME columns: See Ed Morton or just use cut:

cut --complement -f 13,14,15,84,85,86,87,88,89,91,94 <file>

remove selected columns if and only if they are empty:

Basically a simple adaptation from Ed Morton :

awk 'BEGIN{FS=OFS="\t"; n=split(col,a,",")}
     { for(i=1;i<=n;++i) if ($a[i]=="") $a[i]=RS; gsub("(^|"FS")"RS,"") }
     1' col=13,14,15,84,85,86,87,88,89,91,94 <file>
kvantour
  • 25,269
  • 4
  • 47
  • 72