Remove empty columns by awk

Question

I have a input file, which is tab delimited, but I want to remove all empty columns. Empty columns : $13=$14=$15=$84=$85=$86=$87=$88=$89=$91=$94

INPUT: tsv file with more than 90 columns

a b   d e   g...  
a b   d e   g...

OUTPUT: tsv file without empty columns

a b d e g....
a b d e g...

Thank you

Not clear, could you please be more clear in your question. Do you mean you have multiple columns to nullify? and you don't want to hardcode them? — RavinderSingh13, Jun 14 '18 at 13:04
I have some empty columns in file. And I do not want them in input. So delete empty (blank) columns with awk. — Vonton, Jun 14 '18 at 13:05
it means you need to nullify the columns in Input_file and keep the changes in Input_file itself correct? — RavinderSingh13, Jun 14 '18 at 13:07
Please [edit] you question to show a [mcve] with concise, testable sample input and expected output so we can help you. You do not need input with 90+ columns to demonstrate your issue - 5 or 6 would work just fine. — Ed Morton, Jun 14 '18 at 13:10
I am not sure if I got the question right... would `sed -E 's/\t+/\t/g'` solve it? — Sundeep, Jun 14 '18 at 14:43
@Sundeep no as that would remove fields that just happened to be empty before we tried to delete other specific fields. It also wouldn't remove the leading tab if the field deleted was the first on the line. — Ed Morton, Jun 14 '18 at 16:03
oh ok.. the first column can be treated separately.. OP mentions `want to remove all empty columns`.. — Sundeep, Jun 14 '18 at 16:10
Yeah but you can see in the code segment that he's emptying specific ones himself. idk, some more details would've been nice... — Ed Morton, Jun 14 '18 at 16:34

Ed Morton · Accepted Answer · 2018-06-14T13:24:15.920

6

This might be what you want:

$ printf 'a\tb\tc\td\te\n'
a       b       c       d       e

$ printf 'a\tb\tc\td\te\n' | awk 'BEGIN{FS=OFS="\t"} {$2=$4=""} 1'
a               c               e

$ printf 'a\tb\tc\td\te\n' | awk 'BEGIN{FS=OFS="\t"} {$2=$4=RS; gsub("(^|"FS")"RS,"")} 1'
a       c       e

Note that the above doesn't remove all empty columns as some potential solutions might do, it only removes exactly the column numbers you want removed:

$ printf 'a\tb\t\td\te\n'
a       b               d       e

$ printf 'a\tb\t\td\te\n' | awk 'BEGIN{FS=OFS="\t"} {$2=$4=RS; gsub("(^|"FS")"RS,"")} 1'
a               e

edited Jun 14 '18 at 13:24

answered Jun 14 '18 at 13:18

Ed Morton

188,023
17
78
185

2

Whaaat?! Nice!! – James Brown Jun 14 '18 at 13:23
1

Great Ed..Works perfect..Thank you – Vonton Jun 14 '18 at 13:26
@Ed Morton, apologies for disturbing here, could you please explain `"(^|"FS")"RS` regex, will be grateful to you. – RavinderSingh13 Jun 14 '18 at 13:29
2

I'm replacing each target fields contents with RS so I can then just remove every RS plus the FS before it and leave the rest of the fields + separators untouched. If the target field was the first field, though, then there wouldn't be an FS before it so I accommodate that by removing `^ RS` as well as `FS RS`. – Ed Morton Jun 14 '18 at 13:36

kvantour · Answer 2 · 2018-06-15T09:21:11.913

remove ALL empty columns:

If you have a tab-delimited file, with empty columns and you want to remove all empty columns, it implies that you have multiple consecutive tabs. Hence you could just replace those with a single tab and delete then the first starting tab if you also removed the first column:

sed 's/\t\+/\t/g;s/^\t//' <file>

remove SOME columns: See Ed Morton or just use cut:

cut --complement -f 13,14,15,84,85,86,87,88,89,91,94 <file>

remove selected columns if and only if they are empty:

Basically a simple adaptation from Ed Morton :

awk 'BEGIN{FS=OFS="\t"; n=split(col,a,",")}
     { for(i=1;i<=n;++i) if ($a[i]=="") $a[i]=RS; gsub("(^|"FS")"RS,"") }
     1' col=13,14,15,84,85,86,87,88,89,91,94 <file>

Remove empty columns by awk

2 Answers2