Remove characters with pattern from a tab-delimited file

Question

I have several files with lines such as


NODE_1_length_59711_cov_84.026979_g0_i0_1	12.8
NODE_1_length_59711_cov_84.026979_g0_i0_2	18.9
NODE_2_length_59711_cov_84.026979_g0_i0_1	14.3
NODE_2_length_59711_cov_84.026979_g0_i0_2	16.1
NODE_165433_length_59711_cov_84.026979_g0_i0_1	29

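I want to remove the 'NODE_' prefix and everything between the first number and the last '_', so that only the node number and the final number remain in the first column. The output, for each of the multiple files, should look like this: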


1_1	12.8
1_2	18.9
2_1	14.3
2_2	16.1
165433_1	29

Welcome to SO, please do add your efforts in form of code in your question, which is highly encouraged on SO, thank you. — RavinderSingh13, Mar 24 '21 at 08:48
`echo 'NODE_165433_length_59711_cov_84.026979_g0_i0_1' | sed -E 's/^NODE_([0-9]+)_.*_([0-9]+)/\1_\2/'` — Vishal Singh, Mar 24 '21 at 09:46

score 2 · Answer 1 · answered Mar 24 '21 at 09:50

Using sed with extended regular expressions (-E):

echo 'NODE_165433_length_59711_cov_84.026979_g0_i0_1' | sed -E 's/^NODE_([0-9]+)_.*_([0-9]+)/\1_\2/'

Output:

165433_1
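
Since the question asks about multiple files, the same substitution can be run in a loop. This is only a minimal sketch: the scratch directory, the `sample1.tsv` name, the `.tsv` extension and the `.out` suffix are assumptions for illustration, not details from the question.

```shell
# Hypothetical setup: a scratch directory with one sample file standing in for the real inputs.
tmpdir=$(mktemp -d)
printf 'NODE_1_length_59711_cov_84.026979_g0_i0_1\t12.8\n' >  "$tmpdir/sample1.tsv"
printf 'NODE_165433_length_59711_cov_84.026979_g0_i0_1\t29\n' >> "$tmpdir/sample1.tsv"

# Apply the substitution to every .tsv file, writing each result next to its input.
for f in "$tmpdir"/*.tsv; do
    sed -E 's/^NODE_([0-9]+)_.*_([0-9]+)/\1_\2/' "$f" > "$f.out"
done

# Show the rewritten sample.
cat "$tmpdir/sample1.tsv.out"
```

Writing to a separate output file leaves the originals untouched, which makes it easy to compare before and after.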

Vishal Singh


score 1 · Accepted Answer · answered Mar 24 '21 at 09:37

Using GNU awk:

awk -F "\t" '{ fld1=gensub(/(^NODE_)([[:digit:]]+)(.*_)([[:digit:]]+$)/,"\\2_\\4","g",$1); print fld1"\t"$2 }' file

Explanation:

awk -F "\t" '{                                # Set the field separator to tab
               # Split the first field into 4 sections, marked by the parentheses:
               # the NODE_ prefix, the node number, everything up to the last "_",
               # and the trailing number. Replace the field with the second section,
               # a "_" and the fourth section, and store the result in fld1.
               fld1=gensub(/(^NODE_)([[:digit:]]+)(.*_)([[:digit:]]+$)/,"\\2_\\4","g",$1);
               print fld1"\t"$2               # Print fld1, followed by a tab and then the second field.
             }' file
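
Because gensub is specific to GNU awk, a variant that avoids it may help on systems with a different awk. The sketch below swaps in a different technique: it splits the first field on "_" and keeps the second and last pieces. It assumes every name starts with NODE_ followed by the node number, as in the sample data; the input lines and the `result` and `part` names are hypothetical.

```shell
# Hypothetical sample lines in the format shown in the question.
result=$(printf 'NODE_2_length_59711_cov_84.026979_g0_i0_2\t16.1\nNODE_165433_length_59711_cov_84.026979_g0_i0_1\t29\n' |
    awk -F '\t' -v OFS='\t' '{
        n = split($1, part, "_")      # part[1] is "NODE", part[2] is the node number
        $1 = part[2] "_" part[n]      # keep the node number and the last piece
        print                         # the record is rebuilt with OFS (a tab)
    }')
printf '%s\n' "$result"
```

Assigning to $1 makes awk rebuild the line with OFS, so the second column does not need to be printed explicitly.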
