
I have a file, list.txt, which contains the following lines:

Primer_Adapter_clean_KL01_BOLD1_100_KL01_BOLD1_100_N701_S507_L001_merged.fasta
Primer_Adapt_clean_KL01_BOLD1_500_KL01_BOLD1_500_N704_S507_L001_merged.fasta
Primer_Adapt_clean_LD03_BOLD2_Sessile_LD03_BOLD2_Sessile_N710_S506_L001_merged.fasta

I would like to extract only the substring between the 4th and 7th underscores, so that the output looks like this:

BOLD1_100_KL01
BOLD1_500_KL01
BOLD2_Sessile_LD03

I tried the awk command below, but it does not produce the output I want. Any help would be appreciated. If this can be done with sed, I would be interested in that solution too.

awk -v FPAT="[^__]*" '$4=$7' list.txt
Cyrus

2 Answers


I feel like awk is overkill for this. You can use cut with an underscore as the delimiter to select just the fields you want:

$ cut -d_ -f5-7 list.txt
BOLD1_100_KL01
BOLD1_500_KL01
BOLD2_Sessile_LD03
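
To see why fields 5 to 7 correspond to the text between the 4th and 7th underscores, it can help to number the fields of one sample line. The following is only an illustrative sketch; the variable names `line` and `numbered` are made up for this example.

```shell
# One sample line from the question's list.txt
line='Primer_Adapter_clean_KL01_BOLD1_100_KL01_BOLD1_100_N701_S507_L001_merged.fasta'

# Put each underscore-separated field on its own line and prefix it with its number
numbered=$(printf '%s\n' "$line" | tr '_' '\n' | awk '{print NR": "$0}')

# Show only fields 5 to 7, the ones cut -f5-7 selects
printf '%s\n' "$numbered" | sed -n '5,7p'
```

Field 5 starts right after the 4th underscore and field 7 ends right before the 7th, which is why `-f5-7` gives the requested substring.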
Shawn
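With awk, set both the input and output field separators to an underscore and print fields 5 to 7:

awk 'BEGIN{FS=OFS="_"} {print $5,$6,$7}' list.txt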

Output:

BOLD1_100_KL01
BOLD1_500_KL01
BOLD2_Sessile_LD03
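
The question also asks about sed. One possible approach, shown here only as a sketch, assumes a sed that supports `-E` (extended regular expressions, as in GNU and BSD sed) and that every line has at least eight underscore-separated fields. The temporary file and the variable names `list` and `result` are made up for this example.

```shell
# Recreate the three sample lines from the question in a temporary file
list=$(mktemp)
printf '%s\n' \
  'Primer_Adapter_clean_KL01_BOLD1_100_KL01_BOLD1_100_N701_S507_L001_merged.fasta' \
  'Primer_Adapt_clean_KL01_BOLD1_500_KL01_BOLD1_500_N704_S507_L001_merged.fasta' \
  'Primer_Adapt_clean_LD03_BOLD2_Sessile_LD03_BOLD2_Sessile_N710_S506_L001_merged.fasta' \
  > "$list"

# ([^_]*_){4} skips the first four fields together with their underscores,
# the second group captures fields 5 to 7, and _.* discards everything after them
result=$(sed -E 's/^([^_]*_){4}([^_]*_[^_]*_[^_]*)_.*/\2/' "$list")
printf '%s\n' "$result"

rm -f "$list"
```

Unlike cut, sed prints any line that does not match the pattern unchanged, so this is less forgiving of lines with fewer fields.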
Cyrus