I have a text file with thousands of lines, and I want to update each line by making a few changes.

original lines:

b1522   ftp://ftp.genecard.giv.nlm.org/genome/all/ABC_001596115.1_ASM159611v1#
dd1120  ftp://ftp.genecard.giv.nlm.org/genome/all/ABC_231146189.1_ASM159611v1#

desired output:

b1522   ftp://ftp.genecard.giv.nlm.org/genome/all/ABC/001/596/115/ABC_001596115.1_ASM159611v1#
dd1120  ftp://ftp.genecard.giv.nlm.org/genome/all/ABC/231/146/189/ABC_231146189.1_ASM159611v1#

I want to copy the text from "ABC" up to the last digit before the ".", paste it between two forward slashes after "all", delete the underscore, and place a forward slash after every three characters. I have no idea how to go about it with awk. My awk knowledge is quite basic.

Ifeanyi

2 Answers

I don't know how to do it in awk, but you can do it easily with sed:

  sed -r -e 's%/(ABC_)((...)(...)(...))%/ABC/\3/\4/\5/\1\2%' < infile.txt > outfile.txt

Here is what it does:

It matches each line containing /ABC_ followed by nine characters.

(ABC_) captures the literal ABC_ into group \1.

((...)(...)(...)) captures the next 9 characters as a whole into group \2.

Each inner (...) captures three characters; the three occurrences become groups \3, \4, and \5.

s%pattern%replacement% replaces the first match of the pattern on each line with the replacement. Using % as the delimiter avoids having to escape the forward slashes.

In this case we match ABC_ and the 9 characters that follow, store them in capture groups, and then replace the whole match with:

/ABC/\3/\4/\5/\1\2

Here /ABC/\3/\4/\5/ is the text being inserted, and \1\2 puts the original text back to the right of the insertion.
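To see the substitution at work, here is a small sketch that runs it on one of the sample lines from the question. The function name insert_dirs is made up for this illustration, and -E is used in place of -r on the assumption that your sed accepts it (GNU and BSD sed both do); the expression itself is unchanged.

```shell
#!/bin/sh
# insert_dirs is a hypothetical wrapper name, used only for this demo.
# It reads lines from stdin (or from files given as arguments) and inserts
# the ABC/nnn/nnn/nnn/ directory path in front of the ABC_ identifier.
insert_dirs() {
  sed -E 's%/(ABC_)((...)(...)(...))%/ABC/\3/\4/\5/\1\2%' "$@"
}

# Feed one sample line from the question through the substitution.
printf '%s\n' 'b1522   ftp://ftp.genecard.giv.nlm.org/genome/all/ABC_001596115.1_ASM159611v1#' | insert_dirs
```

Once the output looks right on a few lines, the same function can be pointed at the real file, for example insert_dirs infile.txt > outfile.txt.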

Chunko

Another, similar sed, this time editing the file in place and keeping a backup copy as file.bak:

sed -i.bak -r 's~((ABC)_(...)(...)(...))~\2/\3/\4/\5/\1~' file
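Since the question asked about awk, here is a rough awk equivalent of the same idea. It is only a sketch and not part of the original answers: the function name add_dirs_awk is invented for the example, and it assumes the nine characters after ABC_ are always digits (the sed versions accept any characters).

```shell
#!/bin/sh
# add_dirs_awk is a hypothetical name for this sketch.
# Assumption: the identifier is ABC_ followed by exactly nine digits.
add_dirs_awk() {
  awk '{
    # Locate ABC_ plus nine digits; match() sets RSTART and RLENGTH.
    if (match($0, /ABC_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/)) {
      digits = substr($0, RSTART + 4, 9)       # the nine digits after ABC_
      path = "ABC/" substr(digits, 1, 3) "/" substr(digits, 4, 3) "/" substr(digits, 7, 3) "/"
      # Rebuild the line: text before the match, new path, then the rest unchanged.
      $0 = substr($0, 1, RSTART - 1) path substr($0, RSTART)
    }
    print
  }' "$@"
}

printf '%s\n' 'dd1120  ftp://ftp.genecard.giv.nlm.org/genome/all/ABC_231146189.1_ASM159611v1#' | add_dirs_awk
```

Lines without a matching identifier are printed unchanged, and the spacing between the two columns is preserved because the line is rebuilt with substr rather than by reassigning individual fields.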
karakfa