sed to copy part of line to end

Question

I'm trying to copy part of a line to append to the end:

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/169/985/GCA_900169985.1_IonXpress_024_genomic.fna.gz

becomes:

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/169/985/GCA_900169985.1/GCA_900169985_IonXpress_024_genomic.fna.gz

I have tried:

sed 's/\(.*(GCA_\)\(.*\))/\1\2\2)'

A simpler way to ask this would be "How do I change `ftp://one/two/three_four/five` to `ftp://one/two/three_four/three/five`?" - George Vasiliou, Sep 12 '17 at 09:10
I think it would be better if OP explained how the new version is arrived at... it could be as simple as `xyz.5_foo.bar.baz` to `xyz.5/xyz_foo.bar.baz` - Sundeep, Sep 12 '17 at 09:17

George Vasiliou · Answer 1 · 2017-09-12T09:14:04.517

$ f1=$'ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/169/985/GCA_900169985.1_IonXpress_024_genomic.fna.gz'

$ echo "$f1"
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/169/985/GCA_900169985.1_IonXpress_024_genomic.fna.gz

$ sed -E 's/(.*)(GCA_.[^.]*)(.[^_]*)(.*)/\1\2\3\/\2\4/' <<<"$f1"
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/169/985/GCA_900169985.1/GCA_900169985_IonXpress_024_genomic.fna.gz

sed -E (or -r on some systems) enables extended regular expressions in sed, so you don't need to escape the grouping parentheses ( ).
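
For comparison, here is a small sketch of the same substitution written both ways. The variable names (url, bre, ere) are only for illustration, and I use printf with a pipe instead of <<< so it also runs in a plain POSIX shell:

```shell
url='ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/169/985/GCA_900169985.1_IonXpress_024_genomic.fna.gz'

# Basic regex (no -E): capture groups must be written as \( ... \)
bre=$(printf '%s\n' "$url" | sed 's/\(.*\)\(GCA_.[^.]*\)\(.[^_]*\)\(.*\)/\1\2\3\/\2\4/')

# Extended regex (-E): plain ( ... ) are capture groups
ere=$(printf '%s\n' "$url" | sed -E 's/(.*)(GCA_.[^.]*)(.[^_]*)(.*)/\1\2\3\/\2\4/')

# Both commands should print the same rewritten URL
printf '%s\n%s\n' "$bre" "$ere"
```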

The group (GCA_.[^.]*) means "match GCA_, then any one character, then all characters up to, but excluding, the first dot":

$ sed -E 's/(.*)(GCA_.[^.]*)(.[^_]*)(.*)/\2/' <<<"$f1"
GCA_900169985

Similarly, (.[^_]*) matches any one character (here the dot) followed by all characters up to, but excluding, the first _. A negated character class like this is the usual way to get a non-greedy (lazy) match in sed, which has no lazy quantifiers (in Perl regex this would be written something like .*?_).

$ sed -E 's/(.*)(GCA_.[^.]*)(.[^_]*)(.*)/\3/' <<<"$f1"
.1
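
To see why the negated class matters, here is a rough comparison of group 3 with [^_]* against the same group written as a greedy .* (the variable names are just for illustration):

```shell
url='ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/169/985/GCA_900169985.1_IonXpress_024_genomic.fna.gz'

# [^_]* cannot cross an underscore, so group 3 stops right before "_IonXpress"
lazy_like=$(printf '%s\n' "$url" | sed -E 's/(.*)(GCA_.[^.]*)(.[^_]*)(.*)/\3/')

# A plain .* is greedy: group 3 takes everything to the end of the line
greedy=$(printf '%s\n' "$url" | sed -E 's/(.*)(GCA_.[^.]*)(.*)(.*)/\3/')

printf 'negated class: %s\n' "$lazy_like"   # .1
printf 'greedy .*:     %s\n' "$greedy"      # .1_IonXpress_024_genomic.fna.gz
```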

score 0 · Answer 2 · answered Sep 12 '17 at 09:18

Short sed approach:

s="ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/169/985/GCA_900169985.1_IonXpress_024_genomic.fna.gz"
sed -E 's/(GCA_[^._]+)\.([^_]+)/\1.\2\/\1/' <<< "$s"

The output:

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/169/985/GCA_900169985.1/GCA_900169985_IonXpress_024_genomic.fna.gz
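
One way to read that pattern is to print the two capture groups on their own. This is only an illustrative sketch; the extra .* on each side and the variable names (parts, result) are additions here so that nothing but the groups is printed:

```shell
s='ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/169/985/GCA_900169985.1_IonXpress_024_genomic.fna.gz'

# \1 = "GCA_" plus everything up to the first dot or underscore (the accession)
# \2 = the text between that dot and the next underscore (the version)
parts=$(printf '%s\n' "$s" | sed -E 's/.*(GCA_[^._]+)\.([^_]+).*/\1 \2/')
printf '%s\n' "$parts"   # GCA_900169985 1

# The substitution itself: keep "accession.version", then add "/accession"
result=$(printf '%s\n' "$s" | sed -E 's/(GCA_[^._]+)\.([^_]+)/\1.\2\/\1/')
printf '%s\n' "$result"
```

Because the pattern is not anchored with a leading .*, sed replaces only the matched GCA_... part and leaves the rest of the line untouched.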
