-1

I am not very experienced with Unix commands and am struggling to achieve this.

I have a file like the one below.

INPUT

ABCDEF_XY_12345_PQRTS_67367
1,a,b,c1
2,a,b,c2
3,a,b,c3
.....
APRTEYW_XY_23456_GDJHJH_232434
1,a,b,c4
2,a,b,c5
3,a,b,c6
......
GDHGJHG_XY_35237_FHDJFH_738278
1,a,b,c7
2,a,b,c8
3,a,b,c9
......

OUTPUT

12345,1,a,b,c1
12345,2,a,b,c2
12345,3,a,b,c3
23456,1,a,b,c4
23456,2,a,b,c5
23456,3,a,b,c6
35237,1,a,b,c7
35237,2,a,b,c8
35237,3,a,b,c9

Essentially, I want to take the substring between `_XY_` and the next `_` on a header line (that is, `_XY_<STRING>_`) and prepend it to each following line, giving `<STRING>,1,a,b,c1`, until the next line matching `_XY_<STRING>_` is encountered. The same process then repeats until EOF.

I am trying to find an easy way to do this, either using awk or by splitting the master file into multiple smaller files. Could you please point me in the right direction?

user1637487
  • 1
    It is always recommended to show your own efforts in your post. Also, your profile shows that you rarely mark an answer as accepted. Give it some time, and once you have a few answers to your question, try to accept the one that worked for you. – RavinderSingh13 Mar 14 '19 at 00:09
  • 1
    Please check [someone-answers](http://stackoverflow.com/help/someone-answers) and [accepted](https://stackoverflow.com/help/accepted-answer). You can give a kind of closure to your questions, current one, or those old ones, all of them, and it benefits yourself too. – Til Mar 14 '19 at 02:08
  • If those `......` lines don't actually exist in your real input then get rid of them from your sample input as all they do is obfuscate your example. – Ed Morton Mar 14 '19 at 14:53

2 Answers

2

Try awk with multiple delimiters:

awk -F"[_,]" -v OFS=, ' { if(/_/) { k=$3 } else { print k,$0 } } ' file

Thanks to @EdMorton: a single delimiter is enough, since only the header lines need to be split and the data lines are printed whole via `$0`.

awk -F_ -v OFS=, ' { if(/_/) { k=$3 } else { print k,$0 } } ' file

It can be further shortened to:

awk -F_ -v OFS=, ' /_/ {k=$3;next} { print k,$0 } ' file

With your given input:

$ cat filex.txt
ABCDEF_XY_12345_PQRTS_67367
1,a,b,c1
2,a,b,c2
3,a,b,c3
APRTEYW_XY_23456_GDJHJH_232434
1,a,b,c4
2,a,b,c5
3,a,b,c6
GDHGJHG_XY_35237_FHDJFH_738278
1,a,b,c7
2,a,b,c8
3,a,b,c9

$ awk -F_ -v OFS=, ' { if(/_/) { k=$3 } else { print k,$0 } } ' filex.txt
12345,1,a,b,c1
12345,2,a,b,c2
12345,3,a,b,c3
23456,1,a,b,c4
23456,2,a,b,c5
23456,3,a,b,c6
35237,1,a,b,c7
35237,2,a,b,c8
35237,3,a,b,c9

$
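The commands above treat any line containing `_` as a header. If data lines could themselves contain an underscore, or if the `......` filler lines really are present in the file, a stricter variant might key on `_XY_` instead. The following is only a sketch under those assumptions; the function name `prepend_key` is made up for illustration and is not part of the original answer.

```shell
# prepend_key is a hypothetical helper name, not from the original answer.
# Assumption: header lines always contain "_XY_<key>_" and data lines never do.
prepend_key() {
  awk -F_ -v OFS=, '
    /_XY_/ {                  # header line: the key is the field right after "XY"
      for (i = 1; i < NF; i++)
        if ($i == "XY") { k = $(i + 1); break }
      next
    }
    /^\.*$/ { next }          # skip empty lines and filler lines made only of dots
    { print k, $0 }           # data line: prepend the remembered key
  ' "$@"
}
```

It would be called as `prepend_key filex.txt`, or fed from a pipe with no file argument. Looking for the `XY` field rather than hard-coding `$3` also tolerates headers with a different number of leading fields.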
stack0114106
1

1st solution: Could you please try the following.

awk 'BEGIN{FS="_";OFS=","}/^[a-zA-Z]+/{val=$3;next} !/^\..*\.$/{print val,$0}' Input_file

2nd solution: In case the position of the XY string is not fixed within the line, try the following.

awk '
BEGIN{
  FS="_"
  OFS=","
}
/^[a-zA-Z]+/ && match($0,/XY_[0-9]+_/){
  val=substr($0,RSTART+3,RLENGTH-4)
  next
}
!/^\..*\.$/{
  print val,$0
}
'   Input_file
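The offsets `RSTART+3` and `RLENGTH-4` in the 2nd solution may look arbitrary, so here is a small isolated sketch of just that extraction step. `match()` sets `RSTART` to the position of the `X` and `RLENGTH` to the length of the whole `XY_<digits>_` match. The function name `extract_key` is hypothetical and used only for this illustration.

```shell
# extract_key is a hypothetical helper name used only for this illustration.
extract_key() {
  awk '
    match($0, /XY_[0-9]+_/) {
      # RSTART points at "X"; the digits begin 3 characters later, after "XY_".
      # RLENGTH counts "XY_" (3) + the digits + the trailing "_" (1),
      # so the digit run is RLENGTH - 4 characters long.
      print substr($0, RSTART + 3, RLENGTH - 4)
    }
  ' "$@"
}
```

For example, `echo 'ABCDEF_XY_12345_PQRTS_67367' | extract_key` prints `12345`: the match is `XY_12345_`, so `RSTART` is 8 and `RLENGTH` is 9, and `substr` takes 5 characters starting at position 11.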
RavinderSingh13