cat file.txt
MNS GYPA*N
MNS GYPA*M c.59T>C;c.71A>G;c.72G>T
MNS GYPA*Mc c.71G>A;c.72T>G
MNS GYPA*Vw c.140C>T
MNS GYPA*Mg c.68C>A
MNS GYPA*Vr c.197C>A
MNS GYPB*Mta c.230C>T
MNS GYPB*Ria c.226G>A
MNS GYPB*Nya c.138T>A
MNS GYPB*Hut c.140C>A
.
.
.
The second-column values can start with GYPA, GYPB, GYPC, GYPD, ... GYPZ. I would like to add a position count that restarts for each GYP* group, and split the third column on ";" so that each variant gets its own line, as follows:
1 MNS GYPA*N
2 MNS GYPA*M c.59T>C
2 MNS GYPA*M c.71A>G
2 MNS GYPA*M c.72G>T
3 MNS GYPA*Mc c.71G>A
3 MNS GYPA*Mc c.72T>G
4 MNS GYPA*Vw c.140C>T
5 MNS GYPA*Mg c.68C>A
6 MNS GYPA*Vr c.197C>A
1 MNS GYPB*Mta c.230C>T
2 MNS GYPB*Ria c.226G>A
3 MNS GYPB*Nya c.138T>A
4 MNS GYPB*Hut c.140C>A
.
.
.
cat format.awk
BEGIN { FS = OFS = "\t" }
$2 ~ /GYPA/ {
    num = split($3, arr, /;/)
    for (i = 1; i <= num; i++)
        print NR, $1, $2, arr[i]    # NR is the file-wide record number
}
$2 ~ /GYPB/ {
    num = split($3, arr, /;/)
    for (i = 1; i <= num; i++)
        print NR, $1, $2, arr[i]
}
...
I am not sure how to reset the count (I am using NR for it) when the script reaches the next GYP group. The GYP{A..Z} groups appear in the file in order from A to Z.
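
One possible approach, as a rough sketch rather than a definitive answer: instead of NR, keep a separate counter and set it back to zero whenever the gene prefix of column 2 changes. The sketch assumes the file is tab-separated (as the BEGIN block above suggests), that the gene prefix is everything before the "*", and that each allele occupies exactly one input line. The variable names gene, prev, and count are made up for illustration, and the three-line input is a hypothetical excerpt of file.txt.

```shell
# Hypothetical three-line, tab-separated sample in the same layout as file.txt.
out=$(printf 'MNS\tGYPA*N\nMNS\tGYPA*M\tc.59T>C;c.71A>G;c.72G>T\nMNS\tGYPB*Mta\tc.230C>T\n' |
awk 'BEGIN { FS = OFS = "\t" }
{
    gene = $2
    sub(/\*.*/, "", gene)                  # "GYPA*Mc" -> "GYPA"
    if (gene != prev) {                    # first line of a new GYP group
        count = 0
        prev = gene
    }
    count++                                # one count per input line

    if ($3 == "") {                        # no variants listed, e.g. GYPA*N
        print count, $1, $2
        next
    }
    n = split($3, arr, ";")                # one output line per variant
    for (i = 1; i <= n; i++)
        print count, $1, $2, arr[i]
}')
printf '%s\n' "$out"
```

Because the counter is driven by a change in the prefix rather than by a hard-coded pattern per gene, a single rule should cover GYPA through GYPZ, provided the groups are contiguous in the file as described.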