Example of the string in question, called gene_snps:
"ultra_rare_variant_chr9:23143143_A/C_chr9:5322432_G/T_chr9:9840984342_T/C;chr9:5324234:G/T;chr9:324424_T/A"
Desired outcome:
markerID
chr9:23143143_A/C
chr9:5322432_G/T
chr9:9840984342_T/C
chr9:5324234:G/T
chr9:324424_T/A
With the ultimate outcome being a table:
CHR POS REF ALT
chr9 23143143 A C
chr9 5322432 G T
chr9 9840984342 T C
chr9 5324234 G T
chr9 324424 T A
I used to have code to split these up when they were only ";" separated:
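library(tidyr)
# here gene_snps is assumed to be a data frame with a markerIDs column
x <- separate_rows(gene_snps, markerIDs, sep = ";")
# separate() (not separate_rows()) is the function that takes `into`
x <- separate(x, col = "markerIDs", into = c("pos", "ref_alt"), sep = "_")
x <- separate(x, col = "pos", into = c("CHR", "POS"), sep = ":")
x <- separate(x, col = "ref_alt", into = c("REF", "ALT"), sep = "/")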
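but that no longer works, because the upstream tool that generates the string now introduces the "ultra_rare_variant" tag, and everything inside that tag is "_" separated. That means "_" separates both the variants from each other and the position from the alleles within each variant.
Any help with splitting this string up to get rid of the ultra_rare_variant bit and put each chrx:x_x/x chunk into its own row would be much appreciated!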
All the best
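One possible approach, sketched in base R: rather than splitting on the separators, pull out every chunk that looks like a variant and then split each chunk into its fields. The regular expression is an assumption based only on the example above (chromosome names starting with "chr", alleles made up of A/C/G/T, and either "_" or ":" between the position and the alleles), so it would need adjusting for indels or other allele notations. The names `pattern`, `fields` and `variants` are illustrative.

```r
gene_snps <- "ultra_rare_variant_chr9:23143143_A/C_chr9:5322432_G/T_chr9:9840984342_T/C;chr9:5324234:G/T;chr9:324424_T/A"

# One variant: "chr" + name, ":", position, "_" or ":", REF, "/", ALT.
# Assumption: alleles contain only A, C, G or T.
pattern <- "chr[0-9A-Za-z]+:[0-9]+[_:][ACGT]+/[ACGT]+"

# Extract every match; the "ultra_rare_variant" prefix and the "_" / ";"
# between variants are simply never matched, so they drop out.
markerID <- regmatches(gene_snps, gregexpr(pattern, gene_snps))[[1]]

# Split each variant on ":", "_" or "/" to get CHR, POS, REF, ALT.
fields <- do.call(rbind, strsplit(markerID, "[:_/]"))

variants <- data.frame(
  CHR = fields[, 1],
  # as.numeric rather than as.integer: a position such as 9840984342
  # does not fit in R's 32-bit integer type.
  POS = as.numeric(fields[, 2]),
  REF = fields[, 3],
  ALT = fields[, 4],
  stringsAsFactors = FALSE
)

print(variants)
```

If the strings live in a data frame column rather than a single variable, the same extraction could be applied per row (for example with `lapply`) before binding the results together.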