
I have a dataset of SNPs whose IDs aren't coded the way I need them. Instead of being coded just as "rsNUMBER", they also include information from the chip analysis, for example GSA-rsNUMBER or psy-rsNUMBER.

Others have the chip information at the end, for example rsNUMBER_CNV_SULT1A3.

Is there a way to remove the chip information? My data is in PLINK binary format (.bed, .bim and .fam).

zx8754
Inken

1 Answer


You can use Perl to get a simple hack working:

printf '1\trs123-bob\t0\t123456\tN\tN\n1\tbob-rs123\t0\t123456\tN\tN\n' | perl -pe 's/^(\S+\s+)\S*(rs[0-9]+)\S*/$1$2/'

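The above assumes the .bim format, where the SNP ID is the second column. Only the .bim file needs to change: run the same perl command on it and redirect the output to a new file. The .bed and .fam files can stay as they are, because the row order of the .bim is preserved.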
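If you also want to know whether the cleanup makes two variants share an ID (for example when both GSA-rs123 and rs123 are present), here is a minimal awk-based sketch. The function name strip_chip_tags is hypothetical; it assumes a whitespace- or tab-delimited .bim with the ID in column 2, keeps only the first rs<digits> run found in each ID, and leaves IDs without an rs number unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a PLINK feature): strip chip prefixes/suffixes such as
# GSA-rs123, psy-rs123 or rs123_CNV_SULT1A3 from column 2 of a PLINK .bim file.
# Reads the .bim on stdin, writes the cleaned .bim on stdout (tab-separated),
# and prints a warning on stderr for every ID that ends up duplicated.
strip_chip_tags() {
  awk 'BEGIN { OFS = "\t" }
  {
    # match() sets RSTART/RLENGTH to the first "rs" followed by digits
    if (match($2, /rs[0-9]+/)) {
      $2 = substr($2, RSTART, RLENGTH)
    }
    # Count each (possibly cleaned) ID so duplicates can be reported
    seen[$2]++
    print
  }
  END {
    for (id in seen) {
      if (seen[id] > 1) {
        printf("warning: %s occurs %d times after renaming\n", id, seen[id]) > "/dev/stderr"
      }
    }
  }'
}
```

For example, `strip_chip_tags < data.bim > data.fixed.bim`, then back up the original data.bim and put the fixed file in its place. Duplicates are only reported, not resolved; which copy to keep is left to you.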

Vince