I have a list of chromosome numbers and positions which I have obtained from the output of MAC of my sample. The format looks like this:

chr1.12661672.G.A,chr1.12661721.C.T 1/6 11  2   Mutant  
chr1.157640161.C.T,chr1.157640277.G.A   1/6 11  2   Mutant 
chr1.180806049.T.A,chr1.180806061.G.C   1/6 11  11  Mutant
chr1.205929901.G.A,chr1.205930053.C.T   1/7 10  5   Mutant
...

For example, in the first row I have a G at position 12661672 and a C at position 12661721, but I want to find out which bases lie between 12661672 and 12661721. Are there any tools that return the reference bases for a custom range of positions? I am using the GRCh38 reference genome.

The best I could think of was looking through Ensembl and finding the bases manually. Obviously this is incredibly time-consuming, so I would like something more automated.

I am looking to build a VCF of the MNVs, so something along the lines of:

chr1 180806049 . TAGTGAACAAGG AAGTGAACAAGC . PASS

Where all the positions between the identified bases are filled in.

Kay G
  • What are the bases? I.e. what is your desired result? – 3dSpatialUser Feb 10 '23 at 10:49
  • Ah sorry I forgot to put that in. I am looking to build a VCF for MNVs, so something along the lines of: chr1 1291551 . GTCATTTTCGACTACGCATCAGCGTACTC ATCATTTTCGACTACGTACTT . PASS Where all the positions between the identified bases are filled in. – Kay G Feb 10 '23 at 10:56
  • You should be able to edit your original to add the new information to it. When you do so, it will make it easier for folks to follow if you use an output example which matches your given input. – Tom Morris Feb 10 '23 at 18:42
  • Welcome to [Stack Overflow.](https://stackoverflow.com/ "Stack Overflow") This is not a code-writing or tutoring service. It is not possible to provide a specific answer without you providing sufficient information to understand your problem. Please see: [Why is "Can someone help me?" not an actual question?](https://meta.stackoverflow.com/questions/284236/why-is-can-someone-help-me-not-an-actual-question) for more details. – itprorh66 Feb 11 '23 at 16:28

1 Answer

Extract the chromosome, start and end into a BED format file. If this is for a one-off analysis I would use Excel; otherwise I would use Python. Either way, the result should look like this:

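chr1 12661671 12661721
chr1 157640160 157640277
chr1 180806048 180806061
chr1 205929900 205930053

Remember that a BED file is tab-separated. BED start coordinates are 0-based and the end is exclusive, so each start above is the 1-based position from your list minus 1; that way the extracted sequence includes the first base.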
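If you go the Python route, the conversion could look roughly like the sketch below. It assumes every row starts with a field of the form chrom.pos.ref.alt,chrom.pos.ref.alt, as in the rows shown in the question; the function name mac_line_to_bed is made up for this example. Because BED starts are 0-based and ends are exclusive, the sketch subtracts 1 from the smaller 1-based position.

```python
def mac_line_to_bed(line):
    """Turn one row of the list into a tab-separated BED record.

    Assumes the first whitespace-delimited field looks like
    'chr1.12661672.G.A,chr1.12661721.C.T' (hypothetical helper).
    """
    pair = line.split()[0]
    first, second = pair.split(",")
    # rsplit from the right so a contig name containing '.' stays intact
    chrom1, pos1, _ref1, _alt1 = first.rsplit(".", 3)
    chrom2, pos2, _ref2, _alt2 = second.rsplit(".", 3)
    if chrom1 != chrom2:
        raise ValueError(f"variants on different chromosomes: {pair}")
    start = min(int(pos1), int(pos2))
    end = max(int(pos1), int(pos2))
    # BED: 0-based start, exclusive end -> covers 1-based start..end inclusive
    return f"{chrom1}\t{start - 1}\t{end}"


if __name__ == "__main__":
    rows = [
        "chr1.12661672.G.A,chr1.12661721.C.T 1/6 11  2   Mutant",
        "chr1.157640161.C.T,chr1.157640277.G.A   1/6 11  2   Mutant",
    ]
    for row in rows:
        print(mac_line_to_bed(row))
```

Redirecting the printed lines into a file gives a BED file that bedtools can read.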

Then, one option is bedtools getfasta, which extracts the sequence for each BED interval from a FASTA file:

$ cat test.fa
>chr1
AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG

$ cat test.bed
chr1 5 10

$ bedtools getfasta -fi test.fa -bed test.bed
>chr1:5-10
AAACC

# optionally write to an output file
$ bedtools getfasta -fi test.fa -bed test.bed -fo test.fa.out

$ cat test.fa.out
>chr1:5-10
AAACC

Example copied from the bedtools documentation: https://bedtools.readthedocs.io/en/latest/content/tools/getfasta.html
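Once you have the reference sequence for a pair, filling in the MNV record is mostly string handling. The following is only a sketch under some assumptions: ref_seq is the sequence covering the first to the second position inclusive, the function name build_mnv_record is invented for this example, and the output keeps just the first seven VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER). The toy coordinates in the usage part are not real genome data.

```python
def build_mnv_record(chrom, pos1, ref1, alt1, pos2, ref2, alt2, ref_seq):
    """Build a tab-separated VCF-style line for two SNVs joined into one MNV.

    ref_seq must span pos1..pos2 inclusive (1-based), pos1 < pos2.
    Hypothetical helper, not part of bedtools.
    """
    # getfasta can return lower-case (soft-masked) bases, so normalise case
    ref_seq = ref_seq.upper()
    if len(ref_seq) != pos2 - pos1 + 1:
        raise ValueError("reference sequence length does not match the interval")
    if ref_seq[0] != ref1.upper() or ref_seq[-1] != ref2.upper():
        raise ValueError("reference bases disagree with the fetched sequence")
    # ALT keeps the reference bases in between and swaps the two ends
    alt_seq = alt1.upper() + ref_seq[1:-1] + alt2.upper()
    return "\t".join([chrom, str(pos1), ".", ref_seq, alt_seq, ".", "PASS"])


if __name__ == "__main__":
    # Toy example: T>A at position 5 and G>C at position 8, reference 'TAGG'
    print(build_mnv_record("chr1", 5, "T", "A", 8, "G", "C", "TAGG"))
```

The two checks are there because an off-by-one in the BED start shows up immediately as a length or reference-base mismatch.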