
I have a list of genomic DNA coordinates (hg38) and want to retrieve the corresponding mRNA sequence 200 bp upstream and downstream of each position. Any ideas?

Thank you.

I have tried the Table Browser. It is easy to get all the coding sequences based on the coordinates, but I don't know where to set the parameter for 200 bp upstream/downstream of those coordinates.

  • If you do not get the answer you want quickly on this site, try posting the question on [Bioinformatics Stack Exchange](https://bioinformatics.stackexchange.com/) or [Biostars](https://www.biostars.org/). – Timur Shtatland Nov 30 '22 at 21:04


Use bedtools getfasta. Convert the input coordinates to BED format, and use any scripting tool to extend each interval by 200 nt in both directions. From the docs:

```shell
$ cat test.fa
>chr1
AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG

# BED columns (tab-separated): chrom, 0-based start, end (exclusive)
$ cat test.bed
chr1	5	10

$ bedtools getfasta -fi test.fa -bed test.bed
>chr1:5-10
AAACC

# optionally write to an output file
$ bedtools getfasta -fi test.fa -bed test.bed -fo test.fa.out

$ cat test.fa.out
>chr1:5-10
AAACC
```
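The "extend by 200 nt" step can be done with a short awk command. This is a sketch under assumptions: the input file coords.txt is hypothetical, with one tab-separated chrom/position pair per line, and the positions are taken to be 1-based (as displayed in the UCSC Genome Browser). If your coordinates are already 0-based, drop the "- 1".

```shell
# Hypothetical input: one "chrom<TAB>position" per line, 1-based positions.
printf 'chr1\t150\nchr2\t1000\n' > coords.txt

# A 1-based position p is the BED interval [p-1, p).
# Extending 200 nt on each side gives [p-1-200, p+200), i.e. 401 nt.
# Starts are clamped at 0; ends are NOT clamped to chromosome length.
awk -v OFS='\t' -v flank=200 '{
  start = $2 - 1 - flank
  if (start < 0) start = 0
  print $1, start, $2 + flank
}' coords.txt > flanks.bed

cat flanks.bed
```

The resulting flanks.bed can then be passed to bedtools getfasta with -fi pointing at the hg38 FASTA. Because the end coordinate is not checked against the chromosome length here, intervals near a chromosome end may need extra handling; bedtools slop clamps both ends using a genome file.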

You can install bedtools, for example, using conda:

```shell
conda create --channel conda-forge --channel bioconda --name your_env_name bedtools
```

REFERENCES:

conda: https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html
conda create: https://docs.conda.io/projects/conda/en/latest/commands/create.html

Timur Shtatland
  • As an alternative to writing a script, [bedtools slop](https://bedtools.readthedocs.io/en/latest/content/tools/slop.html) can be used to expand the intervals. – Cloudberry Dec 02 '22 at 16:18
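A minimal sketch of the slop-based pipeline on toy data (the files toy.fa, toy.genome and site.bed are hypothetical). slop needs a genome file of chromosome lengths so it stops at chromosome ends; for hg38, a chromosome-sizes file such as UCSC's hg38.chrom.sizes can serve that role. -b sets the extension on both sides (use -b 200 for the question). Assuming the BED6 strand column holds the gene's strand, getfasta -s returns minus-strand windows reverse-complemented. Note that this is still genomic sequence: if a window crosses an exon/intron boundary, it contains intronic bases rather than spliced mRNA.

```shell
# Toy inputs (hypothetical): a 30-nt chromosome, its length, and one
# stranded BED6 feature (chrom, start, end, name, score, strand).
printf '>chr1\nAAAAACCCCCGGGGGTTTTTACGTACGTAC\n' > toy.fa
printf 'chr1\t30\n' > toy.genome
printf 'chr1\t12\t13\tsite1\t0\t-\n' > site.bed

# Extend 5 nt on both sides (-b 200 for the real data), clamped to the
# chromosome boundaries listed in toy.genome.
bedtools slop -i site.bed -g toy.genome -b 5 > site.flank.bed

# Extract sequences; -s reverse-complements features on the "-" strand.
bedtools getfasta -fi toy.fa -bed site.flank.bed -s
```

Here the feature [12, 13) becomes [7, 18), whose plus-strand sequence CCCGGGGGTTT is reported as its reverse complement AAACCCCCGGG because the feature is on the minus strand.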