
I have a human gene with a virus gene integrated into it, stored as a data frame or text file, like:

"C""G""C""T""G""T""T""G""T""T"...

It is 50,000 nucleotides long. I also have the virus gene as a data frame, and I have already calculated its mean frequency and standard deviation. I'm trying to find the approximate location of the virus gene by dividing the human gene into 1,000-nucleotide fragments and comparing each fragment against the frequency and standard deviation values that I have.
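One way to read the fragment-based approach described above is the following minimal sketch in base R. It runs on simulated data only: the sequences, the GC-rich viral composition, the integration site at position 20001, and the helper `base_freq` are all assumptions made for illustration, and it compares base frequencies by Euclidean distance rather than the exact mean and standard deviation statistics mentioned in the question.

```r
# Simulated data (assumption): a random 50,000-nt "human" gene with a
# 1,000-nt GC-rich "viral" insert placed at a hypothetical position.
set.seed(1)
bases <- c("A", "C", "G", "T")
human <- sample(bases, 50000, replace = TRUE)
virus <- sample(bases, 1000, replace = TRUE, prob = c(0.05, 0.45, 0.45, 0.05))
human[20001:21000] <- virus

# Relative frequency of A, C, G, T in a character vector of single bases.
base_freq <- function(x) {
  as.numeric(table(factor(x, levels = bases))) / length(x)
}

virus_freq <- base_freq(virus)

# Split the gene into non-overlapping 1,000-nt fragments and score each
# fragment by its distance to the viral frequency profile.
window_size <- 1000
starts <- seq(1, length(human) - window_size + 1, by = window_size)
dist_to_virus <- sapply(starts, function(s) {
  fragment <- human[s:(s + window_size - 1)]
  sqrt(sum((base_freq(fragment) - virus_freq)^2))
})

# The fragment with the smallest distance is the best candidate location.
best <- starts[which.min(dist_to_virus)]
cat("Closest fragment starts at position", best, "\n")
```

In real data the insert is unlikely to line up with fragment boundaries, so a smaller step in `seq()` (overlapping fragments) may be worth trying, and composition alone may not separate viral from human sequence as cleanly as it does in this simulated case.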

zx8754
Amy
  • Any particular reason you aren't just aligning the two sequences? What's your end goal here? – Jared Andrews Jun 01 '18 at 13:02
  • My goal is just to find the location of the virus gene on the human genome. – Amy Jun 01 '18 at 13:06
  • So in the entire genome, not just a given gene? Do you have whole genome sequencing for your samples of interest then? – Jared Andrews Jun 01 '18 at 13:07
  • I mean it is a gene of 50,000 nucleotides. – Amy Jun 01 '18 at 13:09
  • The viral gene, yes. But are you looking for it in the entirety of the human genome or a single gene? Your question is unclear: "I'm trying to find an approximate location of this virus gene by dividing the **human gene** into 1000 nucleotides long fragments" – Jared Andrews Jun 01 '18 at 13:10
  • I'm sorry, my English isn't great. It is not the entire genome, just a single gene. – Amy Jun 01 '18 at 13:14
  • That's okay, just trying to get the info needed. If you're trying to compare two sequences, you can just take the first 1000 bases or so of the viral gene and align them to the human gene in a program like [this](https://www.ebi.ac.uk/Tools/psa/emboss_water/nucleotide.html). – Jared Andrews Jun 01 '18 at 13:20
  • Also, in the future, you can post bioinformatics questions in the [Bioinformatics Stack Exchange](https://bioinformatics.stackexchange.com/) and you will likely get quicker/better answers. – Jared Andrews Jun 01 '18 at 13:21
  • Thank you very much, but I need to find it with code in RStudio. – Amy Jun 01 '18 at 13:23
  • Like Jared said, I think it's best to ask this on Bioinformatics Stack Exchange to get a theoretical approach to the problem. Once you have the approach, you can start writing code and attempt a solution, and we might be able to help you then. As it stands, there is little to no chance we can assist you with the information you have provided. – Shique Jun 01 '18 at 13:38

1 Answer

You can still apply the same pairwise alignment method within R, and it will be much easier and more accurate than trying to locate the virus gene yourself from frequencies and similar statistics. Alignment still relies on some of those same principles. This page gives many detailed examples of how to do it in R; the examples that relate directly to your question are about halfway down the page.
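To make the idea of local pairwise alignment concrete, here is a minimal base R sketch of the Smith-Waterman algorithm, the same kind of local alignment the EMBOSS Water tool linked in the comments performs. It is an illustration under stated assumptions, not the method from the linked page: the function `local_align`, the scoring values, and the simulated sequences are all hypothetical, and it reports only the score and the matched coordinates in the longer sequence.

```r
# Minimal Smith-Waterman local alignment (score and subject coordinates only).
# The match/mismatch/gap values are illustrative assumptions.
local_align <- function(pattern, subject, match = 2, mismatch = -1, gap = -2) {
  p <- strsplit(pattern, "")[[1]]
  s <- strsplit(subject, "")[[1]]
  sub_score <- function(a, b) if (a == b) match else mismatch

  # H[i + 1, j + 1] holds the best local score ending at p[i] and s[j].
  H <- matrix(0, nrow = length(p) + 1, ncol = length(s) + 1)
  for (i in seq_along(p)) {
    for (j in seq_along(s)) {
      H[i + 1, j + 1] <- max(0,
                             H[i, j] + sub_score(p[i], s[j]),
                             H[i, j + 1] + gap,
                             H[i + 1, j] + gap)
    }
  }

  # The best local alignment ends at the highest-scoring cell.
  end <- which(H == max(H), arr.ind = TRUE)[1, ]
  i <- end[["row"]]
  j <- end[["col"]]

  # Trace back until the score reaches zero to find where it starts.
  while (H[i, j] > 0) {
    if (H[i, j] == H[i - 1, j - 1] + sub_score(p[i - 1], s[j - 1])) {
      i <- i - 1
      j <- j - 1
    } else if (H[i, j] == H[i - 1, j] + gap) {
      i <- i - 1
    } else {
      j <- j - 1
    }
  }
  list(score = max(H), subject_start = j, subject_end = end[["col"]] - 1)
}

# Simulated example (assumption): a 30-nt "viral" fragment embedded
# after the first 100 bases of a random "human" sequence.
set.seed(42)
bases <- c("A", "C", "G", "T")
virus <- paste(sample(bases, 30, replace = TRUE), collapse = "")
left  <- paste(sample(bases, 100, replace = TRUE), collapse = "")
right <- paste(sample(bases, 100, replace = TRUE), collapse = "")
human <- paste0(left, virus, right)

hit <- local_align(virus, human)
cat("Best match covers positions", hit$subject_start, "to", hit$subject_end, "\n")
```

The nested loops fill a matrix with one cell per pair of bases, so this sketch is only practical for short sequences. For a full 50,000-nucleotide gene, a dedicated alignment package or the web tool mentioned in the comments is likely the better choice.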

Jared Andrews