Python script: sequence identifier and number of possible sequences

Question

I need to work with Python for a school project, but I really don't know how to get started.

The question is: A FASTA file contains a number of DNA sequences. Unfortunately, some of the symbols are ambiguous. The encoding is IUPAC (http://www.bioinformatics.org/sms/iupac.html). Write a Python script that, given the name of the FASTA file, writes the sequence identifier and the number of possible sequences for each sequence in the file. Example: for the—very short—sequence “AYGH” the number of possible sequences would be 6.

We are not here to do your work, Sophie. Try it, post the code and the errors you are getting, and we will debug it so that you learn something. — Adirio, Dec 20 '16 at 16:15
Check this: http://biopython.org/wiki/Seq and this: https://github.com/jordancheah/DNA-FASTA-Python — Mohammad Yusuf, Dec 20 '16 at 16:47
What you are asking is already implemented here, I guess: https://github.com/mbourgey/Concordia_Workshop_Biopython. Read the source code and implement it yourself. — Mohammad Yusuf, Dec 20 '16 at 16:50
I know, but I don't know how to start ... So I need some help for that ... — Sophie, Dec 20 '16 at 17:10
I don't know how I can give in the IUPAC code so I can use it? — Sophie, Dec 20 '16 at 17:11
I tried this: def IUPAC_code: "R" = "A" or "G" "Y = "C" or "T" "S" = "G" or "C" "W" = "A" or "T" "K" = "A" or "T" "M" = "A" or "C" "B" = "C" or "G" or "T" "D" = "A" or "G" or "T" "H" = "A" or "C" or "T" "V" = "A" or "C" or "G" "N" = "A" or "C" or "G" or "T" — Sophie, Dec 20 '16 at 17:52

score 0 · Answer 1 · answered Jan 13 '17 at 23:17

Try with a dictionary like this:

nucleotides = {
    'A': ['A'], 'C': ['C'], 'G': ['G'], 'T': ['T'], 'U': ['U'],
    'R': ['A', 'G'], 'Y': ['C', 'T'], 'S': ['G', 'C'], 'W': ['A', 'T'],
    'K': ['G', 'T'], 'M': ['A', 'C'],
    'B': ['C', 'G', 'T'], 'D': ['A', 'G', 'T'], 'H': ['A', 'C', 'T'], 'V': ['A', 'C', 'G'],
    'N': ['A', 'C', 'G', 'T'],
    '-': ['-'], '.': ['-'],
}

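Then loop over each nucleotide of your main sequence and look up its possibilities. The number of possible sequences is the product of the number of possibilities at each position: for "AYGH" that is 1 × 2 × 1 × 3 = 6.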
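Here is a minimal sketch of how that could look, not a definitive solution. It assumes Python 3.8+ (for math.prod), a plain-text FASTA file where header lines start with ">", and that the whole header text is used as the identifier. The function names count_possible_sequences, read_fasta and report are made up for this example.

```python
from math import prod

# IUPAC symbol -> concrete bases it can stand for (same table as above)
nucleotides = {
    'A': ['A'], 'C': ['C'], 'G': ['G'], 'T': ['T'], 'U': ['U'],
    'R': ['A', 'G'], 'Y': ['C', 'T'], 'S': ['G', 'C'], 'W': ['A', 'T'],
    'K': ['G', 'T'], 'M': ['A', 'C'],
    'B': ['C', 'G', 'T'], 'D': ['A', 'G', 'T'],
    'H': ['A', 'C', 'T'], 'V': ['A', 'C', 'G'],
    'N': ['A', 'C', 'G', 'T'],
    '-': ['-'], '.': ['-'],
}


def count_possible_sequences(sequence):
    """Multiply the number of possibilities at each position."""
    # An unknown symbol raises KeyError, which is deliberate in this sketch.
    return prod(len(nucleotides[symbol]) for symbol in sequence.upper())


def read_fasta(path):
    """Yield (identifier, sequence) pairs from a FASTA file."""
    identifier, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith('>'):
                # A new header closes the previous record, if there was one.
                if identifier is not None:
                    yield identifier, ''.join(chunks)
                identifier, chunks = line[1:].strip(), []
            else:
                # Sequences may be wrapped over several lines.
                chunks.append(line)
    if identifier is not None:
        yield identifier, ''.join(chunks)


def report(path):
    """Print each identifier with its number of possible sequences."""
    for identifier, sequence in read_fasta(path):
        print(identifier, count_possible_sequences(sequence))


print(count_possible_sequences("AYGH"))  # 6, as in the question
```

To take the file name from the command line, you could import sys and call report(sys.argv[1]) at the end of the script. Since you only need the count, there is no need to generate every possible sequence, which would get very slow for sequences with many ambiguous symbols.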
