
I need to use Python for a school project, but I really don't know how to get started.

The question is: A FASTA file contains a number of DNA sequences. Unfortunately, some of the symbols are ambiguous. The encoding is IUPAC (http://www.bioinformatics.org/sms/iupac.html). Write a Python script that, given the name of the FASTA file, writes the sequence identifier and the number of possible sequences for each sequence in the file. Example: for the—very short—sequence “AYGH” the number of possible sequences would be 6.

  • We are not here to do your work, Sophie. Try it, post the code and the errors you are getting, and we will debug it so that you learn something. – Adirio Dec 20 '16 at 16:15
  • Check this: http://biopython.org/wiki/Seq and this: https://github.com/jordancheah/DNA-FASTA-Python – Mohammad Yusuf Dec 20 '16 at 16:47
  • What you are asking is already implemented here, I guess: https://github.com/mbourgey/Concordia_Workshop_Biopython. Read the source code and implement it yourself. – Mohammad Yusuf Dec 20 '16 at 16:50
  • I know, but I don't know how to start ... So I need some help for that ... – Sophie Dec 20 '16 at 17:10
  • I don't know how to write down the IUPAC code so that I can use it. – Sophie Dec 20 '16 at 17:11
  • I tried this: def IUPAC_code: "R" = "A" or "G" "Y = "C" or "T" "S" = "G" or "C" "W" = "A" or "T" "K" = "A" or "T" "M" = "A" or "C" "B" = "C" or "G" or "T" "D" = "A" or "G" or "T" "H" = "A" or "C" or "T" "V" = "A" or "C" or "G" "N" = "A" or "C" or "G" or "T" – Sophie Dec 20 '16 at 17:52

1 Answer


Try a dictionary like this:

nucleotides = {'A':['A'], 'C':['C'], 'G':['G'], 'T':['T'], 'U':['U'], 'R':['A','G'], 'Y':['C','T'], 'S':['G','C'], 'W':['A','T'], 'K':['G','T'], 'M':['A','C'], 'B':['C','G','T'], 'D':['A','G','T'], 'H':['A','C','T'], 'V':['A','C','G'], 'N':['A','C','G','T'], '-':['-'], '.':['-']}

Then loop over each nucleotide of your sequence, look up how many possibilities it has, and multiply those numbers together. For "AYGH" that gives 1 × 2 × 1 × 3 = 6.
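Here is a minimal sketch of how that could look, assuming a plain-text FASTA file where each record starts with a ">" header line and the identifier is the first word of that header. The function names (count_possible_sequences, read_fasta, report) are made up for this example, and gap symbols are assumed to count as a single possibility. An unknown symbol raises a KeyError, which you may want to handle differently.

```python
# IUPAC symbols mapped to the bases they can stand for (same table as above).
nucleotides = {
    'A': ['A'], 'C': ['C'], 'G': ['G'], 'T': ['T'], 'U': ['U'],
    'R': ['A', 'G'], 'Y': ['C', 'T'], 'S': ['G', 'C'], 'W': ['A', 'T'],
    'K': ['G', 'T'], 'M': ['A', 'C'],
    'B': ['C', 'G', 'T'], 'D': ['A', 'G', 'T'],
    'H': ['A', 'C', 'T'], 'V': ['A', 'C', 'G'],
    'N': ['A', 'C', 'G', 'T'], '-': ['-'], '.': ['-'],
}


def count_possible_sequences(sequence):
    """Multiply the number of choices at every position of the sequence."""
    total = 1
    for symbol in sequence.upper():
        total *= len(nucleotides[symbol])  # KeyError for non-IUPAC symbols
    return total


def read_fasta(path):
    """Yield (identifier, sequence) pairs from a FASTA file."""
    identifier, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith('>'):
                # A new header closes the previous record, if there was one.
                if identifier is not None:
                    yield identifier, ''.join(chunks)
                fields = line[1:].split()
                identifier = fields[0] if fields else ''
                chunks = []
            elif identifier is not None:
                chunks.append(line)  # sequences may span several lines
    if identifier is not None:
        yield identifier, ''.join(chunks)


def report(path):
    """Print each identifier with its number of possible sequences."""
    for identifier, sequence in read_fasta(path):
        print(identifier, count_possible_sequences(sequence))
```

To run it as a script, you could import sys and call report(sys.argv[1]) at the bottom of the file, so that the FASTA file name is taken from the command line.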

Biopy