Can someone help me with this Python code? When I run it, nothing happens: there are no errors and nothing that looks wrong to me. It reads in and opens the file just fine. I have a set of protein sequences in FASTA format, and I have to find occurrences of a motif like "RRTxSKxxxxAxxRxG" in them, where x can be any residue.

This is my Python code:

import re
userinput = input("Please provide a FASTA file.")
while userinput:
    try:
        if userinput == "0":
            break
        with open(userinput, mode = 'r') as protein:
            readprotein = protein.read()
        matches = re.findall('RTxSKxxxxAxxRxG', readprotein)
        for match in matches:
            print(match)
        break
    except FileNotFoundError:
        print("File not found. enter the fasta file.")
        userinput = input("Please provide a FASTA file. 0 to quit.")
  • https://biopython.org/wiki/MotifDev lists, under "Enhancements currently underway": expanding the Bio.Motif tutorial on analysis of protein motifs (Dave Bridges is working on this, see his branch on GitHub: http://github.com/davebridges/biopython-biomotif-supplement/tree/master), writing a simplistic, pure-Python, de novo motif finder, and writing a wrapper for RSAT tools (http://rsat.ulb.ac.be/rsat/) using either local binaries or SOAP. I didn't find anything there about protein motifs. I tried motif.create() but got an error when it encountered a single-letter amino acid code instead of a DNA base. – pippo1980 Jun 12 '21 at 14:13

1 Answer

Using a file named fasta.fasta as my input, I modified your code to:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sat Jun 12 14:48:00 2021

@author: Pietro


https://stackoverflow.com/questions/67948483/find-motif-from-in-between-fasta-file-from-python


"""

import re


# userinput = input("Please provide a FASTA file.")

userinput = 'fasta.fasta'


pattern = re.compile(r"(RT[A-Z]SK[A-Z]{4}A[A-Z]{2}R[A-Z]G)")

matchz = []
while userinput:
    try:
        if userinput == "0":
            break
        with open(userinput, mode = 'r') as protein:
            for line in protein:  # iterate line by line to keep memory use low
                line = line.upper().strip("\n")
                if line.startswith('>'):
                    name=line
                else:
                    matches = re.findall(pattern, line)
                    print(name,matches)
                    matchz.append(matches)
        for match in matchz:
            print(match)
        break
    except FileNotFoundError:
        print("File not found. enter the fasta file.")
        userinput = input("Please provide a FASTA file. 0 to quit.")

The output is:

>PRIMO ['RTXSKXXXXAXXRXG']
>PRIMO2 ['RTGSKXXXXAGGRXG']
>TERZO []
>QUARTO ['RTGSKLLLLAGGRSG', 'RTGSKWFGRAGGRXG', 'RTGSKPPPPAGGRXG']
['RTXSKXXXXAXXRXG']
['RTGSKXXXXAGGRXG']
[]
['RTGSKLLLLAGGRSG', 'RTGSKWFGRAGGRXG', 'RTGSKPPPPAGGRXG']
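
Two notes on why this works where the original pattern did not, given as a minimal sketch rather than a definitive implementation. In a regular expression a plain x only matches the literal character x, so 'RTxSKxxxxAxxRxG' finds nothing unless the sequence really contains lowercase x at those positions; each wildcard position needs a character class such as [A-Z]. Also, the script above searches one line at a time, so a motif that straddles a line break in a wrapped FASTA record would probably be missed. The helper names motif_to_regex and read_fasta are hypothetical names of my own, the two records are made up for illustration, and [A-Z] is a loose assumption because it also accepts letters that are not standard amino acid codes.

```python
import re


def motif_to_regex(motif, wildcard="x"):
    """Compile a motif like 'RTxSKxxxxAxxRxG', treating `wildcard` as any letter."""
    parts = []
    for ch in motif:
        # Assumption: any uppercase letter is acceptable at a wildcard position.
        parts.append("[A-Z]" if ch == wildcard else re.escape(ch))
    return re.compile("".join(parts))


def read_fasta(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


# Made-up records for illustration; the motif in demo1 spans two lines.
sample = """>demo1
MKRTGSKLLLL
AGGRSGAA
>demo2
MKVLAA
""".splitlines()

pattern = motif_to_regex("RTxSKxxxxAxxRxG")

# Search whole records, not individual lines.
hits = {name: pattern.findall(seq) for name, seq in read_fasta(sample)}

# For comparison: the literal pattern, and a line-by-line search.
literal_hits = re.findall("RTxSKxxxxAxxRxG", "MKRTGSKLLLLAGGRSGAA")
per_line_hits = [pattern.findall(l) for l in sample if not l.startswith(">")]

print(hits)           # {'demo1': ['RTGSKLLLLAGGRSG'], 'demo2': []}
print(literal_hits)   # []
print(per_line_hits)  # [[], [], []]
```

Joining the lines of each record before searching costs a little memory per record, but it keeps a wrapped motif in one piece; if every sequence in your file sits on a single line, the line-by-line loop gives the same result.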
pippo1980