My goal is to extract the sequences of antimicrobial peptides (AMPs) from the NCBI database using keywords such as "Antimicrobial Peptides" and "AMPs", while also restricting the sequence length.

I have written code that uses the Biopython library to fetch AMP sequences from the NCBI protein database.

The Code

from Bio import Entrez
from Bio import SeqIO

def scrape_fasta_sequence(keywords):
    # Provide your email address to NCBI
    Entrez.email = 'mabdullahafzal02@gmail.com'

    # Create the query string with filters: RefSeq records only, 7 to 50 residues long
    query = f'({keywords}) AND srcdb_refseq[PROP] AND 7:50[SLEN]'

    # Search for protein IDs that match the query
    # (esearch returns at most 20 IDs unless retmax is set)
    handle = Entrez.esearch(db='protein', term=query)
    record = Entrez.read(handle)
    handle.close()
    id_list = record['IdList']

    # Use the protein IDs to fetch the sequences
    fasta_sequences = []
    for protein_id in id_list:
        handle = Entrez.efetch(db='protein', id=protein_id, rettype='fasta', retmode='text')
        fasta_sequence = SeqIO.read(handle, 'fasta')
        fasta_sequences.append(fasta_sequence)
        handle.close()

    return fasta_sequences

# Example usage
keywords = 'Antimicrobial Peptides OR AMPs'
fasta_sequences = scrape_fasta_sequence(keywords)
for fasta_sequence in fasta_sequences:
    print(fasta_sequence)

Output

The output was not specific to antimicrobial peptides. It also returned other, unrelated proteins that I do not need.

[image: output snippet]

Could the code be modified so that the search returns only antimicrobial peptides?
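One direction I am considering is sketched here. It rests on an assumption I have not verified: that the extra hits come from my unquoted keywords being matched against all fields of each record. The helper `build_amp_query` is a hypothetical name of my own, not part of Biopython. It quotes each phrase and tags it with an Entrez search field such as `[Title]`. Whether a title-only search is precise enough for AMPs, or misses too many real ones, would still need checking.

```python
def build_amp_query(phrases, min_len=7, max_len=50, field="Title"):
    """Build an Entrez protein query restricted to one search field.

    Hypothetical helper: each phrase is quoted so it is matched as a
    phrase, and tagged with `field` so it is not searched in all fields.
    """
    if not phrases:
        raise ValueError("at least one phrase is required")
    # e.g. "antimicrobial peptide"[Title] OR "antimicrobial peptides"[Title]
    tagged = " OR ".join(f'"{p}"[{field}]' for p in phrases)
    # Keep the same RefSeq and sequence-length filters as the original query
    return f"({tagged}) AND srcdb_refseq[PROP] AND {min_len}:{max_len}[SLEN]"


query = build_amp_query(["antimicrobial peptide", "antimicrobial peptides"])
print(query)
```

The resulting string could then be passed as `term=query` to `Entrez.esearch(db='protein', ...)` in place of the query built inside `scrape_fasta_sequence`.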

polarise