Questions tagged [genbank]

GenBank format (GenBank Flat File Format) consists of an annotation section and a sequence section. The start of the annotation section is marked by a line beginning with the word "LOCUS".
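
As a rough illustration of the two-section layout described above, here is a minimal Python sketch. It assumes that the sequence section begins at a line starting with "ORIGIN" and that the record ends at a line starting with "//", which is how the flat file format conventionally lays out a record. The function name `split_genbank_record` and the example record are hypothetical, made up for this illustration. For real work, a dedicated parser (for example Biopython's `SeqIO.parse(handle, "genbank")`) is likely the safer choice.

```python
def split_genbank_record(text):
    """Split one GenBank record into (annotation_lines, sequence_string).

    Assumptions (not from the tag description): the sequence section
    starts at a line beginning with "ORIGIN" and the record ends at "//".
    """
    annotation = []
    sequence_parts = []
    in_record = False
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            in_record = True  # the annotation section starts here
        if not in_record:
            continue  # skip anything before the first LOCUS line
        if line.startswith("//"):
            break  # end of this record
        if line.startswith("ORIGIN"):
            in_sequence = True  # switch to the sequence section
            continue
        if in_sequence:
            # Sequence lines: a position number, then blocks of bases.
            sequence_parts.extend(line.split()[1:])
        else:
            annotation.append(line)
    return annotation, "".join(sequence_parts)


# Hypothetical toy record, for illustration only.
EXAMPLE_RECORD = """\
LOCUS       EXAMPLE1                  20 bp    DNA     linear   UNK 01-JAN-1980
DEFINITION  Hypothetical record.
FEATURES             Location/Qualifiers
     source          1..20
ORIGIN
        1 acgtacgtac gtacgtacgt
//
"""

annotation_lines, sequence = split_genbank_record(EXAMPLE_RECORD)
print(len(annotation_lines), "annotation lines;", len(sequence), "bases")
```

The sketch handles only the first record in the text; a multi-record file would need the loop restarted after each "//" line.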

53 questions
0
votes
1 answer

How to get a sequence after a word with whitespace

For school I have to parse the string that follows a word and a lot of whitespace, but I just can't get it to work because the file is a GenBank file. For example: BLA …
0
votes
1 answer

How can I extract a feature from a GenBank file by label?

I'm trying to parse a GenBank file to find a specific feature. I can pull it out if I know the feature type (e.g. repeat_region), e.g. if I'm looking for this feature: repeat_region 5623..5756 /label=5' ITR …
Wolfgang
  • 80
  • 5
0
votes
1 answer

How to remove an invalid sequence from a GenBank file containing multiple genome sequences based on ID

I have a ~3 GB GenBank file containing complete GenBank annotations for ~20,000 bacterial genome sequences. My goal is to use Biopython to parse these sequences and write individual FASTA files for non-duplicate sequences with something like the…
DanStu
  • 174
  • 9
0
votes
0 answers

How to retrieve the geographic origin of the samples referenced/deposited in GenBank using R?

A simple question: how to retrieve the geographic origin of samples referenced/deposited in GenBank using R? Best and thanks! Pedro
perep1972
  • 147
  • 1
  • 9
0
votes
1 answer

How to elegantly pass variables to command

I have a set of commands that currently work with one file: sed -n -e '/ABC/,/LOCUS/ p' mainfile.gbk | sed -e '$ d' >temp1 sed '/ source/,/ gene/{/ gene/!d}' temp1 >temp2 grep -v " gene" temp2 >temp3 grep -v " …
rororo
  • 815
  • 16
  • 31
0
votes
1 answer

extract metadata from genomic gbff files

I have >1000 .gbff.gz genomic files and I want to extract metadata from each and have metadata entries in separate columns.
user2861089
  • 1,205
  • 4
  • 22
  • 44
0
votes
0 answers

Filter GenBank files by the order of the species in R

I'm making a local database of GenBank files for 456 mammalian FOXP2 gene sequences; these sequences came from a blastn search. I'm using the "ape" R package to explore the GenBank files. I used the read.genbank function with the 456 IDs and…
0
votes
1 answer

Get Accession Numbers from NCBI from corresponding GI Numbers in FASTA headers in Python

I keep seeing warnings on GenBank that they are phasing out GI numbers, and I have a number of FASTA files saved where I've edited the headers into the following format: >SomeText_ginumber I've no idea where to even begin with this, but is there a way,…
wl284
  • 53
  • 9
0
votes
1 answer

Entrez and SeqIO "no records found in handle"

My code looks like this: import re from Bio import SeqIO from Bio import Entrez Entrez.email = "...@..." # My e-mail address handle1 = Entrez.efetch(db="pubmed", id=pmid_list_2010, rettype="gb", retmode="text") data1 =…
jarch
  • 59
  • 1
  • 7
0
votes
1 answer

Awk double-slash record separator

I am trying to separate RECORDS of a file based on the string, "//". What I've tried is: awk -v RS="//" '{ print "******************************************\n\n"$0 }' myFile.gb Where the "******" etc, is just a trace to show me that the record is…
libby
  • 585
  • 4
  • 15
0
votes
2 answers

Python: Regular Expressions on getting repeating set of numbers

I'm working with a file that is a GenBank entry (similar to this). My goal is to extract the numbers in the CDS line, e.g.: CDS join(1200..1401,3490..4302) but my regex should also be able to extract the numbers from multiple…
Gliz
  • 73
  • 1
  • 2
  • 13
0
votes
1 answer

Save output from Biopython object into a file?

Here I have code written to extract the "locus_tag" of a gene using its "id". How can I save the output from this to a file in tab-separated format? Code adapted and modified from https://www.biostars.org/p/110284/: from Bio import SeqIO foo =…
Tony
  • 85
  • 7
0
votes
1 answer

Parsing GenBank to FASTA with yield in Python (x, y)

For now I have tried to define and document my own function to do it, but I am encountering issues with testing the code and I have actually no idea if it is correct. I found some solutions with BioPython, re or other, but I really want to make this…
0
votes
1 answer

BioPerl: extract CDS error

I am trying to extract CDS and the corresponding amino acid sequences from a GenBank file using BioPerl. The script is shown below: while (my $seq_object = $in->next_seq){ for my $feat_object ($seq_object->get_SeqFeatures) { if ($feat_object…
RonnB
  • 57
  • 6
0
votes
2 answers

Change the character length of single line strings using regex

I have extracted a sequence from a GenBank file that consists of single-line strings of 60 bases each (with a \n at the end). How can I modify the sequence using Perl so that it prints 120 bases per line, using regex and not BioPerl? original…
zebra
  • 83
  • 1
  • 1
  • 5