Questions tagged [ncbi]

NCBI is the National Center for Biotechnology Information, whose website is one of the most important resources for bioinformaticians. It runs a wide variety of bioinformatics web services and provides important databases for download.

The NCBI covers a wide range of bioinformatics resources, from journal listings to gene alignments, chemical library databases, and protein folding prediction.

NCBI's data is publicly available from the main website and from FTP repositories.

  • PubMed, a database of citations and abstracts for biomedical literature from MEDLINE and additional life science journals.

  • The NCBI C++ Toolkit provides a set of modules to access, modify, generate and deposit biological data. The full description is available in its online book.

  • PubChem, a chemical library database, has its own API to search and retrieve chemical compounds
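As a rough illustration of what querying the PubChem API can look like, the sketch below only builds a request URL and does not send it. It assumes the PUG REST path layout `compound/name/<name>/synonyms/JSON`; the helper name `synonyms_url` is hypothetical and not part of any PubChem client.

```python
from urllib.parse import quote

# Base address of PubChem's PUG REST service (assumed here, check the PubChem docs).
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def synonyms_url(compound_name: str) -> str:
    """Build a URL asking for the synonyms of a compound looked up by name.

    Hypothetical helper: it only assembles the URL, no request is made.
    """
    # Percent-encode the name so spaces and slashes are safe inside the path.
    encoded_name = quote(compound_name, safe="")
    return f"{PUG_REST_BASE}/compound/name/{encoded_name}/synonyms/JSON"


if __name__ == "__main__":
    print(synonyms_url("aspirin"))
```

The resulting URL can be fetched with any HTTP client; keeping URL construction separate makes it easy to check without network access.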

205 questions
2
votes
1 answer

Has anyone used pubchemdb? Any similar API?

Update: The link in the answer is both interesting and useful, but unfortunately does not address the need for a Java API, so I am still looking forward to any input. I'm building a database of chemical compounds. I need all the synonyms (IUPAC and…
Aleadam
  • 40,203
  • 9
  • 86
  • 108
2
votes
1 answer

How to use EPOST and then use ESEARCH in biopython?

I have a list of gene ids: id_list = ["19304878", "18606172", "16403221", "16377612", "14871861", "14630660"] How can I get just the nucleotide sequences of these genes using EPOST and ESEARCH in biopython?
lucaspompeun
  • 170
  • 1
  • 9
2
votes
1 answer

Limiting the number of hits in a Biopython NCBIWWW Search

I'm working on trying to automate some BLAST searches. I need to pick up only the top three results from the BLAST results, however the parameter hitlist_size doesn't seem to be limiting my searches to only three results. No matter what size I…
2
votes
0 answers

Python scripting with ete3 to query NCBI's Taxonomy: "sqlite3 Warning (can only execute one statement at a time)"

I am using this script: import csv import time import sys from ete3 import NCBITaxa ncbi = NCBITaxa() def get_desired_ranks(taxid, desired_ranks): lineage = ncbi.get_lineage(taxid) names = ncbi.get_taxid_translator(lineage) …
ljs
  • 315
  • 2
  • 10
2
votes
1 answer

beautifulsoup web crawling search id list

I am attempting to crawl the ncbi eutils webpage. I want to crawl the Id list from the web as shown below: Here's the code for it: import requests from bs4 import BeautifulSoup def get_html(url): """get the content of the…
Thomas.Q
  • 377
  • 1
  • 4
  • 12
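For a task like the one in this question, an E-utilities ESearch response is XML, so an XML parser is often simpler than HTML scraping. The following is a minimal sketch using only the standard library; the sample document is a made-up, trimmed-down response (the `eSearchResult`/`IdList`/`Id` element names are assumed from typical ESearch output), and `extract_ids` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up, shortened example of an ESearch-style XML response (illustrative only).
SAMPLE_ESEARCH_XML = """
<eSearchResult>
  <Count>3</Count>
  <IdList>
    <Id>19304878</Id>
    <Id>18606172</Id>
    <Id>16403221</Id>
  </IdList>
</eSearchResult>
"""


def extract_ids(xml_text: str) -> list:
    """Return the text of every <Id> element found under <IdList>."""
    root = ET.fromstring(xml_text.strip())
    return [id_element.text for id_element in root.findall("./IdList/Id")]


if __name__ == "__main__":
    for record_id in extract_ids(SAMPLE_ESEARCH_XML):
        print(record_id)
```

In practice the XML text would come from the HTTP response body rather than a hard-coded string.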
2
votes
1 answer

ete3: How to get taxonomic rank names from taxonomy id?

I want to use this to convert a bunch of identifiers but I need to know exactly which taxonomic rank is assigned to each taxonomy code. Shown below is an example of conversion that makes sense but I don't know what to label some of the taxonomy…
O.rka
  • 29,847
  • 68
  • 194
  • 309
2
votes
1 answer

taxid2wgs.pl: undefined symbol: Perl_xs_handshake

I am trying to run a Perl script (taxid2wgs.pl) used in searching a taxonomic subset of WGS. taxid2wgs.pl (available at ftp://ftp.ncbi.nlm.nih.gov/blast/WGS_TOOLS). $ ./taxid2wgs.pl -title "Bacteria WGS" -alias_file bacteria-wgs 2 Here, 2 is the taxid…
Penny Liu
  • 15,447
  • 5
  • 79
  • 98
2
votes
1 answer

How can I get taxonomic rank names from taxid?

This question is related to: How to get taxonomic specific ids for kingdom, phylum, class, order, family, genus and species from taxid? The solution given there works but I would like to have the names for each taxonomic id for defined ranks. I…
utritala
  • 61
  • 1
  • 4
2
votes
2 answers

Error when using NCBIWWW from biopython

I am trying to blast nucleotide sequence using NCBIWWW from Bio.Blast import NCBIWWW my_query = "TGCGTGCCGTGCAATGTGCGT" result_handle = NCBIWWW.qblast("blastn", "nt", my_query) blast_result = open("my_blast.xml", "w")…
reut
  • 21
  • 2
2
votes
1 answer

Get XML paragraphs without nested tables

I'm parsing XML docs from PubMed Central and sometimes I find paragraphs with nested tables like the example below. Is there a way in R to get the text and exclude the table? doc <- xmlParse(" Text More text…
Chris S.
  • 2,185
  • 1
  • 14
  • 14
2
votes
1 answer

Is the new RefSeq release from NCBI compatible with Bio.Entrez.Parser?

I'm new to Python and especially to Biopython. I'm trying to take some information from an XML file with Entrez.efetch and then read it. Last week this script worked well: handle = Entrez.efetch(db="Protein", id="YP_008872780.1",…
Iñaki
  • 21
  • 3
2
votes
1 answer

How to get a specific protein sequence using Entrez.efetch?

I am trying to get the protein sequence from NCBI via a GenInfo Identifier (GI) number, using Biopython's Entrez.efetch() function. proteina = Entrez.efetch(db="protein", id=gi, rettype="gb", retmode="xml"). I then read the data using: proteinaXML =…
daniel_hck
  • 1,100
  • 3
  • 19
  • 38
2
votes
2 answers

How to copy content from a dynamic page using PHP?

Is it possible to get the information displayed in the page link given below using PHP? I want all the text content displayed on the page to be copied to a variable or to a…
SRKR
  • 33
  • 8
2
votes
1 answer

In a hash, how do you add two values for the same key instead of overwriting?

Basically I have these files (MEDLINE from NCBI). Each is associated with a journal title. Each has 0, 1 or more GenBank identification numbers (GBIDs). I can associate the number of GBIDs per file with each journal name. My problem is that I may…
kbearski
  • 35
  • 5
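The general idea behind this kind of question, appending to the values stored under a key rather than overwriting them, can be sketched as follows. The question appears to concern a hash in another language, so this Python version with `collections.defaultdict` is an analogous illustration, not the asker's code. The journal names, GBIDs, and the function name `collect_ids_by_journal` are all made up.

```python
from collections import defaultdict

# Made-up (journal title, GBIDs found in one file) pairs for illustration.
RECORDS = [
    ("Journal A", ["GB1", "GB2"]),
    ("Journal B", []),
    ("Journal A", ["GB3"]),
]


def collect_ids_by_journal(records):
    """Group GBIDs by journal, extending the existing list on repeated keys."""
    ids_by_journal = defaultdict(list)
    for journal, gbids in records:
        # extend() adds to whatever is already stored instead of replacing it.
        ids_by_journal[journal].extend(gbids)
    return dict(ids_by_journal)


if __name__ == "__main__":
    for journal, gbids in collect_ids_by_journal(RECORDS).items():
        print(journal, len(gbids), gbids)
```

Storing a list per key keeps every identifier; if only the count per journal is needed, summing `len(gbids)` into an integer value works the same way.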
1
vote
1 answer

Extracting viral host from Genbank record or Entrez query

I would like to be able to see the viral host organism from a number of GenBank records. I have tried this through downloading GenBank full files and reading them with Bio.SeqIO.read(), and I have also tried querying the database through…
darkwing
  • 130
  • 6