Questions tagged [ncbi]

NCBI is the National Center for Biotechnology Information, whose website is one of the most important resources used by bioinformaticians. NCBI runs a wide variety of bioinformatics web services and also provides important databases for download.

NCBI covers a wide range of bioinformatics resources, from journal listings to gene alignments to chemical library databases to protein folding prediction.

NCBI's data is publicly available from the main website and from FTP repositories.

  • PubMed, a database of citations and abstracts for biomedical literature from MEDLINE and additional life science journals.

  • The NCBI C++ Toolkit provides a set of modules to access, modify, generate and deposit biological data. The full description can be read in its online book.

  • PubChem, a chemical library database, has its own API to search and retrieve chemical compounds.
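
As a rough illustration of the PubChem API mentioned above, the sketch below builds a PUG REST request URL that looks up compound identifiers (CIDs) by name. The helper name `pubchem_cids_url` is hypothetical, and the sketch only constructs the URL; fetching and parsing the response is left out.

```python
from urllib.parse import quote

# Base address of PubChem's PUG REST service.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_cids_url(compound_name: str) -> str:
    """Build a PUG REST URL asking for the CIDs matching a compound name.

    Hypothetical helper for illustration; the name is percent-encoded so
    that spaces and slashes do not break the URL path.
    """
    encoded = quote(compound_name, safe="")
    return f"{PUG_REST_BASE}/compound/name/{encoded}/cids/JSON"


if __name__ == "__main__":
    print(pubchem_cids_url("aspirin"))
```

The resulting URL can be opened in a browser or passed to any HTTP client; the response format is selected by the last path segment (`JSON` here).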

205 questions
1
vote
0 answers

PSIPRED, how to install and make it work?

I am trying to set up PSIPRED to run a secondary structure prediction but the README.md file is not detailed enough for a beginner like me to set up and run PSIPRED. I am trying to use it with BLAST+. I am on a Linux Ubuntu 17.04 operating system, and…
AC Research
  • 71
  • 1
  • 4
1
vote
1 answer

How to keep Protein ID when retrieving coding sequences with rentrez

I have a bunch of protein IDs and I need to retrieve the corresponding coding sequences (CDSs). I have managed to retrieve the CDSs but the names of each sequence change from XP* to XM*, and I need to retain the XP* header for each sequence.…
Santiago
  • 67
  • 6
1
vote
1 answer

How to track which protein ID is linked to which gene ID with rentrez

I have a bunch of protein IDs and I want to fetch the corresponding coding sequences (CDSs) without losing the protein ID. I have managed to download the corresponding CDSs, but unfortunately, CDS IDs are very different from protein IDs in NCBI. I…
Santiago
  • 67
  • 6
1
vote
0 answers

Flatten recursive taxonomic table (Oracle)

I'm fairly new to Oracle and completely new to taxonomic data, so please bear with me... I have a large table of data that looks like this: | tax_id | value | |:------:|:-----:| | 211 | 56.4 | | 326 | 2.7 | | 47 | 89.6 | | 569 | …
JamesS
  • 310
  • 3
  • 10
1
vote
1 answer

Loop saving output to matrix

I'm trying to access the NCBI SRA database, query it for a list of IDs and save the output to a matrix. I'm using the sradb package from Bioconductor to do this and now I can access and query the database, but it's really slow and I couldn't quite…
MenieM
  • 11
  • 1
1
vote
1 answer

Is Pubmed returning invalid XML results?

I am using JEUtils to fetch and parse Pubmed results in Java (it's a tool which seems to be abandoned). Since a few days ago the tool is throwing exceptions in some results, and upon inspection it seems that Pubmed is not respecting its own DTD (the…
mmalmeida
  • 1,037
  • 9
  • 27
1
vote
0 answers

Loading NCBITaxa crashes

I've been using the ete2 module on very powerful servers for some time. Everything was fine until it started going very slowly (one get_taxid_translator() function per minute), now I cannot even get past ncbi = NCBITaxa() assignment. I have…
Alexis Lucattini
  • 1,211
  • 9
  • 13
1
vote
3 answers

Paste some elements of mixed vector

I have a vector with terms that may be followed by zero or more qualifiers starting with "/". The first element should always be a term. mesh <- c("Animals", "/physiology" , "/metabolism*", "Insects", "Arabidopsis", "/immunology"…
Chris S.
  • 2,185
  • 1
  • 14
  • 14
1
vote
0 answers

Local blasting (16SrRNA) - taxonomy annotations from NCBI different from Silva

I am performing local blasting against both the NCBI database and the Silva database using these commands: blastn -db db/16SMicrobial -query input.fa -out outputNCBI.csv -task blastn -dust no -max_target_seqs 1 -outfmt "10 pident evalue bitscore…
umbra
  • 111
  • 1
  • 1
  • 10
1
vote
1 answer

Alternative to Bio.Entrez EFetch for downloading full genome sequences from NCBI

My goal is to download full metazoan genome sequences from NCBI. I have a list of unique ID numbers for the genome sequences I need. I planned to use the Bio.Entrez module EFetch to download the data but learned today via the Nov 2, 2011 release…
1
vote
1 answer

How can I return corresponding fasta protein sequences from ncbi from multiple accession numbers in python?

I'm having some difficulty downloading fasta sequences for multiple accession numbers in a text file using a python script. I can do this OK for a single accession number e.g: import sys from Bio import Entrez Entrez.email = "X@Y.com" handle =…
wl284
  • 53
  • 9
1
vote
1 answer

How to use Bioproject ID, for example, PRJNA12997, in biopython?

I have an Excel file listing more than 2000 organisms, each of which has a Bioproject ID associated (like PRJNA12997). The idea is to use these IDs to get the sequence for a later multiple alignment with other five sequences that…
1
vote
0 answers

return empty result via using Entrez,Efetch to search lineage from taxonomy db

I used biopython to search lineage information from the taxonomy database, but it returns an empty result! I could use it yesterday (2016/3/15), but now I can't (2016/03/16)! The code I used is here: >>> from Bio import Entrez >>> Entrez.email =…
Stacey Wu
  • 11
  • 1
1
vote
1 answer

socket.gaierror while downloading genbank files w/ biopython

I would like to download genbank files from NCBI using Biopython and a list of accession numbers (note that I call the script with an email address as an argument e.g., python scriptName.py emailAddress) import os import os.path …
cer
  • 1,961
  • 2
  • 17
  • 26
1
vote
0 answers

Trouble Iteratively Parsing several XML results into PHP

I am trying to write a PHP script to take advantage of the E-utilities service at NCBI (National Center for Biotechnology Information). I can supply a URL with a search term ("alaS" in this example) and retrieve the XML result with no problem, with…