Questions tagged [ncbi]

NCBI is the National Center for Biotechnology Information, whose website is one of the most important resources used by bioinformaticians. NCBI runs a wide variety of bioinformatics web services and also provides important databases for download.

NCBI covers a wide range of bioinformatics resources, from journal listings and gene alignments to chemical library databases and protein folding prediction.

NCBI's data is publicly available from the main website and from FTP repositories.

  • PubMed, a database of citations and abstracts for biomedical literature from MEDLINE and additional life science journals.

  • The NCBI C++ Toolkit provides a set of modules to access, modify, generate and deposit biological data. The full description is available in its online book.

  • PubChem, a chemical library database, has its own API for searching and retrieving chemical compounds.
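For illustration, PubChem's PUG REST interface is addressed with plain URLs. The sketch below only builds such a URL; the path layout (`compound/name/<name>/property/<properties>/<format>`) is assumed from the public PUG REST documentation, and `pug_property_url` is a hypothetical helper name.

```python
from urllib.parse import quote

# Base address of PubChem's PUG REST service (layout assumed from its public docs).
PUG_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pug_property_url(name, properties, fmt="JSON"):
    """Build a URL that looks compounds up by name and returns selected properties."""
    # Percent-encode the name so spaces and symbols survive inside the URL path.
    encoded = quote(name, safe="")
    return f"{PUG_BASE}/compound/name/{encoded}/property/{','.join(properties)}/{fmt}"
```

Passing `pug_property_url("aspirin", ["MolecularFormula", "MolecularWeight"])` to `urllib.request.urlopen` should return a JSON document with the requested properties.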

205 questions
1 vote, 1 answer

How to retrieve NCBI Entrez summary using gene name with Biopython?

I've explored a variety of options and solutions online, but I can't seem to quite figure this out. I'm new to using Entrez so I don't fully understand how it works, but below was my attempt. My goal would be to print out the online summary, so for…
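A minimal sketch of the usual two-step lookup, assuming the E-utilities JSON output: search the `gene` database for the symbol, then request document summaries for the returned IDs. Biopython's `Entrez.esearch(db="gene", term=...)` and `Entrez.esummary(db="gene", id=...)` wrap the same two calls. The helper names are hypothetical, the `summary` key is assumed from the gene esummary output, and the sample payload in the note after the code is made up.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def gene_search_url(gene, organism):
    """ESearch URL for a gene symbol within one organism in the `gene` database."""
    # Field tags restrict the match to the gene name and the organism.
    term = f"{gene}[Gene Name] AND {organism}[Organism]"
    params = {"db": "gene", "term": term, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def gene_summary_url(gene_ids):
    """ESummary URL for one or more gene IDs returned by the search."""
    params = {"db": "gene", "id": ",".join(gene_ids), "retmode": "json"}
    return f"{EUTILS}/esummary.fcgi?{urlencode(params)}"


def summaries_from_esummary(payload):
    """Map each gene ID to its summary text in a parsed esummary JSON payload."""
    result = payload["result"]
    return {uid: result[uid].get("summary", "") for uid in result["uids"]}
```

Given a parsed payload shaped like `{"result": {"uids": ["1"], "1": {"summary": "text"}}}`, `summaries_from_esummary` returns `{"1": "text"}`. NCBI also asks clients to identify themselves, which Biopython handles through `Entrez.email`.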
1 vote, 1 answer

Why would a special character cause an R function to drop components of the search string?

I am using the R package 'easyPubMed' to investigate species and the research effort (i.e. total number of publications) on those species. Typically, I can use the function get_pubmed_ids("example string[TI]") to return information from the NCBI…
GML0011
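One common cause of this behaviour, offered here as an assumption about what easyPubMed does internally, is a search term placed into a URL without percent-encoding: an unescaped `&` ends the `term` parameter early, so everything after it is lost. The Python sketch below shows the effect with the standard library; `naive_query`, `encoded_query` and `term_seen_by_server` are illustrative names.

```python
from urllib.parse import parse_qs, urlencode


def naive_query(term):
    # Only spaces are escaped; a raw '&' is left in place and splits the query.
    return "db=pubmed&term=" + term.replace(" ", "+")


def encoded_query(term):
    # urlencode percent-encodes '&', '#', '[' and ']' inside the value.
    return urlencode({"db": "pubmed", "term": term})


def term_seen_by_server(query):
    """Decode a query string the way a typical web server would."""
    return parse_qs(query).get("term", [""])[0]


term = "Bos taurus & cattle[TI]"
print(repr(term_seen_by_server(naive_query(term))))    # the part after '&' is gone
print(repr(term_seen_by_server(encoded_query(term))))  # the full term survives
```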
1 vote, 1 answer

BioPython Entrez article limit

I've been using the classic article function which returns the articles for a string from Bio import Entrez, __version__ print('Biopython version : ', __version__) def article_machine(t): Entrez.email = 'email' handle =…
Noamiz
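Entrez ESearch returns only 20 IDs by default (`retmax` defaults to 20), which is a frequent reason for an apparent article limit. Below is a sketch of paging through a larger result set with `retstart` and `retmax`, assuming the total count was read from an earlier search; the page size of 500 and the helper name are arbitrary choices. In Biopython the same parameters can be passed as keyword arguments to `Entrez.esearch`.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def paged_search_urls(term, total, page_size=500, db="pubmed"):
    """One ESearch URL per page of results, advancing retstart by page_size."""
    urls = []
    for start in range(0, total, page_size):
        params = {
            "db": db,
            "term": term,
            "retstart": start,    # offset of the first ID in this page
            "retmax": page_size,  # number of IDs requested (the default is 20)
            "retmode": "json",
        }
        urls.append(f"{EUTILS}/esearch.fcgi?{urlencode(params)}")
    return urls
```

With `total` taken from the `count` field of a first search, `paged_search_urls("malaria", 1200)` yields three URLs starting at offsets 0, 500 and 1000.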
1 vote, 0 answers

How do I download a large number of GenBank sequences using entrez_fetch in R?

I am trying to download sequence data from 1283 records in GenBank using rentrez. I'm using the following code, first to search for records fitting my criteria, then linking across databases, and finally fetching the sequence data: # Search for…
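A common way to keep large downloads manageable is to fetch records in batches rather than in one request. The sketch below splits a list of IDs into EFetch batches of 200; the batch size, the `nuccore`/`fasta` defaults and the helper names are assumptions, and the same loop can be written in R around rentrez's `entrez_fetch`. Without an API key, NCBI asks clients to stay at or below 3 requests per second, so a short pause between batches is advisable.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def chunked(ids, size):
    """Split a list of record IDs into consecutive batches of at most `size`."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def batch_fetch_urls(ids, size=200, db="nuccore", rettype="fasta"):
    """One EFetch URL per batch, so no single request has to carry every ID."""
    return [
        f"{EUTILS}/efetch.fcgi?"
        + urlencode({"db": db, "id": ",".join(batch), "rettype": rettype, "retmode": "text"})
        for batch in chunked(ids, size)
    ]
```

For 1283 records this gives seven requests, the last holding 83 IDs; calling `time.sleep` between downloads keeps the loop within the rate limit.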
1 vote, 1 answer

Change ID in multiple FASTA files

I need to rename multiple sequences in multiple fasta files and I found this script in order to do so for a single ID: original_file = "./original.fasta" corrected_file = "./corrected.fasta" with open(original_file) as original,…
adcm67
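A sketch that extends the single-file approach to every FASTA file in a folder, assuming a dictionary that maps old IDs to new ones and that the ID is the first space-delimited word of each header. Output goes to a hypothetical `<name>.corrected.fasta` next to each input, so the originals stay untouched.

```python
from pathlib import Path


def rename_fasta_ids(src, dst, mapping):
    """Copy a FASTA file, replacing header IDs that appear in `mapping`.

    The ID is taken to be the first space-delimited word after '>';
    any description after it is kept unchanged.
    """
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.startswith(">"):
                old_id, sep, rest = line[1:].rstrip("\n").partition(" ")
                line = ">" + mapping.get(old_id, old_id) + sep + rest + "\n"
            fout.write(line)


def rename_all(folder, mapping, suffix=".fasta"):
    """Apply the same ID mapping to every FASTA file in `folder`."""
    # Build the input list first so newly written outputs are not revisited.
    inputs = sorted(p for p in Path(folder).glob(f"*{suffix}")
                    if not p.name.endswith(".corrected" + suffix))
    for path in inputs:
        # Hypothetical naming scheme: a.fasta -> a.corrected.fasta
        rename_fasta_ids(path, path.with_name(path.stem + ".corrected" + suffix), mapping)
```

For example, `rename_all("./fasta_dir", {"old_id": "new_id"})`, where the folder and IDs are placeholders.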
1 vote, 0 answers

NCBI Blast API use QuickBLASTP program

I'm using NCBI Blast's web API. In their browser-based Blast search, they offer a program called QuickBLASTP which is immensely faster than the regular blastp. How can I use this program inside of their web API? On the documentation, it says that…
Ayush
1 vote, 1 answer

Obtaining data from NCBI gene database with R

Rentrez package: I was exploring the rentrez package in RStudio (Version 1.1.442) on a lab computer running Linux (Ubuntu 20.04.2), following this manual. However, later when I wanted to run the same code on my laptop in Windows 8 Pro (RStudio 2021.09.0…
Eugene Bu
1 vote, 0 answers

How to use biopython Entrez efetch to get genbank file from "gene" database

I am trying to programmatically get whole genes (with intron and exon structure as defined by CDS) using Biopython Entrez esearch and efetch utilities. from Bio import Entrez Entrez.email = "myemail@gmail.com" handle =…
harijay
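The `gene` database does not appear to offer a GenBank `rettype`; a common workaround, sketched here and not verified against the asker's setup, is to read the gene's genomic coordinates from its esummary and then fetch that region from `nuccore` with `seq_start`, `seq_stop` and `strand`. The `chraccver`/`chrstart`/`chrstop` keys and their 0-based convention are assumptions about the gene esummary output, and `genbank_url_for_gene` is a hypothetical name.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def genbank_url_for_gene(genomic_info):
    """EFetch URL for the GenBank record of one gene's genomic region.

    `genomic_info` is one entry of the gene esummary "genomicinfo" list
    (assumed keys: chraccver, chrstart, chrstop, 0-based coordinates).
    """
    start, stop = int(genomic_info["chrstart"]), int(genomic_info["chrstop"])
    # A start greater than stop is taken to mean the gene is on the minus strand.
    strand = 2 if start > stop else 1
    low, high = sorted((start, stop))
    params = {
        "db": "nuccore",
        "id": genomic_info["chraccver"],
        "rettype": "gb",
        "retmode": "text",
        "seq_start": low + 1,  # EFetch coordinates are 1-based
        "seq_stop": high + 1,
        "strand": strand,      # 1 = plus, 2 = minus
    }
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"
```

Biopython's `Entrez.efetch` forwards the same keyword arguments, and `SeqIO.read(handle, "genbank")` can then parse the record together with its CDS features.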
1 vote, 1 answer

Number of NCBI publications published with a keyword, grouped by year

I want to make a dictionary with the year as the keys, and the number of publications containing a keyword that was published in that year as the value. I've written this script: from Bio import Entrez from Bio import Medline from metapub import…
Slowat_Kela
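An alternative to downloading every record and grouping by year is to ask ESearch for one count per year. The sketch assumes the `[PDAT]` publication-date field tag and `rettype=count`; the keyword, the year range and the helper names are placeholders.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def yearly_count_urls(keyword, years):
    """Map each year to an ESearch URL that counts matching PubMed records."""
    urls = {}
    for year in years:
        # [PDAT] restricts the search to records published in that year.
        term = f"{keyword} AND {year}[PDAT]"
        params = {"db": "pubmed", "term": term, "rettype": "count", "retmode": "json"}
        urls[year] = f"{EUTILS}/esearch.fcgi?{urlencode(params)}"
    return urls


def count_from_esearch(payload):
    """Read the hit count from a parsed ESearch JSON response (stored as a string)."""
    return int(payload["esearchresult"]["count"])
```

Fetching each URL and applying `count_from_esearch` to the parsed JSON gives the year-to-count dictionary directly, with one small request per year.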
1 vote, 0 answers

How to automatically download complete genomes from NCBI with Biopython

I'm trying to automatically download all the Klebsiella pneumoniae genomes with Biopython. The genome page currently reports Size (Mb): 5.68232, Plasmids: 6, Assemblies: 10650. I have tried the function Entrez.esearch, using this link …
Yad.yos
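Bulk downloads of many assemblies are often done from the NCBI FTP site rather than through `Entrez.esearch`. Assembly folders follow a predictable layout built from the accession, which the sketch below reconstructs. The accession and assembly name used in the note after the code are hypothetical; in practice both come from the `assembly_summary.txt` listings, which also record each assembly's level (for example `Complete Genome`).

```python
FTP_ALL = "https://ftp.ncbi.nlm.nih.gov/genomes/all"


def assembly_ftp_dir(accession, assembly_name):
    """Directory holding one assembly's files, e.g. GCF/012/345/678/<acc>_<name>/."""
    prefix, digits = accession.split("_", 1)  # "GCF", "012345678.1"
    number = digits.split(".")[0]             # drop the version: "012345678"
    groups = [number[i:i + 3] for i in range(0, len(number), 3)]
    # Assumes assembly_name is already in its folder form (no spaces).
    folder = f"{accession}_{assembly_name}"
    return "/".join([FTP_ALL, prefix, *groups, folder]) + "/"


def genomic_fasta_url(accession, assembly_name):
    """URL of the assembly's gzip-compressed genomic nucleotide FASTA."""
    stem = f"{accession}_{assembly_name}"
    return f"{assembly_ftp_dir(accession, assembly_name)}{stem}_genomic.fna.gz"
```

For a hypothetical `GCF_012345678.1` named `ExampleAsm1`, the directory is `.../genomes/all/GCF/012/345/678/GCF_012345678.1_ExampleAsm1/`.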
1 vote, 2 answers

I'm having difficulty using Beautiful Soup to scrape data from an NCBI website

I can't for the life of me figure out how to use beautiful soup to scrape the isolation source information from web pages such as this: https://www.ncbi.nlm.nih.gov/nuccore/JOKX00000000.2/ I keep trying to check if that tag exists and it keep…
Carlee B
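The nuccore page appears to load the record with JavaScript, which would explain why Beautiful Soup does not find the tag in the downloaded HTML. A sketch that avoids scraping altogether: fetch the GenBank flat file through EFetch and read the `/isolation_source` qualifier. The regular expression assumes the standard `/qualifier="value"` layout, and the helper names are illustrative.

```python
import re
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def genbank_url(accession):
    """EFetch URL for the plain-text GenBank record instead of the web page."""
    params = {"db": "nuccore", "id": accession, "rettype": "gb", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def isolation_sources(genbank_text):
    """Return every /isolation_source value found in a GenBank flat file."""
    # Values may wrap onto continuation lines; [^"]* also matches newlines,
    # and the internal whitespace is collapsed afterwards.
    values = re.findall(r'/isolation_source="([^"]*)"', genbank_text)
    return [" ".join(value.split()) for value in values]
```

For the record in the question, `genbank_url("JOKX00000000.2")` would be fetched first and its text passed to `isolation_sources`.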
1 vote, 1 answer

Getting the top 10 sequences of BLAST results with Biopython

I want to get the top 10 sequences of BLAST results (just the sequences, no alignment, score, e-value, etc.). I am inputting a text file containing 5 FASTA sequences, so my output should be the top 10 BLAST hits for each of them. Therefore my output file…
Ank
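If BLAST+ is run with tabular output (`-outfmt 6`), the first two columns are the query and subject IDs, and hits are listed best first for each query. Under that assumption, the top 10 distinct subjects per query can be collected as below; retrieving the sequences themselves (for example with `blastdbcmd -entry_batch`) is a separate step. `top_hits` is a hypothetical name.

```python
def top_hits(tabular_lines, n=10):
    """Up to n distinct subject IDs per query from BLAST+ -outfmt 6 lines.

    Assumes BLAST's own ordering, in which each query's best hits come first.
    """
    hits = {}
    for line in tabular_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        query, subject = line.split("\t")[:2]
        subjects = hits.setdefault(query, [])
        # A subject can appear several times (multiple HSPs); keep it once.
        if subject not in subjects and len(subjects) < n:
            subjects.append(subject)
    return hits
```

Opening the tabular results file and passing the file handle to `top_hits` returns a dictionary from each query ID to at most ten subject IDs.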
1 vote, 0 answers

How do I configure blast+ to access databases on an external hard drive?

I installed the blast+ program on my computer to use BLAST locally. I also downloaded the complete nr database to my external hard drive. To access it I created an environment variable called BLASTDB with the path to my external hard drive, as was…
Dennis
1 vote, 1 answer

Using Taxize Package to Get a Dataframe of Family names from Species list

I am using Taxize package to get the Family names from a list of species. Please see an example below: example <- c("Procyon lotor", "Bos taurus", "Homo sapiens") example <- data.frame(example) example 1 Procyon lotor 2 Bos taurus 3 …
aholtz
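For comparison outside R, NCBI Taxonomy records fetched with EFetch (`db=taxonomy`, XML) carry the full lineage under `LineageEx`, with each ancestor's `Rank`. A sketch that extracts the family from such XML; the element names follow that XML layout as assumed here, and `family_by_species` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def family_by_species(taxonomy_xml):
    """Map each ScientificName to its family in an EFetch (db=taxonomy) XML payload."""
    root = ET.fromstring(taxonomy_xml)
    result = {}
    # Each top-level Taxon lists its ancestors under LineageEx.
    for taxon in root.findall("Taxon"):
        name = taxon.findtext("ScientificName")
        family = None  # stays None when no ancestor has rank "family"
        for ancestor in taxon.findall("LineageEx/Taxon"):
            if ancestor.findtext("Rank") == "family":
                family = ancestor.findtext("ScientificName")
        result[name] = family
    return result
```

The XML would come from an EFetch request with `retmode=xml` for TaxIDs found by an ESearch of the taxonomy database on the species names.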
1 vote, 1 answer

Scrape data from NCBI books section?

I'm currently writing a program which requires me to scrape articles from the NCBI. I'm using the Entrez Utilities to do this (https://www.ncbi.nlm.nih.gov/books/NBK25497/). I have figured out how to do this with PubMed data, namely by using handle…
Oliver James