I have the following code in R, which I took from the question "how to extract genes from genbank file in R".

I want to build a data.frame from all the data in the GenBank file, so that I can use the metadata to automatically filter out the entries in my source file that I don't need.

This is my script right now:

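library(genbankr)
library(stringr)
library(purrr)

# Parse the GenBank flat file
gb <- readGenBank("sequence.gb")
# Extract the gene annotations
GENES <- genes(gb)
# Convert the annotations to a data.frame
GenesDF <- data.frame(GENES)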

When I run it, I keep getting the following error:

Error in readGenBank("sequence.gb") : 
  all(sapply(text, function(x) identical(substr(x[1], 1, 5), "LOCUS"))) is not TRUE

I have no idea what's going on, and I can't continue.
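
As far as I can tell from the assertion, readGenBank() expects the first line of every record to begin with LOCUS. Below is a minimal base-R sketch for checking whether a file satisfies that. It assumes records end with a `//` line, as in the GenBank flat-file format, and `inspect_genbank` is a hypothetical helper of my own, not a genbankr function. It only reports what the file looks like; it does not fix anything.

```r
# Hypothetical diagnostic helper (not part of genbankr).
# Reports how the file begins and how many LOCUS / "//" lines it contains.
inspect_genbank <- function(path) {
  lines <- readLines(path, warn = FALSE)
  list(
    # The very first line, exactly as read (blank lines, HTML, or a
    # byte-order mark would show up here)
    first_line = if (length(lines) > 0) lines[1] else NA_character_,
    # TRUE only if the file begins with the LOCUS keyword
    starts_with_locus = isTRUE(startsWith(lines[1], "LOCUS")),
    # Number of lines that open a record
    n_locus = sum(startsWith(lines, "LOCUS")),
    # Number of record terminators (assumed to be lines holding only "//")
    n_terminators = sum(trimws(lines) == "//")
  )
}

# Only runs if the file is in the working directory
if (file.exists("sequence.gb")) {
  print(inspect_genbank("sequence.gb"))
}
```

If `starts_with_locus` is FALSE, or `n_locus` and `n_terminators` differ, the file does not have the layout the assertion appears to require, and `first_line` shows what is there instead.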

The test input file I'm using (sequence.gb) can be downloaded here: https://drive.google.com/file/d/1pdIBG8p7i1C5LlY9lvzmSL2iKFlkab1W/view?usp=share_link. The same error occurs with every NCBI GenBank file from ncbi.nlm.nih.gov that I tested.

I need the metadata as well, so downloading the data in another format is not an option: the server doesn't include the metadata in the other formats.

I also tried downloading and running this parser: https://github.com/dewshr/NCBI-Genbank-file-parser, but it strips the metadata too, which makes it useless for my purpose.

Gaby
  • Greetings! Usually it is helpful to provide a minimally reproducible dataset for questions here so people can troubleshoot your problems. One way of doing this is by using the `dput` function. You can find out how to use it here: https://youtu.be/3EID3P1oisg – Shawn Hemelstrand Nov 15 '22 at 02:13
  • I did provide an example dataset, it's the sequence.gb file. I posted the link to download it from Google Drive – Gaby Nov 15 '22 at 14:45
  • People can't be expected to trust unknown files on their computers; it would be better if you added the NCBI FTP link. – Vida Nov 25 '22 at 18:47
  • Ok, makes sense. I'll figure out how to do that. Thanks – Gaby Nov 26 '22 at 21:51
