I have the following R code, which I adapted from the question "How to extract genes from genbank file in R".
I want to build a data.frame from the entire GenBank file so that I can use the metadata to automatically filter out entries in my source file that I don't need.
This is my script right now:
library(genbankr)
library(stringr)
library(purrr)
gb <- readGenBank("sequence.gb")
GENES <- genes(gb)
GenesDF <- data.frame(GENES)
When I run it, I keep getting the following error:
Error in readGenBank("sequence.gb") :
all(sapply(text, function(x) identical(substr(x[1], 1, 5), "LOCUS"))) is not TRUE
I have no idea what's going on, and I can't continue.
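The assertion in the error message appears to require that each record's first line begins with the string LOCUS, so a reasonable first diagnostic is to look at what the file actually starts with. Below is a minimal sketch using base R only; check_locus_header is a hypothetical helper written for this post, not a genbankr function, and it only inspects the first line of the file rather than reproducing genbankr's own record splitting.

```r
# Hypothetical helper (not part of genbankr): show the first lines of a file
# and report whether the very first line begins with "LOCUS".
check_locus_header <- function(path, n = 5) {
  first_lines <- readLines(path, n = n, warn = FALSE)
  list(
    first_lines = first_lines,
    # Same comparison the error message uses: first 5 characters == "LOCUS"
    starts_with_locus = length(first_lines) > 0 &&
      identical(substr(first_lines[1], 1, 5), "LOCUS")
  )
}

# Usage, assuming sequence.gb is in the working directory:
# check_locus_header("sequence.gb")
```

If starts_with_locus is FALSE, the printed first_lines should show what precedes the LOCUS line (for example a blank line or other leading text), which would be useful detail to add here.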
The test input file I'm using (sequence.gb) can be downloaded here: https://drive.google.com/file/d/1pdIBG8p7i1C5LlY9lvzmSL2iKFlkab1W/view?usp=share_link
The same error occurs with every NCBI GenBank file from ncbi.nlm.nih.gov that I have tested.
I need the metadata as well, so downloading in another format is not an option, since the server doesn't include the metadata in those formats.
I also tried downloading and running this parser: https://github.com/dewshr/NCBI-Genbank-file-parser but it strips the metadata too, so it doesn't help.