I'm trying to use the code below to do an in silico digestion for a RADseq run within the package SimRAD. I believe the problem is with the loading of the data - I keep getting the following error:
Error in strsplit(DNAseq, split = recognition_code, fixed = FALSE, perl = FALSE) : non-character argument
within my Fasta file my fasta header is: gi 32456060 emb BX526834 Cryptosporidium parvum chromosome 6 I removed all . , and | that were previously in the filename, as I read that | in particular can cause strsplit errors.
Unfortunately, none of the SimRAD documents that I have found dealt with this issue and all the strsplit errors I've found forum topics on a related to fairly different sample types and I haven't been able to pinpoint what to change or modify to prevent the error.
Here is the code including the required packages:
###-----Start Code
source("http://bioconductor.org/biocLite.R")
biocLite("Biostrings")
biocLite("ShortRead")
install.packages("~/Downloads/SimRAD_0.95.tgz", repos = NULL)
library(Biostrings)
library(ShortRead)
library(seqinr)
library(SimRAD)
#Restriction Enzyme 1
#MseI #
MseIcs_5p <- "T"
MseIcs_3p <- "TAA"
#Restriction Enzyme 2
#EcoRI#
EcoRIcs_5p <- "G"
EcoRIcs_3p <- "AATTC"
##these are two alternative means I've tried to read in a fasta file
##I believe this is the the problem line - either a new line for
##importing or #some subsequent line is needed to prevent the error that
##follows
CryptoParChr6 <- read.fasta(file = "filepath.fasta")
CryptoParChr6 <- readDNAStringSet("filepath.fasta", "fasta")
##the error comes in at this line
CryptoParChr6.dig <- insilico.digest(CryptoParChr6, MseIcs_5p,
MseIcs_3p, EcoRIcs_5p, EcoRIcs_3p, verbose = TRUE)
###-----End Code
If anyone is familiar with SimRAD and has any suggestions for importing fasta files properly - I'd appreciate any help.