SimRAD users: fasta import to prevent insilico.digest strsplit error

Question

I'm trying to use the code below to do an in silico digestion for a RADseq run within the package SimRAD. I believe the problem is with the loading of the data - I keep getting the following error:

Error in strsplit(DNAseq, split = recognition_code, fixed = FALSE, perl = FALSE) : non-character argument

within my Fasta file my fasta header is: gi 32456060 emb BX526834 Cryptosporidium parvum chromosome 6 I removed all . , and | that were previously in the filename, as I read that | in particular can cause strsplit errors.

Unfortunately, none of the SimRAD documents that I have found dealt with this issue and all the strsplit errors I've found forum topics on a related to fairly different sample types and I haven't been able to pinpoint what to change or modify to prevent the error.

Here is the code including the required packages:

    ###-----Start Code
    source("http://bioconductor.org/biocLite.R")
    biocLite("Biostrings")
    biocLite("ShortRead")

    install.packages("~/Downloads/SimRAD_0.95.tgz", repos = NULL)

    library(Biostrings)
    library(ShortRead)
    library(seqinr)
    library(SimRAD)

    #Restriction Enzyme 1
    #MseI #
    MseIcs_5p <- "T"
    MseIcs_3p <- "TAA"
    #Restriction Enzyme 2
    #EcoRI#
    EcoRIcs_5p <- "G"
    EcoRIcs_3p <- "AATTC"

    ##these are two alternative means I've tried to read in a fasta file 
    ##I believe this is the the problem line - either a new line for
    ##importing or #some subsequent line is needed to prevent the error that    
    ##follows
    CryptoParChr6 <- read.fasta(file = "filepath.fasta")
    CryptoParChr6 <- readDNAStringSet("filepath.fasta", "fasta")

    ##the error comes in at this line
    CryptoParChr6.dig <- insilico.digest(CryptoParChr6, MseIcs_5p,  
            MseIcs_3p, EcoRIcs_5p, EcoRIcs_3p, verbose = TRUE)

    ###-----End Code

If anyone is familiar with SimRAD and has any suggestions for importing fasta files properly - I'd appreciate any help.

score 0 · Answer 1 · answered Jul 30 '15 at 14:50

Your problem has little to do with the SimRAD package per se. As is explicitly written in ?insilico.digest the first argument must be a character string (or a vector thereof). E.g. read.fasta outputs a list of SeqFastadna objects instead. So, you have to extract the sequences themselves:

myFasta     <- read.fasta(file = "filepath.fasta", as.string = 1)
mySequences <- unlist(myFasta)
myDigest    <- insilico.digest(myFasta, MseIcs_5p, MseIcs_3p, EcoRIcs_5p, EcoRIcs_3p, verbose = TRUE)

SimRAD users: fasta import to prevent insilico.digest strsplit error

1 Answers1