Merge multiple sequences into a single variable using Biostrings in R

Question

I have an R list of Biostrings sequence sets (genomes), and each element contains 4000 or more genes (sequences):

class(myseq)
[1] "list"

names(myseq)
 [1] "Genome-01" "Genome-02" "Genome-03"

myseq[["Genome-01"]]
DNAStringSet object of length 4368:
       width seq                                                                    names               
   [1]   516 ATGACAACTGAGCCAACAGTCATAATTGGACTGC...AACAAAAACCGTTTGCGGGTTAAGGAAAGCTGA CPLOIJDP_00001
   [2]   408 ATGTCCAACACTGAATCGATCTGCGTATCAACGC...CAGCCAGAGCGCGTGGGCGAGTTTCATTTGTAA CPLOIJDP_00002
[4367]    76 GCGACACTAGCTCAGTTGGTAGAGCGCAACCTTG...AGGTCACGAGTTCGAACCTCGTGTGTCGCTCCA CPLOIJDP_04367
[4368]    72 GGTGGAGCAGCTTGGTAGCTCGTCGGGCTCATAA...AGGTCGTCGGTTCAAATCCGGCCCCCGCAACCA CPLOIJDP_04368

and so on for the other genomes (-02, -03, ..., genome-n). I just want to extract all sequences from all genomes in myseq and generate a single DNAStringSet (Biostrings) variable that contains all of them. Without a loop, this works:

allgenes <- c(myseq[[1]], myseq[[2]], myseq[[3]])
or 
allgenes <- c(myseq[["Genome-01"]], myseq[["Genome-02"]], myseq[["Genome-03"]])

but I need to do it in a loop.

I have tried:

allgenes <- c()
for(i in 1:length(myseq)){ 
      allgenes[i] <- myseq[[i]]
}

and also without [i] on allgenes, but it doesn't work!

Thanks for your help!

I haven't used it, but according to the [Biostrings manual](https://bioconductor.org/packages/devel/bioc/manuals/Biostrings/man/Biostrings.pdf), the DNAStringSet class has a method `append(x, values, after=length(x))` that adds the sequences in `values` to `x`, where `x` and `values` are XStringSet objects. Then you could do `allgenes <- myseq[[1]]; for (i in 2:length(myseq)) { allgenes <- append(allgenes, myseq[[i]]) }` — Cloudberry, Jan 14 '23 at 14:43
Does this answer your question? https://stackoverflow.com/q/26303648/570918 — merv, Jan 16 '23 at 08:04

score 0 · Answer 1 · answered Jan 19 '23 at 19:26

Just do an unlisting:

dna <- Biostrings::DNAStringSet(x=c(string1="ATCCG", string2="TTTT"), use.names = TRUE)
dna
#> DNAStringSet object of length 2:
#>     width seq                                               names               
#> [1]     5 ATCCG                                             string1
#> [2]     4 TTTT                                              string2

unlist(dna)
#> 9-letter DNAString object
#> seq: ATCCGTTTT
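
Note that `unlist()` on a single DNAStringSet, as shown above, concatenates everything into one DNAString. If the goal is instead one DNAStringSet holding every gene from a plain list of DNAStringSet objects, a minimal sketch could look like the following. The toy `myseq` list and its gene names are made up for illustration and only stand in for the real data; `allgenes_loop` is a hypothetical name for the loop result.

```r
library(Biostrings)

# Toy stand-in for the real data: a named list of DNAStringSet objects.
# Genome contents and gene names are invented for this example.
myseq <- list(
  "Genome-01" = DNAStringSet(c(g1_0001 = "ATGACA", g1_0002 = "ATGTCC")),
  "Genome-02" = DNAStringSet(c(g2_0001 = "GCGACA")),
  "Genome-03" = DNAStringSet(c(g3_0001 = "GGTGGA", g3_0002 = "TTTT"))
)

# Without an explicit loop: call c() once with every list element as an argument.
# unname() drops the list names so they are not passed along as argument names.
allgenes <- do.call(c, unname(myseq))

# With a loop: start from an empty DNAStringSet and reassign the
# combined set on every pass (c() returns a new object).
allgenes_loop <- DNAStringSet()
for (i in seq_along(myseq)) {
  allgenes_loop <- c(allgenes_loop, myseq[[i]])
}

length(allgenes)  # 5 sequences in total
names(allgenes)   # gene names are kept; the genome names are not
```

The loop in the question fails because `allgenes[i] <- ...` tries to replace a single element of an empty plain vector rather than appending a whole set. If the genome of origin matters, it would have to be added to the sequence names before combining.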
