4

Using R for bioinformatics here: I have a list of DNAStringSet objects (shown below) and want to save them as a FASTA file with writeXStringSet(), which takes a single DNAStringSet object as its argument. Does anyone know how to collapse the list of DNAStringSets into a single DNAStringSet object that I can pass to that function?

$NM_008866
  A DNAStringSet instance of length 13
     width seq                                                        names               
 [1]   693 ATGTGCGGCAACAACATGTCCGCTCCGA...GATAAGCTCCTACCTCCAATTGATTGA NM_008866
 [2]    72 ATGGATGGGCAGAAGCCTTTGCAGGTAT...AATACATCTGTCCACATGCCCCTGTGA NM_008866
 [3]   114 ATGGGCAGAAGCCTTTGCAGGTATCAAA...GAATATGGCTATGCCTTCTTGGTTTGA NM_008866
 [4]   213 ATGGCATTCCTTCTAACAGGATTATTTT...AGTGCCATGGAGATTGTGACCCTTTAG NM_008866
 [5]    63 ATGTCAAGCACTTCATTGATAAGCTCCT...TTGATTGACATCACTAAGAGGCCTTGA NM_008866
 ...   ... ...
 [9]   219 ATGGCCCTTCTATTGGGAGACCAGGCTT...CAGAGGCAGGCGGATCTCTGTCAATAG NM_008866
[10]   144 ATGTTATGCTTAAAACCAAATACTGTTC...CAGTCTCCTGTACAAATATTAAAATAA NM_008866
[11]    78 ATGTTGCAAAAATTATGGTTATTTCTGA...CCAACCAACCAAGAAGCACCTTTATAA NM_008866
[12]    75 ATGGTTATTTCTGAACGGTTGCTTTTCT...AGAAGCACCTTTATAAACAGGTGCTAA NM_008866
[13]    90 ATGTCTGGATTTAAAACAATTTCAAACA...AATTTACTTCAGTTATTCTATCTGTAA

$NM_001159750
  A DNAStringSet instance of length 9
   width seq                                                         names               
[1]   903 ATGGAGGACGAGGTGGTTCGCATTGCCA...ATGTGGAAATCGGTGGAAGTTCTGTTGA NM_001159750
[2]   105 ATGGACCATCAACTGATAAAGACCCTGA...AGAGAAGAAAGTTCCAGCAGCAATGTAA NM_001159750
[3]    75 ATGAGACAAATGCTCGAGATACATATGT...CCAAGCACTTCTGATTCTGTGCGATTAA NM_001159750
[4]    75 ATGATTATGTTGCAATTGGAGCTGATGA...ATTGAGGAAGCTATATATCAAGAAATAA NM_001159750
[5]   129 ATGAATGTGGAAATCGGTGGAAGTTCTG...GCCAGGCAACTCGTTTCCTTGCAAGTGA NM_001159750
[6]    63 ATGTGGAAATCGGTGGAAGTTCTGTTGA...AGAATTGGCAAAGTATCTGGACCATTAA NM_001159750
[7]   102 ATGTGTCCCACTTGTTTTGCTAGTAATA...TATAGTAAAGGCCACTTTTATAAATTAA NM_001159750
[8]   102 ATGGAAAACAATATGTCCATGTTAAAAG...CGGGAGGCAGAGGCAGGCGGATTTCTGA NM_001159750
[9]    75 ATGGATAATTTCTGTCACTTTAAAAATA...TAGTTTAAAAGTAATAAGGTTAAAATAG NM_001159750

$NM_011541
  A DNAStringSet instance of length 9
    width seq                                                         names               
[1]   906 ATGGAGGACGAGGTGGTTCGCATTGCCA...ATGTGGAAATCGGTGGAAGTTCTGTTGA NM_011541
[2]   108 ATGGACCATCAACTGATAAAGACCCTGA...GAAGAAAGTAGTTCCAGCAGCAATGTAA NM_011541
[3]    75 ATGAGACAAATGCTCGAGATACATATGT...CCAAGCACTTCTGATTCTGTGCGATTAA NM_011541
[4]    75 ATGATTATGTTGCAATTGGAGCTGATGA...ATTGAGGAAGCTATATATCAAGAAATAA NM_011541
[5]   129 ATGAATGTGGAAATCGGTGGAAGTTCTG...GCCAGGCAACTCGTTTCCTTGCAAGTGA NM_011541
[6]    63 ATGTGGAAATCGGTGGAAGTTCTGTTGA...AGAATTGGCAAAGTATCTGGACCATTAA NM_011541
[7]   102 ATGTGTCCCACTTGTTTTGCTAGTAATA...TATAGTAAAGGCCACTTTTATAAATTAA NM_011541
[8]   102 ATGGAAAACAATATGTCCATGTTAAAAG...CGGGAGGCAGAGGCAGGCGGATTTCTGA NM_011541
[9]    75 ATGGATAATTTCTGTCACTTTAAAAATA...TAGTTTAAAAGTAATAAGGTTAAAATAG NM_011541
NEWSCIENT

2 Answers

5

Here is a minimal reproducible example. Interestingly, this does not work if the elements of the list are named (it just returns the same list), so make sure to set names(dna_list) <- NULL first. I am unsure of the specific reason for this; perhaps someone else knows and would care to comment.

library(Biostrings)
# Three small DNAStringSet objects to combine
x0 <- DNAStringSet(c("CTCCCAGTAT", "TTCCCGA", "TACCTAGAG"))
x1 <- DNAStringSet(c("AGGTCGT", "GTCAGTGGTCCCC", "CATTTTAGG"))
x2 <- DNAStringSet(c("TGCTAGCTA", "AGTCTTGC", "AGCTTTCGAG"))
# Unnamed list of DNAStringSets
dna_list <- list(x0, x1, x2)
> dna_list
[[1]]
  A DNAStringSet instance of length 3
    width seq
[1]    10 CTCCCAGTAT
[2]     7 TTCCCGA
[3]     9 TACCTAGAG

[[2]]
  A DNAStringSet instance of length 3
    width seq
[1]     7 AGGTCGT
[2]    13 GTCAGTGGTCCCC
[3]     9 CATTTTAGG

[[3]]
  A DNAStringSet instance of length 3
    width seq
[1]     9 TGCTAGCTA
[2]     8 AGTCTTGC
[3]    10 AGCTTTCGAG

> do.call(c, dna_list)
  A DNAStringSet instance of length 9
    width seq
[1]    10 CTCCCAGTAT
[2]     7 TTCCCGA
[3]     9 TACCTAGAG
[4]     7 AGGTCGT
[5]    13 GTCAGTGGTCCCC
[6]     9 CATTTTAGG
[7]     9 TGCTAGCTA
[8]     8 AGTCTTGC
[9]    10 AGCTTTCGAG
cdeterman
  • Yes, I see, but it keeps giving me a Large list and not a single DNAStringSet. I don't know if the problem is the transcript names that my list of DNAStringSets carries! What do you think about the union() method? – NEWSCIENT Oct 10 '14 at 16:32
  • @stbac, did you try setting `names(dna_list) <- NULL`? The `union` method, to my knowledge, is used for vectors, not objects, but I could be corrected on this by someone more knowledgeable. – cdeterman Oct 10 '14 at 16:34
  • Any idea why writeXStringSet(DNAStringSet, filepath, format="fasta") gives me the error: Error in .Call2("write_XStringSet_to_fasta", x, efp_list, width, lkup, : RAW() can only be applied to a 'raw', not a 'NULL' – NEWSCIENT Oct 10 '14 at 17:01
  • @stbac, it works for my example, so I currently cannot replicate this error. Are there NULLs in your dataset? It would be best if I could replicate your dataset. – cdeterman Oct 10 '14 at 17:09
  • Yes, I found a NULL element in my DNAStringSet; I don't really know why it is there! It shouldn't be, after the filters I had in my code! Do you know how I can remove it? – NEWSCIENT Oct 10 '14 at 17:36
  • @stbac, can you provide a small excerpt of where the NULL is (e.g. seq)? – cdeterman Oct 10 '14 at 17:45
  • To avoid the need to nullify the names, combine into a formal `DNAStringSetList` and then `unlist`: `unlist(DNAStringSetList(dna_list))`; see support.bioconductor.org for more focused support – Michael Lawrence Oct 10 '14 at 22:18
  • If I have that final list, how can I get just the fastas without width? Basically, a normal character column for a dataframe... – Ricardo Guerreiro Jan 16 '19 at 13:19
  • @MichaelLawrence As of April 2023, `names(dna_list) <- NULL` approach doesn't work, only your solution did the trick – Poiu Rewq Apr 11 '23 at 07:27
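
The NULL problem raised in the comments above was never resolved in the thread. As a rough sketch only, assuming the NULL is a whole element of the list (the list below and its names `a`, `b`, `c` are made up for illustration, not taken from the asker's data), the NULL entries could be dropped with base R's `Filter()` before combining via `DNAStringSetList` and `unlist`:

```r
library(Biostrings)

# Hypothetical list with a stray NULL entry, mimicking the situation in the comments
dna_list <- list(
  a = DNAStringSet(c("CTCCCAGTAT", "TTCCCGA")),
  b = NULL,
  c = DNAStringSet(c("AGGTCGT", "CATTTTAGG"))
)

# Keep only the non-NULL elements of the list
dna_list <- Filter(Negate(is.null), dna_list)

# Combine the remaining DNAStringSets into a single DNAStringSet
dna <- unlist(DNAStringSetList(dna_list))

# Write to a temporary FASTA file (replace with a real path as needed)
out_file <- tempfile(fileext = ".fasta")
writeXStringSet(dna, filepath = out_file, format = "fasta")
```

If the NULL sits somewhere else (for example inside one of the sets), this filter will not catch it, and the offending element would need to be located first.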
0

Based on Michael Lawrence's comment on @cdeterman's answer.

Use a DNAStringSetList as the container for the DNAStringSets so that the names are kept, then unlist it to obtain a single DNAStringSet object:

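library(Biostrings)
# x0, x1 and x2 are the DNAStringSet objects from the example above
names(x0) <- c("Hi", "My", "NameIs")
names(x1) <- c("JohnJohn", "Mcgee", "Pawlson")
names(x2) <- c("Whats", "Your", "Name")
dna_list <- Biostrings::DNAStringSetList(x0, x1, x2)
# Flatten the DNAStringSetList into a single DNAStringSet
dna <- unlist(dna_list)
Biostrings::writeXStringSet(x = dna, filepath = "path/to/dna.fasta")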
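
Regarding the comment asking how to get just the sequences as a normal character column: one possible approach, sketched here with a small made-up `dna` object and hypothetical column names, is to convert the combined DNAStringSet with `as.character()` and build a data frame from it:

```r
library(Biostrings)

# Small named DNAStringSet standing in for the combined object
dna <- DNAStringSet(c(Hi = "CTCCCAGTAT", My = "TTCCCGA", NameIs = "TACCTAGAG"))

# as.character() returns the sequences as a plain (named) character vector
seqs <- as.character(dna)

# Build a data frame; the column names here are arbitrary choices
df <- data.frame(
  name = names(dna),
  sequence = unname(seqs),
  stringsAsFactors = FALSE
)
```

The `width` shown when printing a DNAStringSet is only part of the display; it is not stored in the character vector returned by `as.character()`.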