-4

I have a fasta file where the sequences are broken up with newlines. I'd like to remove the newlines. Here's an example of my file:

>accession1
ATGGCCCATG
GGATCCTAGC
>accession2
GATATCCATG
AAACGGCTTA

I'd like to convert it into this:

>accession1
ATGGCCCATGGGATCCTAGC
>accession2
GATATCCATGAAACGGCTTA

Can anyone solve this problem using R? Thanks!

Rich Scriven
Quang Ong
  • 2
    What is your OS please? – Rich Scriven Jul 28 '15 at 02:56
  • 4
    Please show what you have tried. – frasertweedale Jul 28 '15 at 02:57
  • 1
    If you are on a Unix-based OS (and you probably should be if you're going to work with genomics), there are several tools to facilitate FASTA manipulation, though this can easily be done with basic terminal commands. In R, try using `gsub` and `\\n`. – Molx Jul 28 '15 at 03:12
  • 1
    When you ask a question, it's a good idea to stay by the computer for a little while to make sure people understand what you are asking. – Rich Scriven Jul 28 '15 at 03:17

2 Answers

0

Probably something along the lines of:

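# accession1 and accession2 are character vectors holding each record's sequence lines
genelist <- list(accession1 = accession1, accession2 = accession2)
# collapse each record's lines into a single string
lapply(genelist, paste0, collapse = "")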
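
To apply the same idea to a whole file, here is a minimal base R sketch. It assumes the file has been read with readLines() and that header lines start with ">". The helper name unwrap_fasta and the file names in the usage comment are placeholders of mine, not part of the original answer.

```r
# Hypothetical helper: join wrapped FASTA sequence lines so that each
# record becomes one header line followed by one sequence line.
unwrap_fasta <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]                 # drop empty lines
  is_header <- grepl("^>", lines)               # header lines start with ">"
  # record index for every line; lines before the first header get NA
  record_id <- factor(cumsum(is_header), levels = seq_len(sum(is_header)))
  # group the sequence lines by record (a record with no sequence stays empty)
  chunks <- split(lines[!is_header], record_id[!is_header])
  seqs <- vapply(chunks, paste0, character(1), collapse = "")
  # interleave each header with its joined sequence
  as.vector(rbind(lines[is_header], seqs))
}

example <- c(">accession1", "ATGGCCCATG", "GGATCCTAGC",
             ">accession2", "GATATCCATG", "AAACGGCTTA")
unwrap_fasta(example)
# For a file (placeholder names):
# writeLines(unwrap_fasta(readLines("sequences.fa")), "sequences_unwrapped.fa")
```

This sketch silently drops any lines that appear before the first header, and it does not validate the sequence characters.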
IRTFM
0

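Try the seqinr package, which provides both read.fasta() and write.fasta() functions. The latter lets you control the wrap width of the output through its nbchar argument; each sequence is written on a single line only if nbchar is at least the length of the longest sequence.

So, assuming your FASTA data is in the file sequences.fa: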

install.packages('seqinr') # do this once
library(seqinr)

seqs = read.fasta(file='sequences.fa')
write.fasta(seqs, names(seqs), nbchar=80, file.out='sequences2.fa')
oddHypothesis