Remove line breaks in a FASTA file in R

Question

I have a fasta file where the sequences are broken up with newlines. I'd like to remove the newlines. Here's an example of my file:

>accession1
ATGGCCCATG
GGATCCTAGC
>accession2
GATATCCATG
AAACGGCTTA

I'd like to convert it into this:

>accession1
ATGGCCCATGGGATCCTAGC
>accession2
GATATCCATGAAACGGCTTA

Can anyone solve this problem using R? Thanks!

If you are on a Unix-based OS (and you probably should be if you're going to work with genomics), there are several tools that facilitate FASTA manipulation, though this can also be done easily with basic terminal commands. In R, try using `gsub` and `\\n`. - Molx, Jul 28 '15 at 03:12
When you ask a question, it's a good idea to stick by the computer for a little while to make sure people understand what you are asking. - Rich Scriven, Jul 28 '15 at 03:17

score 0 · Answer 1 · answered Jul 28 '15 at 03:29

Probably something along the lines of:

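accession1 <- c("ATGGCCCATG", "GGATCCTAGC")  # sequence lines of each record
accession2 <- c("GATATCCATG", "AAACGGCTTA")
genelist <- list(accession1, accession2)
lapply(genelist, paste0, collapse="")  # one joined string per record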
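
To apply the same paste0/collapse idea to a whole file, here is a minimal base-R sketch. The helper name unwrap_fasta and the file names in the commented usage line are made up for illustration; it assumes every record starts with a ">" header line and that the file fits in memory.

```r
# Hypothetical helper (not part of the answer above): join the wrapped
# sequence lines of each FASTA record into a single line.
unwrap_fasta <- function(lines) {
  lines <- lines[nzchar(trimws(lines))]        # drop blank lines
  is_header <- startsWith(lines, ">")          # header lines begin with ">"
  headers <- lines[is_header]
  # Record number of every sequence line: count of headers seen so far
  record <- factor(cumsum(is_header)[!is_header], levels = seq_along(headers))
  # Concatenate the sequence lines belonging to each record
  seqs <- vapply(split(lines[!is_header], record), paste0, character(1),
                 collapse = "")
  # Interleave headers and joined sequences: header1, seq1, header2, seq2, ...
  as.vector(rbind(headers, seqs))
}

fasta <- c(">accession1", "ATGGCCCATG", "GGATCCTAGC",
           ">accession2", "GATATCCATG", "AAACGGCTTA")
unwrapped <- unwrap_fasta(fasta)
print(unwrapped)

# With files (assumed names):
# writeLines(unwrap_fasta(readLines("sequences.fa")), "sequences_unwrapped.fa")
```

Unlike a blanket gsub on "\n", this keeps the newline after each header, because only the non-header lines are pasted together.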

IRTFM


oddHypothesis · Answer 2 · 2015-07-28T06:12:21.660

0

Try the seqinr package which provides both read.fasta() and write.fasta() functions. The latter allows you to control the wrap width of the output.

So, assuming your FASTA data is in the file sequences.fa:

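install.packages('seqinr') # do this once
library(seqinr)

# forceDNAtolower=FALSE keeps the original upper-case letters
seqs = read.fasta(file='sequences.fa', forceDNAtolower=FALSE)
# nbchar is the wrap width; use the longest sequence length so nothing wraps
write.fasta(seqs, names(seqs), nbchar=max(lengths(seqs)), file.out='sequences2.fa')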
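
A quick way to confirm that the output no longer contains wrapped sequences is to check that header and sequence lines strictly alternate. This is a base-R sketch; the helper name is_unwrapped is hypothetical and not part of seqinr.

```r
# Hypothetical check (not a seqinr function): TRUE when every record
# occupies exactly two lines, a ">" header followed by one sequence line.
is_unwrapped <- function(lines) {
  lines <- lines[nzchar(lines)]                # ignore blank lines
  is_header <- startsWith(lines, ">")
  expected <- seq_along(lines) %% 2 == 1       # headers at odd positions
  length(lines) %% 2 == 0 && all(is_header == expected)
}

wrapped_example   <- c(">accession1", "ATGGCCCATG", "GGATCCTAGC")
unwrapped_example <- c(">accession1", "ATGGCCCATGGGATCCTAGC")
print(is_unwrapped(wrapped_example))
print(is_unwrapped(unwrapped_example))

# On the file written above:
# is_unwrapped(readLines("sequences2.fa"))
```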

edited Jul 28 '15 at 06:12

answered Jul 28 '15 at 06:05
