I have a CSV file with DNA sequences. The file has 4 columns: the chromosome name, the start and end of the sequence, and the strand (missing or +). I want to transform this file into FASTA format in RStudio using Biostrings, but I don't know much about the code I have to use. First I installed the Biostrings package, then I used this code:

    csv = read.csv("foo.csv")
    # one header line plus one sequence line per row
    fa = character(2 * nrow(csv))
    # csv$chr, csv$seq and csv$id must match names(csv) exactly
    fa[c(TRUE, FALSE)] = sprintf("> %s", csv$chr)
    fa[c(FALSE, TRUE)] = csv$seq
    writeLines(fa, "foo.fasta")

    library(Biostrings)
    seq = csv$seq
    names(seq) = csv$id
    dna = DNAStringSet(seq)
    writeXStringSet(dna, "foo.fasta")

Also, when I run `fa[c(TRUE, FALSE)] = sprintf("> %s", csv$chr)`, I get this error: `Error in fa[c(TRUE, FALSE)] = sprintf("> %s", csv$chr) : replacement has length zero`. Can you recommend other code, or tell me what I should change in this code? Thank you.
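`replacement has length zero` means the right-hand side evaluated to a zero-length vector: if the data frame has no column that `csv$chr` can find (check `names(csv)`), `csv$chr` is `NULL` and `sprintf("> %s", NULL)` returns `character(0)`. Each FASTA record also takes exactly two lines, so the output vector needs `2 * nrow()` elements. A minimal base-R sketch of the same interleaving idea, assuming a data frame that already holds a name column and a sequence column (`write_fasta`, `id` and `seq` are hypothetical names; with coordinates only, the sequences must first be extracted from a genome):

```r
# Minimal sketch: write a data frame of named sequences as FASTA in base R.
# Assumes hypothetical columns `id` (record name) and `seq` (DNA sequence).
write_fasta <- function(df, path, name_col = "id", seq_col = "seq") {
  # Fail early with a readable message instead of "replacement has length zero"
  absent <- setdiff(c(name_col, seq_col), names(df))
  if (length(absent) > 0) {
    stop("missing column(s): ", paste(absent, collapse = ", "),
         "; available: ", paste(names(df), collapse = ", "))
  }
  # Two lines per record: header, then sequence
  fa <- character(2 * nrow(df))
  fa[c(TRUE, FALSE)] <- paste0(">", df[[name_col]])  # odd positions: headers
  fa[c(FALSE, TRUE)] <- as.character(df[[seq_col]])  # even positions: sequences
  writeLines(fa, path)
  invisible(fa)
}

# Usage with a toy data frame
df <- data.frame(id = c("seq1", "seq2"), seq = c("ACGT", "GGCC"))
out <- tempfile(fileext = ".fasta")
write_fasta(df, out)
readLines(out)  # c(">seq1", "ACGT", ">seq2", "GGCC")
```

The header is written as `>name` without a space after `>`, which is the usual FASTA convention.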

  • Does this answer your question? [Extract sequence fragments from FASTA file using coordinates on a GRanges object](https://stackoverflow.com/questions/44333936/extract-sequence-fragments-from-fasta-file-using-coordinates-on-a-granges-object) – csgroen Aug 23 '21 at 18:44
  • You state the `.csv` file has 4 columns: chromosome, start, end, and strand. Where is the sequence itself? Otherwise, we'd need to know the organism and assembly to answer the question. – Ian Campbell Aug 25 '21 at 02:43
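Since the four columns only describe coordinates, the sequences have to come from a reference genome of the right organism and assembly. A minimal Biostrings sketch, assuming the genome is loaded as a named `DNAStringSet` (for example with `readDNAStringSet("genome.fa")`, where the record names must match the `chr` values exactly) and that `start`/`end` are 1-based and inclusive; `extract_regions`, `genome.fa` and the column names `chr`, `start`, `end`, `strand` are hypothetical. The usage below uses a tiny in-memory genome instead of a real assembly:

```r
library(Biostrings)

# Minimal sketch: pull region sequences out of a reference genome.
# Assumes `genome` is a named DNAStringSet and `regions` has hypothetical
# columns chr, start, end (1-based, inclusive) and strand ("+", "-" or missing).
extract_regions <- function(genome, regions) {
  chr <- as.character(regions$chr)
  unknown <- setdiff(chr, names(genome))
  if (length(unknown) > 0) {
    stop("chromosome(s) not in genome: ", paste(unknown, collapse = ", "))
  }
  # One genome record per row, each cut to its start..end window
  seqs <- subseq(genome[chr], start = regions$start, end = regions$end)
  # Reverse-complement minus-strand regions; NA or "" strand is left as is
  minus <- regions$strand %in% "-"
  seqs[minus] <- reverseComplement(seqs[minus])
  # Header such as "chr2:4-9(-)"; a missing strand is shown as "."
  strand <- ifelse(regions$strand %in% c("+", "-"), regions$strand, ".")
  names(seqs) <- sprintf("%s:%d-%d(%s)", chr, as.integer(regions$start),
                         as.integer(regions$end), strand)
  seqs
}

# Toy genome standing in for a real assembly
genome <- DNAStringSet(c(chr1 = "ACGTACGTAA", chr2 = "GGGCCCTTTA"))
regions <- data.frame(chr = c("chr1", "chr2"), start = c(1, 4),
                      end = c(4, 9), strand = c(NA, "-"))
seqs <- extract_regions(genome, regions)
as.character(seqs)  # sequences "ACGT" and "AAAGGG"
writeXStringSet(seqs, tempfile(fileext = ".fasta"))
```

This keeps the whole genome in memory, which is simple but may be heavy for large assemblies.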

1 Answer

If you don't mind not using R to achieve your goal, you can look at bedtools getfasta (documentation) for a command-line solution on Linux.

The usage is quite simple:

bedtools getfasta -fi yourFasta.fa -bed foo.bed -s

The -s option forces strandedness: features on the minus strand are reverse-complemented.

But first you may need to convert your CSV file foo.csv to BED format. You can also do this in R:

library(readr)
df <- read_csv("foo.csv")
write_tsv(df, "foo.bed", col_names = FALSE)
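One caveat: the BED columns are chrom, start, end, name, score, strand, so bedtools looks for the strand in the sixth column; writing the four CSV columns unchanged puts the strand where a name is expected. BED start coordinates are also 0-based while ends are exclusive. A base-R sketch (using `write.table` instead of readr) that writes a BED6 file, assuming the CSV coordinates are 1-based and inclusive and the columns are named `chr`, `start`, `end`, `strand`; `csv_to_bed6` and the `regionN` names are hypothetical, and if your coordinates are already 0-based, drop the `- 1L`:

```r
# Minimal sketch: convert 1-based, inclusive coordinates to a BED6 file.
# Assumes hypothetical columns chr, start, end, strand ("+", "-" or missing).
csv_to_bed6 <- function(df, path) {
  # BED uses "." for an unknown strand
  strand <- ifelse(df$strand %in% c("+", "-"), df$strand, ".")
  bed <- data.frame(
    chrom  = df$chr,
    start  = as.integer(df$start) - 1L,  # BED start is 0-based
    end    = as.integer(df$end),         # exclusive end equals the 1-based inclusive end
    name   = paste0("region", seq_len(nrow(df))),
    score  = 0L,
    strand = strand
  )
  # Integers avoid scientific notation such as 1e+05 in the output
  write.table(bed, path, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  invisible(bed)
}

# Usage with a toy table
df <- data.frame(chr = c("chr1", "chr2"), start = c(1, 4),
                 end = c(4, 9), strand = c(NA, "-"))
out <- tempfile(fileext = ".bed")
csv_to_bed6(df, out)
readLines(out)
```

The resulting file can then be passed to `bedtools getfasta -fi yourFasta.fa -bed foo.bed -s`.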
Yichao Cai