
I have a data.frame that looks like this.

bed <- data.frame(chrom=c(rep("Chr1",5)),
                  chromStart=c(18915152,24199229,73730,81430,89350),
                  chromEnd=c(18915034,24199347,74684,81550,89768),
                  strand=c("-","+","+","+","+"))

write.table(bed, "test_xRNA.bed",row.names = F,col.names = F, sep="\t", quote=FALSE) 

Created on 2022-07-29 by the reprex package (v2.0.1)

and I want to convert it into a BED file. I tried the write.table function, but when I run intersect on the resulting file I get this error:

Error: unable to open file or unable to determine types for file test_xRNA.bed

- Please ensure that your file is TAB delimited (e.g., cat -t FILE).
- Also ensure that your file has integer chromosome coordinates in the 
  expected columns (e.g., cols 2 and 3 for BED).

Any ideas of how I can properly convert a data.frame to a .bed file in R?

I have heard about the rtracklayer package. Does anyone have experience with it?

I have tried the approach from the post "export file from R in bed format", but it does not work for me at all. Any help is highly appreciated.

LDT
  • Try making the `C` in the chromosome column lower case (just a guess) - that's definitely the convention. – user438383 Jul 29 '22 at 16:46
  • Another issue with your data (apart from the fact that you would either need 3 columns, or 6 with "name", "strand" and "score" columns after the first 3 for a canonical BED format) is that many programs dealing with flavors of this format expect that each range has an end that is greater or equal to its start position. – user12728748 Jul 29 '22 at 19:50
  • Another wild guess - are you sure that the path to the file is correct in your tool? The error also says 'unable to open'. That could mean that the tool does not even find your file... – Daniel Fischer Aug 01 '22 at 09:31

2 Answers


Check the BED format specification. The first three columns (chromosome, start, end) are obligatory. Strand is the sixth column, and if you want to use it, you need to include columns 4 (name) and 5 (score). They can be filled with "." if you have nothing to put there.

bed <- data.frame(chrom=c(rep("Chr1",5)),
                  chromStart=c(18915152,24199229,73730,81430,89350),
                  chromEnd=c(18915034,24199347,74684,81550,89768),
                  name = rep(".", 5),
                  score = rep(".", 5),
                  strand=c("-","+","+","+","+"))
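
The comments on the question also point out that each start should not exceed its end, which the first row above violates. A minimal base R sketch that combines both fixes is shown below. It assumes that a reversed pair simply has start and end swapped and that the coordinates are already in BED's 0-based, half-open convention; `bed6`, the placeholder score of 0 and the output file name are illustrative choices, not part of the original answer.

```r
bed <- data.frame(chrom = rep("Chr1", 5),
                  chromStart = c(18915152, 24199229, 73730, 81430, 89350),
                  chromEnd = c(18915034, 24199347, 74684, 81550, 89768),
                  strand = c("-", "+", "+", "+", "+"))

# Assumption: where chromStart > chromEnd, the two values were just swapped.
# pmin()/pmax() take the row-wise smaller/larger coordinate.
bed6 <- data.frame(
  chrom      = bed$chrom,
  chromStart = as.integer(pmin(bed$chromStart, bed$chromEnd)),
  chromEnd   = as.integer(pmax(bed$chromStart, bed$chromEnd)),
  name       = ".",   # placeholder for column 4
  score      = 0L,    # placeholder for column 5 (assumption)
  strand     = bed$strand
)

# Tab-separated, no header, no row names, no quotes
write.table(bed6, "test_xRNA_fixed.bed", sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

Converting the coordinates with `as.integer()` is a precaution: numeric (double) values such as 100000 can otherwise be written in scientific notation, which tools expecting integer coordinates may reject.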
Cloudberry
  • Thank you for your answer, Cloudberry. I upvoted the effort, but I do not think it is necessary to include columns 4 and 5 as you say. These are optional, and the BED file can work without them. The key thing is to always keep chromStart < chromEnd and to write the data.frame in the right format. – LDT Aug 02 '22 at 07:57
  • @LDT You need columns 4 and 5 only if you want to use strand (column 6). Some programs read only the first three columns and ignore the rest, but strand information is not used then. It's possible that some programs can deal with a non-standard BED file, but I wouldn't count on that without checking. – Cloudberry Aug 02 '22 at 18:20
  • That's a good point. Thanks for contributing, I learned a lot. – LDT Aug 02 '22 at 19:44

I think it's a lot more complicated to make a BED file. Here is a solution I have been working on over the last few days:

suppressPackageStartupMessages(library(GenomicRanges))
suppressPackageStartupMessages(library(rtracklayer))
suppressPackageStartupMessages(library(tidyverse))

# data 
bed <- data.frame(chrom=c(rep("Chr1",5)),
                  chromStart=c(18915152,24199229,73730,81430,89350),
                  chromEnd=c(18915034,24199347,74684,81550,89768), 
                  strand=c("-","+","+","+","+"))

# swap where needed so that chromStart <= chromEnd always holds
bed2 <- bed |>
  transform(chromStart=ifelse(chromStart>chromEnd,chromEnd,chromStart),
            chromEnd=ifelse(chromEnd<chromStart,chromStart,chromEnd))

# Genomic Ranges 
bed3 <- GenomicRanges::makeGRangesFromDataFrame(bed2)
head(bed3)
#> GRanges object with 5 ranges and 0 metadata columns:
#>       seqnames            ranges strand
#>          <Rle>         <IRanges>  <Rle>
#>   [1]     Chr1 18915034-18915152      -
#>   [2]     Chr1 24199229-24199347      +
#>   [3]     Chr1       73730-74684      +
#>   [4]     Chr1       81430-81550      +
#>   [5]     Chr1       89350-89768      +
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths

# rtracklayer 
bed4 <- rtracklayer::export(bed3, format="bed", ignore.strand = FALSE)
bed4
#> [1] "Chr1\t18915033\t18915152\t.\t0\t-" "Chr1\t24199228\t24199347\t.\t0\t+"
#> [3] "Chr1\t73729\t74684\t.\t0\t+"       "Chr1\t81429\t81550\t.\t0\t+"      
#> [5] "Chr1\t89349\t89768\t.\t0\t+"

# write it as a bed file
# this is essential to make sure that this works properly 
write.table(bed4, "test.bed", sep="\t", col.names=FALSE, row.names = FALSE, append = TRUE, quote = FALSE) 

Created on 2022-08-02 by the reprex package (v2.0.1)

Now you have a functional BED file that works with bedtools.
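
Before handing a file to another tool, it can help to read it back in R and check the things the error message in the question mentions (tab-delimited fields, integer coordinates in columns 2 and 3) plus the start/end ordering. The sketch below is one way to do that in base R. `check_bed` is a hypothetical helper, not part of any package, and it only inspects the first three columns.

```r
# Hypothetical helper: TRUE if the file looks like a minimal BED file.
check_bed <- function(path) {
  lines <- readLines(path)
  fields <- strsplit(lines, "\t", fixed = TRUE)
  n <- lengths(fields)
  # Every line needs the same number of tab-separated fields, at least three
  if (length(lines) == 0 || any(n < 3) || length(unique(n)) != 1) return(FALSE)
  start <- vapply(fields, `[`, character(1), 2)
  end   <- vapply(fields, `[`, character(1), 3)
  # Coordinates must be plain non-negative integers (no "1e+05", no decimals)
  if (!all(grepl("^[0-9]+$", c(start, end)))) return(FALSE)
  # Each start must not exceed its end
  all(as.numeric(start) <= as.numeric(end))
}

# Two lines taken from the rtracklayer output above
good <- tempfile(fileext = ".bed")
writeLines(c("Chr1\t18915033\t18915152\t.\t0\t-",
             "Chr1\t73729\t74684\t.\t0\t+"), good)
check_bed(good)
#> [1] TRUE

# First row of the original data: start is larger than end
bad <- tempfile(fileext = ".bed")
writeLines("Chr1\t18915152\t18915034\t-", bad)
check_bed(bad)
#> [1] FALSE
```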

LDT
  • I don't see why we need other packages; the issue seems to be ensuring that start is smaller than end. Then write it out as a tab-separated file. – zx8754 Aug 02 '22 at 10:00
  • I have tried what you suggested and it did not always work perfectly. Passing through GenomicRanges and rtracklayer minimizes the possibility of error. – LDT Aug 02 '22 at 11:44
  • If it works then it works, no worries, but for such a small task the number of packages loaded is just too much. – zx8754 Aug 02 '22 at 11:47
  • I agree with you. I never expected it to be so complicated – LDT Aug 02 '22 at 12:05
  • That's right, you need to make the start coordinate smaller than end. Using `rtracklayer::export()` is one way to make sure columns 4 and 5 are included, but they can be added manually :) – Cloudberry Aug 02 '22 at 18:09