I am absolutely brand new to coding in R - in fact coding in general, so excuse my ignorance.

I have a data file with 'start' and 'end' position values for features of varying lengths. I would like to output a file that creates bins for each feature (row of data) by percentage through the length of the feature (1 - 100%).

I think this existing question essentially answers mine, but I'm still having issues: "R : Create specific bin based on data range". The function given there is:

bin_it <- function(START, END, BINS) {
  range <- END-START
  jump <- range/BINS
  v1 <- c(START, seq(START+jump+1, END, jump))
  v2 <- seq(START+jump-1, END, jump)+1
  data.frame(v1, v2)
}

My specific data looks like this:

feature <- data.frame(chrom, start, end, feature_name, value, strand)
chr2L   7529    9484    CG11023 1   +
chr2L   21952   24237   CR43609 1   +
chr2L   65999   66242   CR45339 1   +

Using the code above, I have tried:

bin_it <- function(START, END, BINS) {
  range <- START-END
  jump <- range/BINS
  v1 <- c(START, seq(START+jump, END, jump))
  v2 <- seq(START+jump, END, jump)
  data.frame(v1, v2)
}

bin_it(feature[,2], feature[,3], 100)

I get this error message:

Error in seq.default(START + jump + 1, END, jump) : 
'from' must be of length 1

Any suggestions on how to fix this?

Update:

As an example from the first row of the data set above: START = 7529, END = 9484, BINS = 10 (to simplify), range = 1955, jump = 195.5

Desired output would be:

      v1       v2
[1]  7529.0  7724.5
[2]  7724.5  7920.0
[3]  7920.0  8115.5
        ...
[9]  9093.0  9288.5
[10] 9288.5  9484.0
czyscner

1 Answer

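The error means you are passing a vector as the first argument (and also the second) to `seq`, which expects a single number for each. `feature[,2]` and `feature[,3]` are whole columns, so `START` and `END` each hold one value per row. Try `bin_it(feature[1,2], feature[1,3], 100)`, which passes the start and end of the first row only, and it should work fine. To apply this to every row, you can either write a loop (not ideal):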

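# Pre-allocate one list slot per feature row
output <- vector("list", nrow(feature))
for (l in seq_len(nrow(feature))) {
  # Store each row's bins as its own data frame (c() would flatten the columns)
  output[[l]] <- bin_it(feature[l, 2], feature[l, 3], 100)
}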

or (much better) use the apply family. In your case, something like this should do it:

# apply() hands each row to the function as a plain vector, so index with x[1] and x[2]
output <- apply(feature[, 2:3], 1, function(x) bin_it(START = x[1], END = x[2], BINS = 100))
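
To make the row-wise idea concrete, here is a minimal sketch using `Map()`, which calls the function once per start/end pair and keeps each result as its own data frame. It assumes the three sample rows from the question and uses `bin_it` exactly as copied from the top of the question; the names `bins_list` and `all_bins` are made up for illustration, and any single-feature function that returns a data frame could be swapped in.

```r
# Sample rows from the question; column names follow the question's data frame.
feature <- data.frame(
  chrom = c("chr2L", "chr2L", "chr2L"),
  start = c(7529, 21952, 65999),
  end = c(9484, 24237, 66242),
  feature_name = c("CG11023", "CR43609", "CR45339"),
  value = c(1, 1, 1),
  strand = c("+", "+", "+"),
  stringsAsFactors = FALSE
)

# bin_it as quoted at the top of the question (unchanged).
bin_it <- function(START, END, BINS) {
  range <- END - START
  jump <- range / BINS
  v1 <- c(START, seq(START + jump + 1, END, jump))
  v2 <- seq(START + jump - 1, END, jump) + 1
  data.frame(v1, v2)
}

# Map() passes one start and one end at a time, so seq() only ever sees
# single numbers. The result is a list with one data frame per feature.
bins_list <- Map(bin_it, feature$start, feature$end, 100)
names(bins_list) <- feature$feature_name

# Stack the per-feature tables and record which feature each bin belongs to.
all_bins <- do.call(rbind, lapply(seq_along(bins_list), function(i) {
  cbind(feature_name = feature$feature_name[i], bins_list[[i]])
}))
```

Keeping the results in a list until the end avoids the simplification that `apply()` and `mapply()` may attempt, and the final `rbind` gives one long table that is easy to write out with `write.table()`.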
mts
  • I just found a nicer way might be `mapply(bin_it, feature[,2], feature[,3], 100)` – mts Jul 01 '15 at 08:00
  • Wonderful, thank you so much! This worked for me, as far as applying the function to the rows of my data. However, I am getting a different type of error: `Error in data.frame(v1, v2) : arguments imply differing number of rows: 99, 101` Now I just have to find what I'm missing in defining v1 and v2. – czyscner Jul 01 '15 at 11:58
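
The `differing number of rows` error means `v1` and `v2` ended up with different lengths when `data.frame(v1, v2)` was called. A minimal sketch of one way to avoid that, assuming the goal is the contiguous bins shown in the question's update: compute the bin edges once with `seq(..., length.out = )` and derive both columns from the same vector. The helper name `bin_by_edges` is hypothetical and not part of the original posts.

```r
# Hypothetical helper (not from the original posts): build BINS + 1 equally
# spaced edges once, then pair each edge with the one that follows it.
bin_by_edges <- function(START, END, BINS) {
  edges <- seq(START, END, length.out = BINS + 1)
  data.frame(
    v1 = edges[-length(edges)],  # every edge except the last: bin starts
    v2 = edges[-1]               # every edge except the first: bin ends
  )
}

# First feature from the question, with 10 bins as in the update.
bins <- bin_by_edges(7529, 9484, 10)
print(bins)
```

Because both columns are slices of the same `edges` vector, they always have exactly `BINS` entries, and each bin's end equals the next bin's start. For the example above, `v1` begins 7529.0, 7724.5, 7920.0 and `v2` ends at 9484.0, matching the desired output in the question.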