I am absolutely brand new to coding in R (in fact, to coding in general), so please excuse my ignorance.
I have a data file with 'start' and 'end' position values for features of varying lengths. I would like to output a file that splits each feature (row of data) into bins by percentage of its length, i.e. 100 equal bins covering 1% to 100% of the feature.
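Read as equal-width bins (an assumption about what "by percentage" means, consistent with the desired output in the update below), bin k of a feature with start s and end e, split into B bins, would span:

```latex
v_1^{(k)} = s + (k-1)\,\frac{e-s}{B}, \qquad
v_2^{(k)} = s + k\,\frac{e-s}{B}, \qquad k = 1, \dots, B
```

With B = 100, bin k covers the stretch from (k-1)% to k% of the feature; with the first row (s = 7529, e = 9484) and B = 10, bin 1 is 7529 to 7724.5.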
I think this question essentially answers mine, but I'm still having issues: R : Create specific bin based on data range
bin_it <- function(START, END, BINS) {
  range <- END-START
  jump <- range/BINS
  v1 <- c(START, seq(START+jump+1, END, jump))
  v2 <- seq(START+jump-1, END, jump)+1
  data.frame(v1, v2)
}
My specific data looks like this:
feature <- data.frame(chrom, start, end, feature_name, value, strand)

chrom  start   end feature_name value strand
chr2L   7529  9484      CG11023     1      +
chr2L  21952 24237      CR43609     1      +
chr2L  65999 66242      CR45339     1      +
Adapting the code above, I have tried:
bin_it <- function(START, END, BINS) {
  range <- START-END
  jump <- range/BINS
  v1 <- c(START, seq(START+jump, END, jump))
  v2 <- seq(START+jump, END, jump)
  data.frame(v1, v2)
}
bin_it(feature[,2], feature[,3], 100)
I get this error message:
Error in seq.default(START + jump + 1, END, jump) :
'from' must be of length 1
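A likely cause, sketched below: feature[,2] and feature[,3] are whole columns (one value per feature), so START + jump is a vector too, while seq() accepts only a single number for from. The message quotes START + jump + 1, which matches the linked version of bin_it; both versions hit the same check. The two-row inputs here are illustrative values taken from the data above.

```r
# Two example features: START and END are vectors, as when passing whole columns
START <- c(7529, 21952)
END   <- c(9484, 24237)
jump  <- (END - START) / 10

# One feature at a time: 'from', 'to' and 'by' are all single numbers, so this works
ok <- seq(START[1] + jump[1], END[1], jump[1])

# Whole vectors: seq() rejects a 'from' of length 2 and raises the error
msg <- tryCatch(seq(START + jump, END, jump),
                error = function(e) conditionMessage(e))
print(msg)
```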
Any suggestions on how to fix this?
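One possible fix, as a minimal sketch: bin one feature at a time and let Map() feed each row's start and end to the function. The helper name bin_one() and the bin column are hypothetical; seq(..., length.out = BINS + 1) is swapped in for the jump arithmetic so that the first boundary is exactly START, the last is exactly END, and each bin's v2 equals the next bin's v1. It also computes the length as END - START, so it stays positive.

```r
# Hypothetical helper: split ONE feature into BINS equal-width bins
bin_one <- function(START, END, BINS) {
  # BINS + 1 evenly spaced boundaries: first = START, last = END
  edges <- seq(START, END, length.out = BINS + 1)
  data.frame(bin = seq_len(BINS),
             v1 = edges[-length(edges)],  # bin starts
             v2 = edges[-1])              # bin ends
}

# The example rows from the question
feature <- data.frame(
  chrom = "chr2L",
  start = c(7529, 21952, 65999),
  end = c(9484, 24237, 66242),
  feature_name = c("CG11023", "CR43609", "CR45339"),
  value = 1,
  strand = "+"
)

# Map() calls the function once per row with a single start/end/name,
# then do.call(rbind, ...) stacks the per-feature data frames
bins <- do.call(rbind, Map(function(s, e, name) {
  data.frame(feature_name = name, bin_one(s, e, 100))
}, feature$start, feature$end, feature$feature_name))

head(bins)
```

With 100 bins per feature, the three example rows give 300 rows in total.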
Update:
As an example, using the first row of the data set above:
START = 7529, END = 9484, BINS = 10 (to simplify), range = 1955, jump = 195.5
The desired output would be:
         v1     v2
[1]  7529.0 7724.5
[2]  7724.5 7920.0
[3]  7920.0 8115.5
...
[9]  9093.0 9288.5
[10] 9288.5 9484.0
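A vectorized alternative, sketched under the same equal-width assumption: instead of looping over rows, repeat each feature BINS times and compute every bin boundary in one step. The first ten rows reproduce the desired output above for CG11023. The names k, idx and the output file are illustrative; tempfile() stands in for a real path, and chrom, value and strand are left out for brevity. Setting BINS <- 100 gives the 1% bins.

```r
# The three example features (extra columns omitted for brevity)
feature <- data.frame(
  feature_name = c("CG11023", "CR43609", "CR45339"),
  start = c(7529, 21952, 65999),
  end = c(9484, 24237, 66242)
)
BINS <- 10

# One output row per (feature, bin) pair
k    <- rep(seq_len(BINS), times = nrow(feature))     # bin index 1..BINS
idx  <- rep(seq_len(nrow(feature)), each = BINS)      # source row of each bin
jump <- (feature$end - feature$start)[idx] / BINS     # bin width for that feature

bins <- data.frame(
  feature_name = feature$feature_name[idx],
  bin = k,
  v1 = feature$start[idx] + (k - 1) * jump,           # bin start
  v2 = feature$start[idx] + k * jump                  # bin end
)

# Write a tab-separated file (replace tempfile() with your own path)
out_file <- tempfile(fileext = ".tsv")
write.table(bins, out_file, sep = "\t", quote = FALSE, row.names = FALSE)

head(bins, 3)
```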