I have a data.frame which contain 3 columns named start
, end
and width
. Each line represent a segment over a 1D space with a start, and end and a width such as the "width = end - start + 1"
Here is an example
d = data.frame(
start = c(12, 50, 100, 130, 190),
end = c(16, 80, 102, 142, 201)
)
d$width = d$end - d$start + 1
print(d)
start end width
1 12 16 5
2 50 80 31
3 100 102 3
4 130 142 13
5 190 201 12
Consider two breakpoints and a factor of division
UpperPos = 112
LowerPos = 61
factor = 2
I would like to reduce the width of each segment outside the two breakpoints so that to reduce their width by a factor of factor
. If a segment overlaps a breakpoint, then only the part of the segment that is outside this breakpoint should be reduced in width. In addition, the width of each segment must be a multiple of 3 and must be of non-zero length.
Here is my current function that "squeeze" the segments
squeeze = function(d, factor, LowerPos, UpperPos)
{
for (row in 1:nrow(d))
{
if (d[row,]$end <= LowerPos | d[row,]$end >= UpperPos) # Complete squeeze
{
middlePos = round(d[row,]$start + d[row,]$width/2)
d[row,]$width = round(d[row,]$width / factor)
d[row,]$width = d[row,]$width - d[row,]$width %% 3 + 3
d[row,]$start = round(middlePos - d[row,]$width/2)
d[row,]$end = d[row,]$start + d[row,]$width -1
} else if (d[row,]$start <= LowerPos & d[row,]$end >= LowerPos) # Partial squeeze (Lower)
{
d[row,]$start = round(LowerPos - (LowerPos - d[row,]$start)/factor)
d[row,]$width = d[row,]$end - d[row,]$start + 1
if (d[row,]$width %% 3 != 0)
{
add = 3 - d[row,]$width %% 3
d[row,]$width = d[row,]$width + add
d[row,]$start = d[row,]$start - add
}
} else if (d[row,]$start >= UpperPos & d[row,]$end <= UpperPos) # Partial squeeze (Upper)
{
d[row,]$end = round(UpperPos + (d[row,]$end - UpperPos)/factor)
d[row,]$width = d[row,]$end - d[row,]$start + 1
if (d[row,]$width %% 3 != 0)
{
add = 3 - d[row,]$width %% 3
d[row,]$width = d[row,]$width + add
d[row,]$end = d[row,]$start + add
}
} else if (!(d[row,]$end < UpperPos & d[row,]$start > LowerPos) )
{
print(d)
print(paste("row is ",row))
print(paste("LowerPos is ",LowerPos))
print(paste("UpperPos is ",UpperPos))
stop("In MyRanges_squeeze: Should not run this line!")
}
}
return(d)
}
and it returns the expected output
squeeze(d)
start end width
1 12 14 3
2 54 80 27
3 100 102 3
4 132 140 9
5 192 200 9
However, my function squeeze
is way too slow. Can you help me to improve it?