I have a data set which, when plotted, produces a graph that looks like this:
The head of this data is:
> head(data_frame)
      score position
73860    10    43000
73859    10    43001
73858    10    43002
73857    10    43003
73856    10    43004
73855    10    43005
I've uploaded the whole file as a tab-delimited text file here.
As you can see, the plot has regions which have a score of around 10, but there's one region in the middle that "dips". I would like to identify these dips.
Defining a dip as:
- Starting when the score is below 7
- Ending when the score rises to 7 or above and stays at 7 or above for at least 500 positions
I would like to identify all the regions which meet the above definition, and output their start and end positions. In this case that would only be the one region.
However, I'm at a bit of a loss as to how to do this. It looks like the rle() function could be useful, but I'm not sure how to apply it here.
Expected output for the data frame would be something like:
[1] 44561 46568
(I haven't actually checked that everything between these two positions qualifies under the definition, but from the plot this looks about right.)
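To make the definition concrete, here is a minimal sketch of how it might map onto rle(). It is not a verified solution. The function name find_dips and its arguments are hypothetical, and it rests on three assumptions: the rows are sorted by position with one row per consecutive position (so a run length in rows equals a number of positions), score contains no NA values, and the reported end of a dip is the last position that is still below the threshold.

```r
# Hypothetical helper: find dips in `score` using run-length encoding.
# Assumes rows are sorted by position, one row per consecutive position,
# and that `score` has no NA values.
find_dips <- function(score, position, threshold = 7, min_gap = 500) {
  if (length(score) == 0) {
    return(data.frame(start = numeric(0), end = numeric(0)))
  }

  below <- score < threshold
  runs <- rle(below)
  n <- length(runs$values)

  # Runs at or above the threshold that are shorter than min_gap do not
  # end a dip, so they are absorbed into the surrounding dip.
  short_high <- !runs$values & runs$lengths < min_gap

  # Assumption: a short high run at the very start or end of the data
  # is not part of any dip.
  short_high[c(1, n)] <- FALSE
  runs$values[short_high] <- TRUE

  # Re-encode so that adjacent dip runs are merged into one.
  merged <- rle(inverse.rle(runs))
  ends <- cumsum(merged$lengths)
  starts <- ends - merged$lengths + 1

  data.frame(
    start = position[starts[merged$values]],
    end   = position[ends[merged$values]]
  )
}

# Toy data with a small min_gap so the behaviour is easy to follow:
# the 2-position recovery to 8 is too short to end the dip.
score <- c(rep(10, 5), rep(3, 4), rep(8, 2), rep(2, 3), rep(10, 6))
position <- 43000 + seq_along(score) - 1
dips <- find_dips(score, position, min_gap = 5)
print(dips)
```

On the toy data this returns a single row with start 43005 and end 43013. For the real data the call would presumably be find_dips(data_frame$score, data_frame$position), keeping the default threshold of 7 and gap of 500.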
I would be very grateful for any suggestions!
Andrei