Grouping consecutive integers in R and performing analysis on groups

Question

I have a data frame in which I would like to group the rows according to whether the integer values are consecutive, and then find the difference between the maximum and minimum value of each group.

Example of data:

 x        Integers
 0.1      14
 0.05     15
 2.7      17
 0.07     19
 3.4      20
 0.05     21

So Group 1 would consist of 14 and 15, and Group 2 would consist of 19, 20 and 21 (17 stands alone, so it does not form a group). The differences for the two groups would then be 1 and 2, respectively.
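
A minimal base-R sketch of the computation described above, assuming the data frame is called `Data`, is already sorted by `Integers`, and that single-value runs such as 17 should be dropped. It labels each run with `cumsum()` and takes the per-run range with `tapply()`; the variable names are illustrative only.

```r
# Example data from the question
Data <- data.frame(x = c(0.1, 0.05, 2.7, 0.07, 3.4, 0.05),
                   Integers = c(14, 15, 17, 19, 20, 21))

# Run id: starts at 1 and increments at every row whose step from the
# previous value is not exactly 1
run_id <- cumsum(c(1, diff(Data$Integers) != 1))

# Difference between the maximum and minimum value of each run
run_range <- tapply(Data$Integers, run_id, function(v) max(v) - min(v))

# Number of rows in each run, used to drop singleton runs such as 17
run_size <- tapply(Data$Integers, run_id, length)
run_range[run_size > 1]
```

Here `run_id` is 1 1 2 3 3 3, so the two multi-row runs give the ranges 1 and 2 from the question.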

I have tried the following to first group the consecutive values, without success:

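# Last row index of each run, padded with 0 at the start and n at the end
Breaks <- c(0, which(diff(Data$Integers) != 1), length(Data$Integers))

# Extract the values of each run, i.e. rows (Breaks[i] + 1) to Breaks[i + 1]
sapply(seq_len(length(Breaks) - 1),
       function(i) Data$Integers[(Breaks[i] + 1):Breaks[i + 1]])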
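
The `Breaks` vector above already marks where each run ends, so one possible way to finish the job is to compute `max - min` inside the same loop instead of returning the raw values. This is a sketch that assumes `Data$Integers` is sorted and holds whole numbers; `ranges` is an illustrative name.

```r
Data <- data.frame(x = c(0.1, 0.05, 2.7, 0.07, 3.4, 0.05),
                   Integers = c(14, 15, 17, 19, 20, 21))

# Last row index of each run, padded with 0 so run i spans (Breaks[i] + 1):Breaks[i + 1]
Breaks <- c(0, which(diff(Data$Integers) != 1), length(Data$Integers))

# Range (max - min) of each run
ranges <- sapply(seq_len(length(Breaks) - 1), function(i) {
  run <- Data$Integers[(Breaks[i] + 1):Breaks[i + 1]]
  max(run) - min(run)
})
ranges
# [1] 1 0 2
```

Single-value runs come out as 0, so `ranges[ranges > 0]` leaves the 1 and 2 the question asks for.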

This might be a good place to start: http://stackoverflow.com/q/31569550/1191259 — Frank, Jul 23 '15 at 17:15
I have tried the solution in the link but it returns the following error for me: Error in 1:which(values == 1 & lengths == dur) : argument of length 0 In addition: Warning message: In max(lengths[values == 1]) : no non-missing arguments to max; returning -Inf — Student, Jul 23 '15 at 17:21
Yeah, you can't follow that solution precisely, but I think it may be helpful, as it's a very similar problem. (I'm not saying that it's exactly the same.) — Frank, Jul 23 '15 at 17:22
This is a good start http://stackoverflow.com/questions/31462438/determine-when-a-sequence-of-numbers-has-been-broken-in-r/31463774#31463774 — Matias Andina, Jul 23 '15 at 17:24
I would use the function to determine the positions of the breaks, and then you can set the levels of the grouping variable to match those breaks. Also, your code is on the right track, but I don't think you can solve this in so few lines. I think your coding ability is enough to solve this problem. – Matias Andina, Jul 23 '15 at 17:26
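
One way to read the suggestion above, sketched under the assumption that the rows are sorted by `Integers`: turn the break positions into a grouping factor with `findInterval()`, then summarise each level with `split()` and `sapply()`. The names `starts`, `grp` and `res` are illustrative.

```r
Data <- data.frame(x = c(0.1, 0.05, 2.7, 0.07, 3.4, 0.05),
                   Integers = c(14, 15, 17, 19, 20, 21))

# Row index at which each run starts: row 1, plus every row that follows a gap
starts <- c(1, which(diff(Data$Integers) != 1) + 1)

# Assign each row to the last run start at or before it, giving levels "1", "2", "3"
grp <- factor(findInterval(seq_len(nrow(Data)), starts))

# Per-group max - min of Integers, named by group level
res <- sapply(split(Data$Integers, grp), function(v) max(v) - min(v))
res
```

Because `grp` is an ordinary factor, it can also be added as a column and reused with `aggregate()` or `by()` for the `x` column.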

score 3 · Answer 1 · answered Jul 23 '15 at 17:57


Here's a solution using by():

df <- data.frame(x = c(0.1, 0.05, 2.7, 0.07, 3.4, 0.05),
                 Integers = c(14, 15, 17, 19, 20, 21))
# Group id: cumsum() increments wherever the step from the previous integer is not 1
do.call(rbind, by(df, cumsum(c(0, diff(df$Integers) != 1)), function(g) data.frame(
    imin = min(g$Integers), imax = max(g$Integers), irange = diff(range(g$Integers)),
    xmin = min(g$x), xmax = max(g$x), xrange = diff(range(g$x)))))
##   imin imax irange xmin xmax xrange
## 0   14   15      1 0.05  0.1   0.05
## 1   17   17      0 2.70  2.7   0.00
## 2   19   21      2 0.05  3.4   3.35

I wasn't sure what data you wanted in the output, so I just included everything you might want.

You can filter out the middle (single-value) group with subset(..., irange != 0).
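
For example, assuming the `by()` result is stored in a variable (called `res` here, a name chosen only for illustration) and trimmed to the integer columns for brevity, dropping the single-value group leaves the two runs from the question:

```r
df <- data.frame(x = c(0.1, 0.05, 2.7, 0.07, 3.4, 0.05),
                 Integers = c(14, 15, 17, 19, 20, 21))

# Same grouping as above, keeping only the Integers summaries
res <- do.call(rbind, by(df, cumsum(c(0, diff(df$Integers) != 1)), function(g)
  data.frame(imin = min(g$Integers), imax = max(g$Integers),
             irange = diff(range(g$Integers)))))

# Keep only groups that span more than one integer
out <- subset(res, irange != 0)
out
```

The row names "0" and "2" are the group ids produced by `cumsum()`, carried over from the list returned by `by()`.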


bgoldst


I have developed some ugly code that eventually does what I want, but the neatness of your code is appealing. However, when I apply it to my data, the min and max values produced are equal, so the range is zero for every group. Hopefully I can solve the problem, but your code is potentially the most efficient I have come across. – Student Jul 23 '15 at 20:19
The problem seems to be that with my full data set, evaluating diff(df$Integers) != 1 returns TRUE for every element, whereas with the sample data set the output is FALSE TRUE TRUE FALSE FALSE, as expected. The column is numeric in both the full and the sample data frame, so I can't see any obvious reason why my full data set is misbehaving. – Student Jul 23 '15 at 21:27
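
The full data set isn't shown, so the cause can only be guessed. Two common culprits (assumptions, not confirmed in the thread) are rows that are not sorted by `Integers`, and values stored as doubles slightly off a whole number, for example after arithmetic or a CSV import; `diff()` then yields values like 1.000000001 that are not exactly 1. The sketch below uses a hypothetical data frame `full` built to show both problems and guards against them before grouping.

```r
# Hypothetical full data: unsorted, with two values slightly off whole numbers
full <- data.frame(x = c(3.4, 0.1, 0.05, 2.7, 0.07, 0.05),
                   Integers = c(20, 14, 15 + 1e-9, 17, 19, 21 - 1e-9))

before <- diff(full$Integers) != 1
print(before)  # every step differs from exactly 1, mirroring the symptom

# Sort by Integers, then snap each value to the nearest whole number
full <- full[order(full$Integers), ]
full$Integers <- round(full$Integers)

after <- diff(full$Integers) != 1
print(after)   # now matches the sample data
```

If rounding is not appropriate for the real data, comparing with a tolerance, such as `abs(diff(full$Integers) - 1) > 1e-8`, is an alternative way to define the breaks.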
