
I am sure there is a very simple solution to my question; if so, I apologize. I have searched for a similar question, but in vain. I wish to create an "ID" variable, ID3, that takes on a new value whenever ID2_EVWIND takes on a new value. A simplified version of my data frame looks like this:

Date     ID2_EVWIND  ID3
8/2/02   35          1
28/2/02  35          1
28/2/02  35          1
2/2/02   36          2
13/2/02  36          2
11/2/02  36          2
8/2/02   36          2
8/2/02   36          2
20/2/02  25          3
10/2/02  25          3
21/2/02  33          4
4/2/02   33          4
16/2/02  33          4
15/2/02  33          4
16/2/02  33          4
23/2/02  29          5
3/2/02   30          6
11/2/02  30          6
26/2/02  30          6
26/2/02  30          6
6/2/02   18          7
28/2/02  18          7
6/2/02   18          7
13/2/02  40          8
7/2/02   40          8
15/2/02  40          8
17/2/02  40          8
16/2/02  40          8
27/2/02  24          9
8/2/02   24          9
3/2/02   11          10
2/2/02   11          10
5/2/02   11          10
4/2/02   12          11

Here is a reproducible example in R; ID3 is the variable I wish to create:

df <- structure(list(Date = structure(c(1013126400, 1014854400, 1014854400, 
1012608000, 1013558400, 1013385600, 1013126400, 1013126400, 1014163200, 
1013299200, 1014249600, 1012780800, 1013817600, 1013731200, 1013817600, 
1014422400, 1012694400, 1013385600, 1014681600, 1014681600, 1012953600, 
1014854400, 1012953600, 1013558400, 1013040000, 1013731200, 1013904000, 
1013817600, 1014768000, 1013126400, 1012694400, 1012608000, 1012867200, 
1012780800), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    ID2_EVWIND = c(35, 35, 35, 36, 36, 36, 36, 36, 25, 25, 33, 
    33, 33, 33, 33, 29, 30, 30, 30, 30, 18, 18, 18, 40, 40, 40, 
    40, 40, 24, 24, 11, 11, 11, 12)), row.names = c(NA, -34L), class = c("tbl_df", 
"tbl", "data.frame"))

Thank you in advance (!)

Cec SK

3 Answers


You can use rleid from data.table:

data.table::rleid(df$ID2_EVWIND)
#[1]  1  1  1  2  2  2  2  2  3  3  4  4  4  4  4  5  6  6  6  6  7  7  7  8  8  8  8  8  9  9 10 10 10 11
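If you would rather not depend on data.table, a base-R sketch of the same run-length numbering can be built from rle(). The helper name run_id below is hypothetical and not part of any package; it only illustrates the idea.

```r
# Hypothetical helper: a base-R stand-in for data.table::rleid on one vector
run_id <- function(x) {
  r <- rle(x)                            # lengths and values of consecutive runs
  rep(seq_along(r$lengths), r$lengths)   # number each run, repeat it by its length
}

ids <- c(35, 35, 35, 36, 36, 25, 33, 35)
run_id(ids)
#[1] 1 1 1 2 2 3 4 5
```

Like rleid, this gives the final 35 a new ID because it starts a new run.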

Another option is match:

match(df$ID2_EVWIND, unique(df$ID2_EVWIND))

Although both of them give the expected output in this case, their behavior differs when a value reappears later in the vector. Take this example:

x <- c(1, 1, 2, 3, 3, 1, 1)
data.table::rleid(x)
#[1] 1 1 2 3 3 4 4

match(x, unique(x))
#[1] 1 1 2 3 3 1 1

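rleid starts a new ID every time the value changes, whereas match gives a repeated value the same ID it had the first time it appeared. Select the option that fits your requirement.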

Ronak Shah

And an approach using dplyr::lag:

library(dplyr)

df %>% 
  mutate(ID3 = cumsum(ID2_EVWIND != lag(ID2_EVWIND, default = 0)))
#> # A tibble: 34 x 3
#>    Date                ID2_EVWIND   ID3
#>    <dttm>                   <dbl> <int>
#>  1 2002-02-08 00:00:00         35     1
#>  2 2002-02-28 00:00:00         35     1
#>  3 2002-02-28 00:00:00         35     1
#>  4 2002-02-02 00:00:00         36     2
#>  5 2002-02-13 00:00:00         36     2
#>  6 2002-02-11 00:00:00         36     2
#>  7 2002-02-08 00:00:00         36     2
#>  8 2002-02-08 00:00:00         36     2
#>  9 2002-02-20 00:00:00         25     3
#> 10 2002-02-10 00:00:00         25     3
#> # ... with 24 more rows
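One thing to watch with default = 0: if ID2_EVWIND could itself be 0 in the first row, the first comparison is FALSE and the first group gets ID 0 instead of 1. The sketch below reproduces the same lag-and-compare logic in base R (lag is emulated by shifting the vector, so dplyr is not needed) on a made-up vector, and shows one way around it: pad with the first element and add 1.

```r
x <- c(0, 0, 5, 5, 0)

# Emulate lag(x, default = 0): shift right by one and pad with 0
lagged_zero <- c(0, head(x, -1))
cumsum(x != lagged_zero)
#[1] 0 0 1 1 2

# Pad with x[1] instead, so row 1 never counts as a change, then start at 1
lagged_self <- c(x[1], head(x, -1))
cumsum(x != lagged_self) + 1
#[1] 1 1 2 2 3
```

In the dplyr pipeline this would correspond to lag(ID2_EVWIND, default = first(ID2_EVWIND)) followed by + 1; for the data in the question, default = 0 already works because no ID is 0.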
stefan

Using diff:

With dplyr:

library(dplyr)
df %>% 
  mutate(ID3 = cumsum(c(1,  abs(diff(ID2_EVWIND)) > 0))) %>% 
  head()
#> # A tibble: 6 x 3
#>   Date                ID2_EVWIND   ID3
#>   <dttm>                   <dbl> <dbl>
#> 1 2002-02-08 00:00:00         35     1
#> 2 2002-02-28 00:00:00         35     1
#> 3 2002-02-28 00:00:00         35     1
#> 4 2002-02-02 00:00:00         36     2
#> 5 2002-02-13 00:00:00         36     2
#> 6 2002-02-11 00:00:00         36     2

And the base R version:

df$ID3 <-  cumsum(c(1,  abs(diff(df$ID2_EVWIND)) > 0))
head(df)         
#> # A tibble: 6 x 3
#>   Date                ID2_EVWIND   ID3
#>   <dttm>                   <dbl> <dbl>
#> 1 2002-02-08 00:00:00         35     1
#> 2 2002-02-28 00:00:00         35     1
#> 3 2002-02-28 00:00:00         35     1
#> 4 2002-02-02 00:00:00         36     2
#> 5 2002-02-13 00:00:00         36     2
#> 6 2002-02-11 00:00:00         36     2

Created on 2020-07-11 by the reprex package (v0.3.0)
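Note that diff() needs a numeric column. If the ID were stored as character (or factor), a sketch of the same idea compares each element with its predecessor directly; the toy vector here is made up for illustration.

```r
ids <- c("a35", "a35", "b36", "b36", "a35")

# TRUE for the first row and wherever the value differs from the previous row
changed <- c(TRUE, ids[-1] != ids[-length(ids)])
cumsum(changed)
#[1] 1 1 2 2 3
```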

Peter