I have a dataset which looks like this:
fact_code style_serial
1004 style_101
1004 style_101
1004 style_101
1004 style_102
1004 style_102
1004 style_102
5002 style_101
5002 style_101
5002 style_101
5002 style_102
5002 style_102
5002 style_102
where fact_code is the factory code and style_serial is the serial number of the garment style that the factory produces. I am trying to generate a variable, ss, that looks like this:
fact_code style_serial ss
1004 style_101 1
1004 style_101 0
1004 style_101 0
1004 style_102 1
1004 style_102 0
1004 style_102 0
5002 style_101 1
5002 style_101 0
5002 style_101 0
5002 style_102 1
5002 style_102 0
5002 style_102 0
In Stata, ss can be generated with the following code:
bysort fact_code style_serial: gen ss=_n==1
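_n is Stata notation for the current observation number. Under bysort it counts observations within each by-group, so _n==1 is true only for the first observation of each fact_code and style_serial combination.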
I am trying to generate the same dummy variable, ss, in R, but I keep getting errors. This is the R code I have tried in order to mimic the Stata code above:
mydf <- mydf %>%
group_by(fact_code, style_serial) %>%
mutate(ss = n_distinct(fact_code, style_serial))
and
mydf <- mydf %>% group_by(fact_code, style_serial) %>%
mutate(ss = ave(mydf$fact_code, mydf$style_serial, FUN = seq_along))
The name of the R data frame is mydf.
Any help would be appreciated.
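
For reference, here is a minimal sketch of one way the Stata logic might be translated. It rests on assumptions: the example data are rebuilt by hand, within_group_n and mydf_dplyr are illustrative names that do not appear in the original code, and the dplyr variant only runs if dplyr is installed. The idea is to build a within-group row counter (the counterpart of _n under bysort) and flag the rows where it equals 1.

```r
# Rebuild the example data from the question (assumed layout)
mydf <- data.frame(
  fact_code    = rep(c(1004, 5002), each = 6),
  style_serial = rep(rep(c("style_101", "style_102"), each = 3), times = 2)
)

# Base R: count rows within each fact_code/style_serial group,
# then flag the first row of each group
within_group_n <- ave(seq_len(nrow(mydf)),
                      mydf$fact_code, mydf$style_serial,
                      FUN = seq_along)
mydf$ss <- as.integer(within_group_n == 1)

# dplyr variant: row_number() inside a grouped mutate() plays the role of _n
if (requireNamespace("dplyr", quietly = TRUE)) {
  suppressPackageStartupMessages(library(dplyr))
  mydf_dplyr <- mydf %>%
    select(-ss) %>%
    group_by(fact_code, style_serial) %>%
    mutate(ss = as.integer(row_number() == 1)) %>%
    ungroup()
}

print(mydf)
```

Unlike bysort, neither variant sorts the data; both flag the first row of each group in the existing row order.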