I have a dataset which looks like this:

fact_code style_serial
1004      style_101
1004      style_101
1004      style_101
1004      style_102
1004      style_102
1004      style_102
5002      style_101
5002      style_101
5002      style_101
5002      style_102
5002      style_102
5002      style_102

where fact_code is the factory code and style_serial is the serial number of the garment style that the factory produces. What I am trying to generate is a variable, ss, that looks like this:

fact_code style_serial ss
1004      style_101    1
1004      style_101    0
1004      style_101    0
1004      style_102    1
1004      style_102    0
1004      style_102    0
5002      style_101    1
5002      style_101    0
5002      style_101    0
5002      style_102    1
5002      style_102    0
5002      style_102    0

In Stata, this variable can be generated as follows:

bysort fact_code style_serial: gen ss=_n==1

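Under bysort, _n is Stata's notation for the observation number within the current by-group, so _n==1 flags the first observation of each fact_code/style_serial combination.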

I am trying to generate the same dummy variable, ss, in R but keep getting errors. This is the R code I have tried in order to mimic the Stata code above:

mydf <- mydf %>% 
  group_by(fact_code, style_serial) %>% 
  mutate(ss = n_distinct(fact_code, style_serial))

and

mydf <- mydf %>% group_by(fact_code, style_serial) %>% 
  mutate(ss =  ave(mydf$fact_code, mydf$style_serial, FUN = seq_along))

The name of the R dataframe is mydf.

Any help would be appreciated.

user3571389

1 Answer


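You could use duplicated(). It returns TRUE for every row that repeats an earlier row, so only the first occurrence of each fact_code/style_serial combination gets a 1.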

df1$ss <- with(df1, ifelse(duplicated(df1), 0, 1))

This yields:

> df1
   fact_code style_serial ss
1       1004    style_101  1
2       1004    style_101  0
3       1004    style_101  0
4       1004    style_102  1
5       1004    style_102  0
6       1004    style_102  0
7       5002    style_101  1
8       5002    style_101  0
9       5002    style_101  0
10      5002    style_102  1
11      5002    style_102  0
12      5002    style_102  0
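
Since the question uses dplyr, here is a minimal sketch of two variants, assuming the dplyr package is installed. The first uses row_number(), which counts rows within each group much like _n under bysort. The second is a base R version that checks duplicates on the two key columns only, which may be safer if the real data frame has additional columns. The toy data is rebuilt here, and df2 is just an illustrative name.

```r
library(dplyr)

# Same toy data as in the question (rebuilt so this snippet runs on its own)
df1 <- data.frame(
  fact_code    = rep(c(1004, 5002), each = 6),
  style_serial = rep(rep(c("style_101", "style_102"), each = 3), times = 2)
)

# dplyr: row_number() is the row index within each group,
# so row_number() == 1 marks the first row of every group
df2 <- df1 %>%
  group_by(fact_code, style_serial) %>%
  mutate(ss = as.integer(row_number() == 1)) %>%
  ungroup()

# Base R: look for duplicates on the key columns only, so any extra
# columns in the real data cannot change which rows count as repeats
df1$ss <- as.integer(!duplicated(df1[c("fact_code", "style_serial")]))
```

Unlike Stata's bysort, neither variant sorts the data. Both flag the first row of each combination in the existing row order.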

Data

df1 <- read.table(header=TRUE, text="fact_code style_serial
1004      style_101
1004      style_101
1004      style_101
1004      style_102
1004      style_102
1004      style_102
5002      style_101
5002      style_101
5002      style_101
5002      style_102
5002      style_102
5002      style_102")
jay.sf