I have a dataset I wish to process, and instead of processing it as a time series, I want to summarize the time behaviour. Here is the dataset:

business_id                year
vcNAWiLM4dR7D2nwwJ7nCA     2007
vcNAWiLM4dR7D2nwwJ7nCA     2007
vcNAWiLM4dR7D2nwwJ7nCA     2009
UsFtqoBl7naz8AVUBZMjQQ     2004
UsFtqoBl7naz8AVUBZMjQQ     2005
cE27W9VPgO88Qxe4ol6y_g     2007
cE27W9VPgO88Qxe4ol6y_g     2007
cE27W9VPgO88Qxe4ol6y_g     2008
cE27W9VPgO88Qxe4ol6y_g     2010

I want to turn it into this:

business_id                year   yr_id
vcNAWiLM4dR7D2nwwJ7nCA     2007   1
vcNAWiLM4dR7D2nwwJ7nCA     2007   1
vcNAWiLM4dR7D2nwwJ7nCA     2009   2
UsFtqoBl7naz8AVUBZMjQQ     2004   1
UsFtqoBl7naz8AVUBZMjQQ     2005   2
cE27W9VPgO88Qxe4ol6y_g     2007   1
cE27W9VPgO88Qxe4ol6y_g     2007   1
cE27W9VPgO88Qxe4ol6y_g     2008   2
cE27W9VPgO88Qxe4ol6y_g     2010   3

In other words, I want the ID to be sequential to the year, but local to the business_id, so that it resets when the program finds another business_id.

Is this something that is easily achievable in R?

APC
Jesus Ramos
  • Your example does not match your explanation. Do you mean you want an `id` for each pair (`year` x `business_id`)? Otherwise it seems you just want a year identifier, which is the year itself! – Arthur Nov 21 '15 at 02:11
  • Maybe I should explain myself better. If I create an id based on the year only, it will not reset when it encounters a new biz_id. If I create one by just concatenating biz_id and year, it will keep counting 'globally' (that is, throughout the entire dataset) across all pairs and will not reset either. – Jesus Ramos Nov 21 '15 at 02:14
  • Either way, I found this question and answer right after posting. This achieves exactly what I want: http://stackoverflow.com/questions/27895860/r-add-column-that-counts-sequentially-within-groups-but-repeats-for-duplicates – Jesus Ramos Nov 21 '15 at 02:14
  • Should I mark it as a duplicate? Or is this done by the admins? – Jesus Ramos Nov 21 '15 at 02:15
  • You can answer your own question *and* mark it as a duplicate. – Arthur Nov 21 '15 at 02:18

1 Answer

I found another question on SO whose answer also solves this one, so this question should be marked as a duplicate.

https://stackoverflow.com/a/27896841/4858065

The way to achieve this is:

library(dplyr)
# Rank years within each business_id; repeated years share the same rank
df %>% group_by(business_id) %>% 
    mutate(yr_id = dense_rank(year))
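
For reference, here is a minimal, self-contained sketch that rebuilds the sample data from the question and applies the same idea. The object names `df`, `result` and `base_ids` are only illustrative, and the base R variant at the end is an alternative I am assuming would be acceptable if dplyr is not available; it is not taken from the linked answer.

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(
  business_id = c(rep("vcNAWiLM4dR7D2nwwJ7nCA", 3),
                  rep("UsFtqoBl7naz8AVUBZMjQQ", 2),
                  rep("cE27W9VPgO88Qxe4ol6y_g", 4)),
  year = c(2007, 2007, 2009, 2004, 2005, 2007, 2007, 2008, 2010),
  stringsAsFactors = FALSE
)

# dense_rank() gives tied years the same rank and leaves no gaps,
# and group_by() restarts the ranking for every business_id
result <- df %>%
  group_by(business_id) %>%
  mutate(yr_id = dense_rank(year)) %>%
  ungroup()

print(result)
# yr_id column: 1 1 2 1 2 1 1 2 3

# Base R alternative (an assumption, not from the linked answer):
# within each business_id, look up each year's position among the
# sorted distinct years of that business
base_ids <- ave(df$year, df$business_id,
                FUN = function(y) match(y, sort(unique(y))))
```

Both approaches number the distinct years within each business, so a business reviewed in 2007, 2007 and 2009 gets 1, 1, 2 rather than 1, 2, 3.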
Community
Jesus Ramos
  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - [From Review](/review/low-quality-posts/10285475) – Hamid Pourjam Nov 21 '15 at 11:48
  • Edited to include the specific answer to my problem based on the solution presented in the link. – Jesus Ramos Nov 21 '15 at 18:55
  • The appropriate thing to do in this situation is to flag the question as a duplicate. An alternative (once you're at 50 rep) is to leave a comment on the question. – Teepeemm Nov 21 '15 at 20:27
  • I tried that, but that option doesn't show in my UI, so I just reported it to moderators. Maybe it has to do with my low rep. – Jesus Ramos Nov 22 '15 at 01:24
  • @Teepeemm: Hmm...OP can comment on his question without 50 rep... – Remi Guan Nov 22 '15 at 02:05