Create a local id for a combination of 2 columns

Question

I have a dataset I wish to process, and instead of processing it as a time series, I want to summarize the time behaviour. Here is the dataset:

business_id                year
vcNAWiLM4dR7D2nwwJ7nCA     2007
vcNAWiLM4dR7D2nwwJ7nCA     2007
vcNAWiLM4dR7D2nwwJ7nCA     2009
UsFtqoBl7naz8AVUBZMjQQ     2004
UsFtqoBl7naz8AVUBZMjQQ     2005
cE27W9VPgO88Qxe4ol6y_g     2007
cE27W9VPgO88Qxe4ol6y_g     2007
cE27W9VPgO88Qxe4ol6y_g     2008
cE27W9VPgO88Qxe4ol6y_g     2010

I want to turn it into this:

business_id                year   yr_id
vcNAWiLM4dR7D2nwwJ7nCA     2007   1
vcNAWiLM4dR7D2nwwJ7nCA     2007   1
vcNAWiLM4dR7D2nwwJ7nCA     2009   2
UsFtqoBl7naz8AVUBZMjQQ     2004   1
UsFtqoBl7naz8AVUBZMjQQ     2005   2
cE27W9VPgO88Qxe4ol6y_g     2007   1
cE27W9VPgO88Qxe4ol6y_g     2007   1
cE27W9VPgO88Qxe4ol6y_g     2008   2
cE27W9VPgO88Qxe4ol6y_g     2010   3

In other words, I want the ID to follow the order of the years but be local to each business_id, so that it restarts at 1 whenever the program reaches a new business_id.

Is this something that is easily achievable in R?

Your example does not match your explanation... Do you mean you want an `id` for each (`year`, `business_id`) pair? Otherwise it seems like you just want a year identifier... which is the year itself! – Arthur, Nov 21 '15 at 02:11
Maybe I should explain myself better. If I create an id based on the year only, it will not reset when it encounters a new biz_id, and if I create one by just concatenating biz_id and year, it will keep counting globally (that is, across the entire dataset) over all pairs and will not reset either. – Jesus Ramos, Nov 21 '15 at 02:14
Either way, I found this question and answer right after posting. This achieves exactly what I want: http://stackoverflow.com/questions/27895860/r-add-column-that-counts-sequentially-within-groups-but-repeats-for-duplicates – Jesus Ramos, Nov 21 '15 at 02:14
Should I mark it as a duplicate? Or is this done by the admins? – Jesus Ramos, Nov 21 '15 at 02:15

score 1 · Answer 1 · edited May 23 '17 at 12:17

I found another question on SO whose answer also solves this one, so this question should be marked as a duplicate.

https://stackoverflow.com/a/27896841/4858065

The way to achieve this is:

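library(dplyr)  # provides %>%, group_by(), mutate() and dense_rank()
df %>% group_by(business_id) %>%
    mutate(yr_id = dense_rank(year))  # tied years share a rank; ranks have no gaps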
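
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same grouped `dense_rank()` call. The object names `df`, `result` and `base_id` are just illustrative, and the `ave()` line is a base-R alternative added as a cross-check; it is not part of the linked answer.

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(
  business_id = c(rep("vcNAWiLM4dR7D2nwwJ7nCA", 3),
                  rep("UsFtqoBl7naz8AVUBZMjQQ", 2),
                  rep("cE27W9VPgO88Qxe4ol6y_g", 4)),
  year = c(2007, 2007, 2009, 2004, 2005, 2007, 2007, 2008, 2010),
  stringsAsFactors = FALSE
)

# dense_rank() gives tied years the same rank and leaves no gaps;
# group_by() makes the ranking restart for every business_id
result <- df %>%
  group_by(business_id) %>%
  mutate(yr_id = dense_rank(year)) %>%
  ungroup()

# Base-R cross-check (assumed alternative): within each business,
# look up each year's position among that business's sorted unique years
base_id <- ave(df$year, df$business_id,
               FUN = function(y) match(y, sort(unique(y))))

print(as.data.frame(result))
```

With the sample data, `yr_id` should come out as 1 1 2 1 2 1 1 2 3, matching the desired table in the question, and `base_id` should hold the same values.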

answered Nov 21 '15 at 02:25

Jesus Ramos

While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - [From Review](/review/low-quality-posts/10285475) – Hamid Pourjam Nov 21 '15 at 11:48
Edited to include the specific answer to my problem based on the solution presented in the link. – Jesus Ramos Nov 21 '15 at 18:55
The appropriate thing to do in this situation is to flag the question as a duplicate. An alternative (once you're at 50 rep) is to leave a comment on the question. – Teepeemm Nov 21 '15 at 20:27
I tried that, but that option doesn't show in my UI, so I just reported it to moderators. Maybe it has to do with my low rep. – Jesus Ramos Nov 22 '15 at 01:24
@Teepeemm: Hmm...OP can comment on his question without 50 rep... – Remi Guan Nov 22 '15 at 02:05
