I have a dataset I wish to process, and instead of processing it as a time series, I want to summarize the time behaviour. Here is the dataset:
business_id year
vcNAWiLM4dR7D2nwwJ7nCA 2007
vcNAWiLM4dR7D2nwwJ7nCA 2007
vcNAWiLM4dR7D2nwwJ7nCA 2009
UsFtqoBl7naz8AVUBZMjQQ 2004
UsFtqoBl7naz8AVUBZMjQQ 2005
cE27W9VPgO88Qxe4ol6y_g 2007
cE27W9VPgO88Qxe4ol6y_g 2007
cE27W9VPgO88Qxe4ol6y_g 2008
cE27W9VPgO88Qxe4ol6y_g 2010
I want to turn it into this:
business_id year yr_id
vcNAWiLM4dR7D2nwwJ7nCA 2007 1
vcNAWiLM4dR7D2nwwJ7nCA 2007 1
vcNAWiLM4dR7D2nwwJ7nCA 2009 2
UsFtqoBl7naz8AVUBZMjQQ 2004 1
UsFtqoBl7naz8AVUBZMjQQ 2005 2
cE27W9VPgO88Qxe4ol6y_g 2007 1
cE27W9VPgO88Qxe4ol6y_g 2007 1
cE27W9VPgO88Qxe4ol6y_g 2008 2
cE27W9VPgO88Qxe4ol6y_g 2010 3
In other words, I want yr_id to number the distinct years in ascending order within each business_id, restarting at 1 whenever a new business_id begins.
Is this something that is easily achievable in R?
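To make the rule concrete, here is a minimal base-R sketch of what I have in mind. It assumes the sample above is held in a data frame, and the name df is just my placeholder. It ranks the distinct years within each business_id using ave() and match(). I am not sure this is the idiomatic way to do it, so treat it as an illustration of the intended result rather than a settled solution.

```r
# Sample data from above; the name `df` is an assumption for illustration.
df <- data.frame(
  business_id = c(rep("vcNAWiLM4dR7D2nwwJ7nCA", 3),
                  rep("UsFtqoBl7naz8AVUBZMjQQ", 2),
                  rep("cE27W9VPgO88Qxe4ol6y_g", 4)),
  year = c(2007, 2007, 2009, 2004, 2005, 2007, 2007, 2008, 2010),
  stringsAsFactors = FALSE
)

# Within each business_id, map every year to its position among that
# group's sorted distinct years, so repeated years share the same id.
df$yr_id <- ave(df$year, df$business_id,
                FUN = function(y) match(y, sort(unique(y))))

print(df)
```

Sorting the distinct years inside each group means the result does not depend on the rows already being ordered by year.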