0

This is probably a very simple question, but I cannot figure it out. I have data where each row is a unique outcome for a person. The people in the dataset all have a unique identifer (Unique ID).

I want to create a second (not unique) identifier (Date ID) in a new column that is the first date that person appears in the data. I would like to do this in Stata, and if it is impossible I guess Excel would be alright.

For example I have:

Unique ID       Date    Outcome
   1          1/1/2000     A
   2          2/11/2003    B
   2          2/12/2003    B
   2          3/6/2004     A
   3          4/4/1997     A
   3          8/11/2011    A
   4          9/16/1995    B
   4          9/17/1995    B
   4          9/18/1995    C
   4          9/19/1995    E
   4          9/20/1995    A
   4          2/17/1996    A
   4          9/16/1996    A
   4          9/17/1996    B
   5          10/16/1995   C
   5          10/17/2005   C
   5          12/31/2008   A

I want:

Unique ID       Date    Outcome           Date ID
   1          1/1/2000     A             1/1/2000
   2          2/11/2003    B             2/11/2003
   2          2/12/2003    B             2/11/2003
   2          3/6/2004     A             2/11/2003
   3          4/4/1997     A             4/4/1997
   3          8/11/2011    A             4/4/1997
   4          9/16/1995    B             9/16/1995
   4          9/17/1995    B             9/16/1995
   4          9/18/1995    C             9/16/1995
   4          9/19/1995    E             9/16/1995
   4          9/20/1995    A             9/16/1995
   4          2/17/1996    A             9/16/1995
   4          9/16/1996    A             9/16/1995
   4          9/17/1996    B             9/16/1995
   5          10/16/1995   C             10/16/1995
   5          10/17/2005   C             10/16/1995
   5          12/31/2008   A             10/16/1995
CJ12
  • 487
  • 2
  • 10
  • 28
  • but in case if two users have same date it will not work just add something unique.. – Garry Dec 01 '12 at 17:48
  • @Garry the use of the word Unique for the second time is inaccurate, duplicates here are OK. – CJ12 Dec 01 '12 at 17:52

2 Answers2

2

A more direct one-line solution is

bysort unique_id (date) : gen date_id = date[1] 

Note that replace is a command, not a function. search by to get a reference to a tutorial on by:.

Nick Cox
  • 35,529
  • 6
  • 31
  • 47
  • See also Stata FAQ http://www.stata.com/support/faqs/data-management/generating-last-date/ (which is relevant, despite title) – Nick Cox Dec 02 '12 at 10:38
0

This may not be the most efficient way but I figured it out:

First I created a counter, that counted the amount of times each individual was in the data

sort unique_id
by unique_id: gen individual_counter = _n

Then I used this counter to make a new variable that was the first date that an individual was in the data, or a missing value '.'.

gen date_id = date if individual_counter == 1

Lastly, I used the replace function to replace missing values from above (i.e., the first day listed)

replace date_id = date_id[_n-1] if date_id >= . 
CJ12
  • 487
  • 2
  • 10
  • 28