0

I have a panel identifier that is a firm's name stored as an alphanumeric code and I would like to track name changes throughout the panel. For example, firm A is renamed to B on on May 25, 2001. Then firm B is renamed to C on May 25, 2003. Then firm C is renamed to D on May 25, 2005. In this example firms D, E, and G are not renamed.

I want a "transition matrix" so that I can see if firm A is involved in some transaction at a later date under a new name. For example, in the first row of the test data firms A and G are involved in a transaction announced on Aug 12, 2001. The AA variable is determined long in the past, so by the time that Aug 12, 2001 rolls around AA's new firm name is B (change occurred on May 25, 2001). In the second row of the test data B and H are involved in a transaction also announced on Aug 12, 2001. But AA has not had time to change again so the firm's name is still B.

The code below tracks these name changes, but I am suspicious that this approach is too mechanical and not robust to scenarios that I haven't yet considered.

Is there a more logical approach that I should follow or a command that handles this re-coding?

* data on re-coding
clear
input str1 OldCode str1 NewCode str8 Date
A B 20010525
B C 20030525
C D 20050525
D   
E   
F G 20010525
G   
end 

generate temp = date(Date, "YMD")
drop Date
rename temp Date
format Date %td

* save to temp file
preserve
tempfile Codes
save "`Codes'"
restore

* merge back (recursively) to generate wide data that track re-coding
local i = 0
local j = 1

while (`j' != 0) {
    local ++i
    rename OldCode OldCode`i'
    rename NewCode NewCode`i'
    rename Date Date`i'

    cross using "`Codes'"
    count if (NewCode`i' == OldCode)
    local j = `r(N)' // zero when all re-codings accounted for
    keep if (NewCode`i' == OldCode) | missing(NewCode`i')
    replace OldCode = "" if missing(NewCode`i')
    replace NewCode = "" if missing(NewCode`i')
    replace Date = . if missing(NewCode`i')

    duplicates drop
}

* last addition is unnecessary 
drop OldCode NewCode Date

* rename and sort, only need first code
rename OldCode1 temp
drop OldCode*
rename temp OldCode
sort OldCode

* reshape to long
reshape long NewCode Date, i(OldCode)
drop _j
duplicates drop
compress
save "Codes", replace

* test data
clear
input str1 AA str1 TA str8 Date
A G 20010812
B H 20010812
C D 20050812
end
generate DealNumber = _n

generate temp = date(Date, "YMD")
drop Date
rename temp DateAnnounced
format DateAnnounced %td

clonevar OldCode = AA
joinby OldCode using "Codes.dta", unmatched(master)
drop _merge

sort DealNumber Date 
keep if (DateAnnounced >= Date) | missing(Date)
bysort DealNumber (Date) : keep if (_n == 1)
Richard Herron
  • 9,760
  • 12
  • 69
  • 116

1 Answers1

1

You can create a firm identifier that groups all observations of the same firm under different names. This is pretty straightforward to do with group_id (from SSC). Something like:

* data on re-coding
clear
input str1 OldCode str1 NewCode str8 Date
A B 20010525
B C 20030525
C D 20050525
D   
E   
F G 20010525
G
J C 20011525
end 

* convert dyads to long form
gen dyad_id = _n
expand 2 if !mi(NewCode)
bysort dyad_id: gen code = cond(_n == 1, OldCode, NewCode)

* use -group_id- (from SSC) to further group dyad_id when code is the same
clonevar jointcode = dyad_id
group_id jointcode, match(code)

* revert to the original wide form
drop code
bysort jointcode dyad_id: keep if _n == 1
list, sepby(jointcode)
Robert Picard
  • 1,051
  • 6
  • 9