I'm working on a large data set of corporate accounts to solve a classification problem: predicting whether or not a firm goes bankrupt.
The dataset contains a variable `liquid`, which states the year in which liquidation started. If a firm goes into liquidation, this variable carries that year in every one of the firm's yearly observations; otherwise it is zero. Usually, `liquid` is later than the last year of observation, so there are no observations of the corporate data in the year the firm starts liquidation. Sometimes the gap is even longer: for example, a firm starts liquidation in 2005 but the last observation of its financial ratios is in 2002.
A sample of the data might look like this:
Now, I want to create a new dummy called `bankruptcy`. It should take the value 1 if the row is the last observation (with financial data) of a company that starts liquidation, and 0 otherwise. You can see what `bankruptcy` should look like in the table above. How do I proceed?
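To pin down the rule I have in mind, here is a minimal sketch in plain Python. The column names `firm` and `year`, the helper `add_bankruptcy_dummy`, and the toy rows are all made up for illustration and are not taken from my actual data; only `liquid` and `bankruptcy` are the real variable names. It assumes one row per firm and year.

```python
# Toy rows, made up for illustration (not the real data).
# Hypothetical columns: firm (identifier), year (year of observation).
# liquid is the year liquidation started, or 0 if the firm never liquidates.
rows = [
    {"firm": "A", "year": 2000, "liquid": 2005},
    {"firm": "A", "year": 2001, "liquid": 2005},
    {"firm": "A", "year": 2002, "liquid": 2005},
    {"firm": "B", "year": 2000, "liquid": 0},
    {"firm": "B", "year": 2001, "liquid": 0},
]


def add_bankruptcy_dummy(rows):
    """Return copies of the rows with a 0/1 'bankruptcy' field added.

    bankruptcy is 1 only on the last observed year of a firm whose
    liquid value is non-zero; every other row gets 0.
    """
    # First pass: find the last observed year for each firm.
    last_year = {}
    for row in rows:
        firm = row["firm"]
        last_year[firm] = max(last_year.get(firm, row["year"]), row["year"])

    # Second pass: flag the last observation of liquidating firms.
    flagged = []
    for row in rows:
        is_last = row["year"] == last_year[row["firm"]]
        liquidates = row["liquid"] != 0
        flagged.append({**row, "bankruptcy": int(is_last and liquidates)})
    return flagged


flagged = add_bankruptcy_dummy(rows)
for row in flagged:
    print(row)
```

In this toy example only firm A's 2002 row is flagged, even though its liquidation starts in 2005. What I need is the equivalent of this grouped "last row per firm" logic on the full data set.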