1

I am trying to replace missing values with values from the same column dependent on their equality from other columns:

I have different firms, from different industries & countries and from different years. Below is just a small example. I would like to replace the missing values (in the column industry or country) with the existing values in the column, if they come from the same firm.

For example, firm 123 is missing its industry in year 2. I have the industry in which the firm belongs to from the previous (or sometimes succeeding) years but do not know how to add it.

Another example: I have the firm 444 which is missing its country in year 3. I do have its country from the previous years but do not know how to transfer / copy it to the 3rd year:

 --firm--year--industry--country

 --123----1-------1---------usa

 --123----2-------1---------usa

 --123----3--------.--------usa

 --333----1-------2---------usa

 --333----2--------.---------usa

 --444---1---------.----------fr

 --444---2---------2---------fr

 --444---3---------2----------.

I looked up on stata/help and on the internet. All I could find was the replace command, but it only replaced equal numbers.

I think it will be something with:

replace industry=(problaby something dependant of the firm (and maybe year))  if industry==.

replace country=(problaby something dependant of the firm (and maybe year))  if country==.

I am not sure for the country replacement, because the observations are not numbers. I think I will need to generate a new variable with numbers for the country-replacement.

Thanks a lot!

Metrics
  • 15,172
  • 7
  • 54
  • 83
Franz
  • 33
  • 1
  • 2
  • 8

2 Answers2

7

Take a look at replacing missing values with neighboring values FAQ and at user-written xfill. The latter is useful for filling in static variables. It replaces missing values in a cluster with the unique non-missing value within that cluster.

dimitriy
  • 9,077
  • 2
  • 25
  • 50
  • Thanks a lot @Dimitriy. I already checked the FAQ from stata yesterday, but the xfill-command helped me perfectly! – Franz Jul 23 '13 at 07:51
0

For this particular example where the industry variable is the same within firm, you could also write

levelsof firm, local(F)
foreach f of local F{
sum industry if firm==`f'
replace industry=r(mean) if firm==`f' & industry==.
}

This code creates a local variable F that is a list of all different firms. It then summarizes the industry for each firm. Since the industry number will always be the same for observations from the same firm, the mean value will just be the industry number. The code then replaces the value of industry to be this number for all observations within that firm (you could omit the "& industry==." part of the code here and it would still work the same way).

  • 1
    Alternatively, you could just go `bysort firm (industry) : replace industry = industry[1] if missing(industry)`. That is, `levelsof` and the loop are both unnecessary. (As you say, there is a tacit assumption here of at most one distinct value for each firm that is used for replacement.) – Nick Cox Jan 18 '16 at 18:54