0

I'm trying to sum up data in Stata. I have UK local authority codes (e.g. E06000047) and a dataset, which is deeper by a level (MSOA).

MSOA code   MSOA name           Local authority code    Net weekly income
E02004297   County Durham 001   E06000047               480.00
E02004290   County Durham 002   E06000047               540.00
E02004298   County Durham 003   E06000047               520.00
E02004299   County Durham 004   E06000047               430.00
E02004291   County Durham 005   E06000047               400.00

Since I'm not interested in the MSOA-level-data i'd like to sum up the data to local auth code level. Where I fail is the fact that I can't calculate with string data. What I want to do is:

foreach identical "Local authority code" take the mean/median and 
store it in a var "means.local-auth"

So what I expect is something like:

Local authority code  means.local-auth  median.local-auth
E06000047             474.00            480.00
E06000048             486.00            485.00
Nick Cox
  • 35,529
  • 6
  • 31
  • 47
arne144
  • 1
  • 1

2 Answers2

2

Since it's not clear from your question if your objective is to produce a report, such as Nick's, or to take the first step toward an analysis at the local authority level, here's code that uses collapse to take your data and reduce it to the local authority level.

. * Example generated by -dataex-. To install: ssc install dataex
. clear

. input str20(msoa_c msoa_n lac) float income

          msoa_c              msoa_n         lac income
  1. "E02004297" "County Durham 001" "E06000047" 480
  2. "E02004290" "County Durham 002" "E06000047" 540
  3. "E02004298" "County Durham 003" "E06000047" 520
  4. "E02004299" "County Durham 004" "E06000047" 430
  5. "E02004291" "County Durham 005" "E06000047" 400
  6. end

. format income %9.2f

. drop msoa_c msoa_n 

. collapse (mean) mean_inc=income (median) med_inc=income, by(lac)

. list

     +--------------------------------+
     |       lac   mean_inc   med_inc |
     |--------------------------------|
  1. | E06000047     474.00    480.00 |
     +--------------------------------+

. 
1

For this kind of simple summarization, no loops are necessary. Here is a reproducible example with egen used to produce variables using by() (its argument can be numeric or string and indeed need not be a single variable). tabdisp can be convenient for simple tabulations.

sysuse auto, clear
egen mean_mpg = mean(mpg), by(rep78) 
egen median_mpg = median(mpg), by(rep78) 

tabdisp rep78, c(mean_mpg median_mpg) 

----------------------------------
Repair    |
Record    |
1978      |   mean_mpg  median_mpg
----------+-----------------------
        1 |         21          21
        2 |     19.125          18
        3 |   19.43333          19
        4 |   21.66667        22.5
        5 |   27.36364          30
        . |       21.4          22
----------------------------------

tabdisp rep78, c( mean_mpg median_mpg) format(%2.1f)

----------------------------------
Repair    |
Record    |
1978      |   mean_mpg  median_mpg
----------+-----------------------
        1 |       21.0        21.0
        2 |       19.1        18.0
        3 |       19.4        19.0
        4 |       21.7        22.5
        5 |       27.4        30.0
        . |       21.4        22.0
----------------------------------
Nick Cox
  • 35,529
  • 6
  • 31
  • 47