4

I am using tabstat in Stata, and using estpost and esttab to get its output to LaTeX. I have

tabstat

to display statistics by group. For example,

tabstat assets, by(industry) missing statistics(count mean sd p25 p50 p75) 

The question I have is whether there is a way for tabstat (or other Stata commands) to display the output ordered by the value of the mean, so that those categories that have higher means will be on top. By default, Stata displays by alphabetical order of industry when I use tabstat.

Nick Cox
  • 35,529
  • 6
  • 31
  • 47
rajvijay
  • 1,641
  • 4
  • 23
  • 28

2 Answers2

6

tabstat does not offer such a hook, but there is an approach to problems like this that is general and quite easy to understand.

You don't provide a reproducible example, so we need one:

. sysuse auto, clear
(1978 Automobile Data)

. gen Make = word(make, 1)

. tab Make if foreign

       Make |      Freq.     Percent        Cum.
------------+-----------------------------------
       Audi |          2        9.09        9.09
        BMW |          1        4.55       13.64
     Datsun |          4       18.18       31.82
       Fiat |          1        4.55       36.36
      Honda |          2        9.09       45.45
      Mazda |          1        4.55       50.00
    Peugeot |          1        4.55       54.55
    Renault |          1        4.55       59.09
     Subaru |          1        4.55       63.64
     Toyota |          3       13.64       77.27
         VW |          4       18.18       95.45
      Volvo |          1        4.55      100.00
------------+-----------------------------------
      Total |         22      100.00

Make here is like your variable industry: it is a string variable, so in tables Stata will tend to show it in alphabetical (alphanumeric) order.

The work-around has several easy steps, some optional.

Calculate a variable on which you want to sort. egen is often useful here.

 . egen mean_mpg = mean(mpg), by(Make)

Map those values to a variable with distinct integer values. As two groups could have the same mean (or other summary statistic), make sure you break ties on the original string variable.

 . egen group = group(mean_mpg Make)

This variable is created to have value 1 for the group with the lowest mean (or other summary statistic), 2 for the next lowest, and so forth. If the opposite order is desired, as in this question, flip the grouping variable around.

 . replace group = -group
 (74 real changes made)

There is a problem with this new variable: the values of the original string variable, here Make, are nowhere to be seen. labmask (to be installed from the Stata Journal website after search labmask) is a helper here. We use the values of the original string variable as the value labels of the new variable. (The idea is that the value labels become the "mask" that the integer variable wears.)

 . labmask group, values(Make)

Optionally, work at the variable label of the new integer variable.

 . label var group "Make"

Now we can tabulate using the categories of the new variable.

 . tabstat mpg if foreign, s(mean) by(group) format(%2.1f)

 Summary for variables: mpg
 by categories of: group (Make)

   group |      mean
 --------+----------
  Subaru |      35.0
   Mazda |      30.0
      VW |      28.5
   Honda |      26.5
 Renault |      26.0
  Datsun |      25.8
     BMW |      25.0
  Toyota |      22.3
    Fiat |      21.0
    Audi |      20.0
   Volvo |      17.0
 Peugeot |      14.0
 --------+----------
   Total |      24.8
 -------------------

Note: other strategies are sometimes better or as good here.

  • If you collapse your data to a new dataset, you can then sort it as you please.

  • graph bar and graph dot are good at displaying summary statistics over groups, and the sort order can be tuned directly.

UPDATE 3 and 5 October 2021 A new helper command myaxis from SSC and the Stata Journal (see [paper here) condenses the example here with tabstat:

* set up data example 
sysuse auto, clear
gen Make = word(make, 1)

* sort order variable and tabulation 
myaxis Make2 = Make, sort(mean mpg) descending 
tabstat mpg if foreign, s(mean) by(Make2) format(%2.1f)
Nick Cox
  • 35,529
  • 6
  • 31
  • 47
  • Thanks for this answer. This is (not blaming you) frankly insane - are everyday users of Stata doing this kind of thing all the time merely to sort the OUTPUT of a tabulation? Seems like just coping it into Excel and doing it there is safer, faster and less error prone. – Simon_Weaver Oct 02 '21 at 19:53
  • Not for me. I don’t use Excel routinely. The answer focused on the specific question 7 years ago. – Nick Cox Oct 02 '21 at 21:46
  • appreciate the reply. I was just surprised not to find a better answer. I know Stata has a long history, and as the 'programmer friend' I'm trying to help someone with a few things and just ending up very surprised sometimes about simple things I can't do. In SQL I would just add `ORDER BY mean DESC` and that's it! – Simon_Weaver Oct 03 '21 at 07:03
  • I will add an update mentioning a new helper command. – Nick Cox Oct 03 '21 at 07:57
  • `tabstat` is a moderately old command and, sure, anyone using it might want this to be easier. But one more modern approach (ancient, in fact, as `collapse` and `contract` are commands of some vintage) is to produce a table of results as a reduced dataset and then `sort` on whatever you want. The answers in 2014 necessarily did not mention frames (Stata 16) as another approach. – Nick Cox Oct 03 '21 at 09:34
5

I would look at the egenmore package on SSC. You can get that package by typing in Stata ssc install egenmore. In particular, I would look at the entry for axis() in the helpfile of egenmore. That contains an example that does exactly what you want.

Maarten Buis
  • 2,684
  • 12
  • 17
  • 3
    Note that Maarten's answer is really the same as mine, in so far as the main idea behind `axis()` is to sort categories according to the quantitative values of some other variable (and also in so far as I wrote it). My longer answer spells out the entire logic. – Nick Cox Jul 23 '16 at 11:39