
I have a simple question about the distinct command in Stata.

When used with a by prefix, can it return a one-dimensional matrix of r(N)?

For example:

sysuse auto, clear
bysort foreign: distinct rep78

Can I store a [2,1] matrix, with each row holding the number of distinct values of rep78 for one value of foreign?

The manual seems to suggest that it stores only the number of distinct values for the last by-group.

Yan Song

2 Answers


You can easily create your own wrapper for that:

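sysuse auto, clear

sort foreign
* Collect the observed values of foreign (0 and 1 in the auto data)
levelsof foreign, local(foreign_levels)
local number_of_foreign_levels : word count `foreign_levels'

* One row per level of foreign
matrix distinct_mat = J(`number_of_foreign_levels', 1, 0)

forvalues i = 1 / `number_of_foreign_levels' {
     * Look up the i-th level rather than assuming the levels run 0, 1, ...
     local level : word `i' of `foreign_levels'
     quietly distinct rep78 if foreign == `level'
     matrix distinct_mat[`i', 1] = r(ndistinct)
}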

matrix list distinct_mat

distinct_mat[2,1]
    c1
r1   5
r2   3

Note that the number of distinct values is stored in r(ndistinct), not r(N); r(N) holds the number of observations.
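
If this is needed for several variables, the loop can be wrapped in a small program. The following is only a sketch: the program name distinct_by and the returned matrix name r(ndistinct) are hypothetical choices for illustration, and it assumes that distinct is installed (for example via ssc install distinct) and that the by() variable is numeric.

```stata
* Hypothetical wrapper; distinct_by is an illustrative name, not part of distinct
capture program drop distinct_by
program define distinct_by, rclass
    syntax varname, by(varname numeric)

    * Observed levels of the grouping variable
    quietly levelsof `by', local(levels)
    local k : word count `levels'

    * One row per level, filled in below
    tempname result
    matrix `result' = J(`k', 1, 0)

    local i = 0
    foreach level of local levels {
        local ++i
        quietly distinct `varlist' if `by' == `level'
        matrix `result'[`i', 1] = r(ndistinct)
    }

    * Hand the matrix back to the caller as r(ndistinct)
    return matrix ndistinct = `result'
end

sysuse auto, clear
distinct_by rep78, by(foreign)
matrix distinct_mat = r(ndistinct)
matrix list distinct_mat
```

After the program has been defined once, each further variable needs only one call, such as distinct_by rep78, by(foreign), followed by copying r(ndistinct) into a named matrix.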

  • Thanks for the solution. I had considered this but thought there would be an easier solution. – Yan Song Jun 04 '18 at 14:42
  • @YanSong if you have to do this repeatedly for different variables, you can wrap the above in a program. You will then only have to write a single line. –  Jun 04 '18 at 14:45

Here is another way to get numbers of distinct values into a matrix.

. sysuse auto
(1978 Automobile Data)

. egen tag = tag(foreign rep78)

. tab foreign if tag, matcell(foo)

   Car type |      Freq.     Percent        Cum.
------------+-----------------------------------
   Domestic |          5       62.50       62.50
    Foreign |          3       37.50      100.00
------------+-----------------------------------
      Total |          8      100.00
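
The matcell() option leaves the tabulated frequencies in the matrix foo, so the counts can be used after the table has been displayed. A minimal sketch of checking this, using only the commands shown above plus matrix list:

```stata
sysuse auto, clear

* tag is 1 for exactly one observation per distinct (foreign, rep78) pair
egen tag = tag(foreign rep78)

* Counting tagged observations by foreign gives the number of distinct rep78 values
quietly tab foreign if tag, matcell(foo)

matrix list foo
```

Here foo should be a 2 x 1 matrix holding the frequencies 5 and 3 from the table. Observations with missing rep78 are not tagged by default, so missing is not counted as a distinct value.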
Nick Cox