Count of how many times a unique value appears across all columns and rows of a dataset?

Question

Each row of my data has a unique ID, and the other columns hold the IDs of that person's 'sent' friends. To get a count of 'received' friends, I need to count how many times an ID appears across all columns and rows of the dataset. This is easy in R, but I'd like to stay with Stata for this project.

ID	F1_ID	F2_ID	F3_ID	ID_mentions
1	2	3	4	4
2	4	1		4
3	1	2		3
4	2	1	3	3

Toy data above. Here, there are four mentions of ID #1, three mentions of ID #4, etc.

I want to generate a variable containing the count of how many times each ID value in the first column is mentioned in any column of the data set. This is illustrated in the ID_mentions column.

Nick Cox · Accepted Answer · 2021-09-11T16:04:00.410

It turns out I wrote something in this territory: `tabm`, which is part of the `tab_chi` package on SSC. You would need to install it with `ssc install tab_chi`.

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(id f1_id f2_id f3_id)
1 2 3 4
2 4 1 .
3 1 2 .
4 2 1 3
end

tabm *id

           |                   values
  variable |         1          2          3          4 |     Total
-----------+--------------------------------------------+----------
        ID |         1          1          1          1 |         4 
     F1_ID |         1          2          0          1 |         4 
     F2_ID |         2          1          1          0 |         4 
     F3_ID |         0          0          1          1 |         2 
-----------+--------------------------------------------+----------
     Total |         4          4          3          3 |        14

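EDIT: To count all mentions of each ID, aligned with the original data:

gen mentions = . 

* Loop over observations; for each one, count every cell equal to its id.
quietly forval i = 1/`=_N' { 
    * work = number of *id variables in each row equal to this row's id
    egen work = anycount(*id), value(`=id[`i']')
    * summing work over all rows gives the total number of mentions
    su work, meanonly 
    replace mentions = r(sum) in `i'
    drop work 
}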

list
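
As an alternative sketch of the same count, not part of the original answer: stack all the ID columns into one long variable, count each value, and merge the counts back. This assumes `id` uniquely identifies rows; the names `value`, `mentions`, and `counts` are chosen here purely for illustration.

```stata
clear
input byte(id f1_id f2_id f3_id)
1 2 3 4
2 4 1 .
3 1 2 .
4 2 1 3
end

* Work on a stacked copy; the original data come back with -restore-.
preserve

* Put every ID column into a single long variable (hypothetical name: value).
stack id f1_id f2_id f3_id, into(value) clear
drop if missing(value)

* One row per distinct value, with its frequency in mentions.
contract value, freq(mentions)
rename value id

tempfile counts
save `counts'

restore

* Attach the counts; assumes id is unique in the original data.
merge 1:1 id using `counts', keep(master match) nogenerate

list
```

With the toy data this should reproduce the `ID_mentions` column from the question (4, 4, 3, 3). On larger datasets it avoids looping over observations, at the cost of a temporary file and a merge.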

edited Sep 11 '21 at 16:04

answered Sep 10 '21 at 17:36

Nick Cox


Thanks Nick, this package and the help files are informative. Would you know how to extract counts as variable values? E.g., where a new variable column row value is the count of how many times the first "ID" column row value is mentioned in the data set? – Marcus Sep 10 '21 at 18:12
See the help for `tabm`: you can `replace` the dataset in memory. Otherwise see `egen` for various `any` functions for counting across variables. – Nick Cox Sep 11 '21 at 09:34
Thanks again Nick. I've queried these files and still cannot find an answer. I've opted to move to R as a result. – Marcus Sep 11 '21 at 15:26
I don't really follow what new variable you want. Is it to be aligned with the original data, or in a new dataset? The description "new variable" is too vague to allow me to guess. Can you update your question with a concrete example? – Nick Cox Sep 11 '21 at 15:39
Hi Nick. I've edited the original post to add a fourth column illustrating what I'm looking for. Thanks! – Marcus Sep 11 '21 at 15:50
OK. So please look at the EDIT. – Nick Cox Sep 11 '21 at 16:04
