1

I have obtained a list of projects that in total generate zero revenue (total revenue over a period of time)

tabstat revenue, by(project) stat(sum)

I have identified 261 projects (out of 1000s) that generate zero revenue for the whole period of time.

Now, want to look at the total value of a specific variable that can be tracked over multiple periods for each project in these zero-revenue-generating projects. I know that I can go after each campaign by typing

tabstat variable_of_interest if project==127, stat(sum)

Again, here project 127 generated zero revenue.

Is there a way to merge these two codes so that I can generate a table with the following logic

generate total sum of the variable_of_interest if project's stat(sum) was equal to zero?

here is a data sample

project revenue var_of_intr
1          0        5
1          0        8
1          2        10
1          0        5
2          0        5
2          0        90
2          0        2
2          0        0
3          0        76
3          0        5
3          0        23
3          0        4
4          0        75
4          8        2
4          0        9
4          0        6
5          0        88
5          0        20
5          0        9
5          0        14

Since projects 1 and 4 generated revenue>0, the code should ignore then when summing up the variable of interest by campaign, thus, the table I am interested in should look like this

project var_of_intr
2        97
3        108
5        131
Roberto Ferrer
  • 11,024
  • 1
  • 21
  • 23
Olga
  • 289
  • 2
  • 6
  • 16

1 Answers1

2

You can use collapse:

clear
set more off

*----- example data -----

input ///
project revenue somevar
1          0        5
1          0        8
1          2        10
1          0        5
2          0        5
2          0        90
2          0        2
2          0        0
3          0        76
3          0        5
3          0        23
3          0        4
4          0        75
4          8        2
4          0        9
4          0        6
5          0        88
5          0        20
5          0        9
5          0        14
end

list

*----- what you want -----

collapse (sum) revenue somevar, by(project)
keep if revenue == 0

That will destroy the database, of course, but it might be useful anyway. You don't really specify if this approach is acceptable or not.

For a table, you can flag projects with revenue equal to zero and condition on that:

bysort project (revenue): gen revzero = revenue[_N] == 0

tabstat somevar if revzero, by(project) stat(sum)

If you have missing or negative revenues, modifications are required.

Roberto Ferrer
  • 11,024
  • 1
  • 21
  • 23
  • That's a good alternative when no other options are available. My data set is 7M+ obs. Loading it back again after the collapse command destroys the data set is a little bit time consuming. It will be good to know if there is a command that can just generate a table. Thank you Roberto! – Olga Nov 08 '14 at 14:52
  • Oh - my bad, I accidentally skipped it. – Olga Nov 08 '14 at 14:54
  • Glad it worked. You can upvote and accept the answer if you wish. This will boost your reputation too. – Roberto Ferrer Nov 08 '14 at 15:40