
Is it possible to add an exception to `summarize(count = n_distinct(x))` in R, so that the exception is counted by the "normal" `summarize(count = n())` instead?

How do you combine the `n()` and `n_distinct()` counts into a single new column?

That way I could summarize the distinct count of the values in column x while treating one value as an exception: instead of being limited to a distinct count, it would be counted with the "normal" `summarize(count = n())`.

For example, if x = c(1, 2, 2, 4, 5, 8, 8, ..., 99), I could take the distinct count of every value in x except, say, 8, which would instead be counted with `n()`. The result would be the number of 8s plus the number of other unique values in x.

In short, this would create a single new column, "count", in which every value comes from the distinct count except the one exception, whose value comes from the "normal" count.

Will M
  • Hey! Thanks a lot! @d.b I just realised that I was too ambiguous. I meant how to exclude an observation from the distinct count, but include it in the "normal" count. So the new count column would include both functions, but the exception row's count value would be of a "normal" count, while every other count value would be from the distinct count. I'll edit my question above. – Will M Sep 26 '19 at 22:23
  • I don't know that you should be including two different kinds of count in the one column. I'd do something like `mydata %>% group_by(x) %>% summarise(Distinct = n_distinct(x), Count = n())` and deal with the special cases later by _e.g._ filtering, mutate or labeling them. – neilfws Sep 26 '19 at 22:36
  • @neilfws, thanks a lot! That also works for my problem. Have a good evening! – Will M Sep 26 '19 at 22:43
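A minimal sketch of neilfws's suggestion, assuming dplyr 1.0.0 or later (for the `.groups` argument) and a small hypothetical vector standing in for the elided one in the question. Because the data are grouped by x, `Distinct` is 1 in every group; the special case is then handled afterwards with `mutate()`, keeping `Count` for 8 and `Distinct` for every other value.

```r
library(dplyr)

# Hypothetical data: a short stand-in for the elided vector in the question
mydata <- tibble(x = c(1, 2, 2, 4, 5, 8, 8, 99))

per_value <- mydata %>%
  group_by(x) %>%
  # One row per value of x; Distinct is always 1 because each group holds one value
  summarise(Distinct = n_distinct(x), Count = n(), .groups = "drop") %>%
  # Handle the exception afterwards: "normal" count for 8, distinct count otherwise
  mutate(count = if_else(x == 8, Count, Distinct))

per_value
sum(per_value$count)  # 7: five distinct non-8 values plus two 8s
```

Summing the per-value `count` column gives the same single number as the formula in the answer.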



An update for future readers:

To combine the distinct count and the "normal" count, the following takes the distinct count of every value in x except 8, and counts every occurrence of 8:

`summarize(count = n_distinct(x[x != 8]) + sum(x == 8))`

This returns the number of 8s plus the number of unique values other than 8.
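A runnable version of the formula above, assuming dplyr is installed and using a hypothetical data frame `mydata` whose values stand in for the elided vector in the question:

```r
library(dplyr)

# Hypothetical data: a short stand-in for the elided vector in the question
mydata <- tibble(x = c(1, 2, 2, 4, 5, 8, 8, 99))

result <- mydata %>%
  # Distinct count of everything except 8, plus the "normal" count of 8s
  summarize(count = n_distinct(x[x != 8]) + sum(x == 8))

result$count  # 7: distinct values 1, 2, 4, 5, 99 plus two 8s
```

If x can contain NA, both terms need care: `n_distinct()` counts NA as a value unless `na.rm = TRUE`, and `sum(x == 8)` returns NA unless `na.rm = TRUE`.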

If you instead want a distinct count that leaves the exception (e.g. 8) out entirely, use:

`n_distinct(x[x != 8])`

Or filter the exception out before summarizing:

`... %>% filter(x != 8) %>% summarize(...)`
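A small sketch comparing the two ways of leaving 8 out, again on hypothetical data and assuming dplyr. They agree on this vector, but they can differ when x contains NA: `x[x != 8]` keeps the NA (which `n_distinct()` counts by default), while `filter()` drops rows whose condition evaluates to NA.

```r
library(dplyr)

# Hypothetical data: a short stand-in for the elided vector in the question
mydata <- tibble(x = c(1, 2, 2, 4, 5, 8, 8, 99))

# Option 1: subset inside n_distinct()
a <- mydata %>% summarize(count = n_distinct(x[x != 8]))

# Option 2: drop the exception rows first, then take the distinct count
b <- mydata %>% filter(x != 8) %>% summarize(count = n_distinct(x))

a$count  # 5
b$count  # 5

# With a missing value the two options diverge
with_na <- tibble(x = c(1, 8, NA))
na_a <- with_na %>% summarize(count = n_distinct(x[x != 8]))             # 2: NA counted
na_b <- with_na %>% filter(x != 8) %>% summarize(count = n_distinct(x))  # 1: NA row dropped
```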
Will M