Counting observations by group using collapse R package

Question

I want to translate the following R code from tidyverse to collapse. The code counts observations by group and appends the count as a column to the data.frame.

library(tidyverse)
library(collapse)
head(wlddev)

wlddev %>% 
  group_by(income) %>% 
  add_count(., name = "Size") %>% 
  select(country, income, Size) %>% 
  distinct()
# A tibble: 216 x 3
# Groups:   income [4]
   country             income               Size
   <chr>               <fct>               <int>
 1 Afghanistan         Low income           1830
 2 Albania             Upper middle income  3660
 3 Algeria             Upper middle income  3660
 4 American Samoa      Upper middle income  3660
 5 Andorra             High income          4819
 6 Angola              Lower middle income  2867
 7 Antigua and Barbuda High income          4819
 8 Argentina           Upper middle income  3660
 9 Armenia             Upper middle income  3660
10 Aruba               High income          4819
# ... with 206 more rows

Now I want to accomplish the same task with the collapse R package.

The following code works as expected.

wlddev %>%
  fgroup_by(income) %>%
  fselect(country) %>% 
  fnobs()

               income country
1         High income    4819
2          Low income    1830
3 Lower middle income    2867
4 Upper middle income    3660

However, I am not able to append the column to the original data.frame.

wlddev %>%
  fgroup_by(income) %>%
  fselect(country) %>% 
  fnobs() %>% 
  ftransform(.data = wlddev, Size = .)

Error in ftransform_core(.data, e) : 
  Lengths of replacements must be equal to nrow(.data) or 1, or NULL to delete columns

Any hints, please.

I guess you need a join here `wlddev %>% fgroup_by(income) %>% fselect(country) %>% fnobs() %>% rename(n = country) %>% left_join(wlddev, .)`. `add_count` creates a column whereas `fnobs` summarises, thus you can't `ftransform` when the datasets are of different size — akrun, Feb 04 '22 at 20:34

score 3 · Answer 1 · answered Feb 05 '22 at 01:58


Found a very simple solution:

wlddev %>% 
  fmutate(Size = fnobs(income, income, TRA = "replace_fill"))  %>% 
  fselect(country, income, Size) %>% 
  funique()


MYaseen208


score 2 · Accepted Answer · answered Feb 04 '22 at 20:40


Unlike add_count, which adds a column to the original data, fnobs returns summarised data (one row per group), which we can join back to the original:

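library(collapse)
library(dplyr)  # rename(), left_join() and distinct() come from dplyr

wlddev %>% 
  fgroup_by(income) %>%
  fselect(country) %>%
  fnobs() %>%                      # one row per income group
  rename(size = country) %>%
  left_join(wlddev %>% 
              slt(country, income), .) %>%  # slt() is shorthand for fselect()
  distinct()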
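
A minimal base-R sketch of why the summary cannot be assigned directly and what the join accomplishes. It assumes a hypothetical toy data frame `d` in place of wlddev, and uses `table()` and `match()` as stand-ins for `fnobs()` and `left_join()`; it is not the code from this answer. The summary has one row per group, so it has to be expanded to one value per observation before it can become a column.

```r
# Hypothetical toy data standing in for wlddev
d <- data.frame(country = c("A", "B", "C", "D", "E"),
                income  = c("High", "Low", "High", "High", "Low"))

# Summarising gives one row per group (2 rows), not one per observation (5 rows)
counts <- as.data.frame(table(income = d$income), responseName = "Size")

# Look up each row's group in the summary to expand it back to nrow(d) values
d$Size <- counts$Size[match(d$income, counts$income)]

print(counts)
print(d)
```

The lookup in the last step is the part a join performs: it repeats each group's count for every row that belongs to that group.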


akrun


Thanks @akrun for very useful answer. Wondering if there are functions in `collapse` `R` package for joining `data.frame`s to improve speed. – MYaseen208 Feb 04 '22 at 20:46

@MYaseen208 join functions are not there. You may use `data.table` join if you want to do this efficiently – akrun Feb 04 '22 at 23:45
Here is a very simple solution: `wlddev %>% fmutate(Size = fnobs(income, income, TRA = "replace_fill")) %>% fselect(country, income, Size) %>% funique()` – MYaseen208 Feb 05 '22 at 02:32

@MYaseen208 yes, that is a good one. I was about to ask about the `TRA` or `fmutate` – akrun Feb 05 '22 at 15:44

Sebastian · Answer 3 · 2022-02-08T01:16:56.347

In principle, fnobs counts the number of non-missing values; it does not really offer an option to add the group count (I also wonder why that would be necessary, as I have never needed it). Nevertheless, the count is stored in the grouping object, which can be retrieved using GRP(.). So you could create a function:

gcount <- function(x) {
   # Just turning some unnecessary things off in case we pass a plain vector
   g <- GRP(x, sort = FALSE, return.groups = FALSE, call = FALSE) 
   g$group.sizes[g$group.id]
}

Then we can do

wlddev %>% 
  ftransform(Size = gcount(income)) %>%
  fselect(country, income, Size) %>% 
  funique(cols = 1) # Observations are uniquely identified by country

# or 

wlddev %>% 
  fgroup_by(income) %>%
  ftransform(Size = gcount(.)) %>%
  fselect(country, income, Size) %>% 
  fungroup() %>%
  funique(cols = 1)

Of course we can also use fnobs:

wlddev %>% 
  fgroup_by(income) %>%
  fmutate(Size = fnobs(income)) %>%
  fselect(country, income, Size) %>% 
  fungroup() %>%
  funique(cols = 1)

but that could be misleading if income contained missing values. Note (as stated in the documentation) that ftransform is a faster version of base::transform that ignores groupings, and fmutate is a faster dplyr::mutate that respects groupings.
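
The difference between a non-missing count and a group size can be sketched on a hypothetical toy data frame `d` (not part of wlddev). This assumes collapse is installed and uses only `fnobs()` and `GRP()`, as in the code above. Note that it counts a separate column `x` containing missing values rather than the grouping column itself, so it illustrates the general distinction rather than the exact income case.

```r
library(collapse)

# Hypothetical toy data: 'g' is the grouping column, 'x' has missing values
d <- data.frame(g = c("a", "a", "b", "b", "b"),
                x = c(1, NA, 3, 4, NA))

# fnobs() counts the non-missing values of x within each group
nonmissing <- fnobs(d$x, d$g)

# The grouping object stores the actual number of rows in each group
grp   <- GRP(d$g)
sizes <- grp$group.sizes

# Expand the sizes to one value per row, which is what gcount() does
size_per_row <- sizes[grp$group.id]

print(nonmissing)    # non-missing values of x per group
print(sizes)         # rows per group
print(size_per_row)  # group size repeated for every row
```

Here group "a" has two rows but only one non-missing `x`, and group "b" has three rows but only two, so the two counts disagree whenever the counted column has gaps.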

If you tell me why the group count would be required as a variable in a data frame, I can think of adding gcount to the next collapse release.

Thanks @Sebastian for the very useful answer and especially for developing `collapse`. `collapse` is simply awesome when speed matters. It would be very nice if you would consider including the `gcount` function in the next release of `collapse`. This function would be very handy for the analysis of survey data. – MYaseen208, Feb 08 '22 at 14:47
Thanks @MYaseen208, I have nothing against adding it, but it will have to wait for a month or so before another version can be submitted to CRAN. It will likely be called `GRPcount`, because there is already `GRPnames`. By the way, regarding `NA`s, the safest option is probably `fmutate(Size = length(income))`, which is not vectorized but still decently fast. – Sebastian, Feb 08 '22 at 21:39
