I want to translate the following R code from tidyverse to collapse. The code counts observations by group and appends the count as a column to the data.frame.

library(tidyverse)
library(collapse)
head(wlddev)

wlddev %>% 
  group_by(income) %>% 
  add_count(., name = "Size") %>% 
  select(country, income, Size) %>% 
  distinct()
# A tibble: 216 x 3
# Groups:   income [4]
   country             income               Size
   <chr>               <fct>               <int>
 1 Afghanistan         Low income           1830
 2 Albania             Upper middle income  3660
 3 Algeria             Upper middle income  3660
 4 American Samoa      Upper middle income  3660
 5 Andorra             High income          4819
 6 Angola              Lower middle income  2867
 7 Antigua and Barbuda High income          4819
 8 Argentina           Upper middle income  3660
 9 Armenia             Upper middle income  3660
10 Aruba               High income          4819
# ... with 206 more rows

Now I want to accomplish the same task with the collapse R package.

The following code works as expected.

wlddev %>%
  fgroup_by(income) %>%
  fselect(country) %>% 
  fnobs()

               income country
1         High income    4819
2          Low income    1830
3 Lower middle income    2867
4 Upper middle income    3660

However, I am not able to append the column to the original data.frame.

wlddev %>%
  fgroup_by(income) %>%
  fselect(country) %>% 
  fnobs() %>% 
  ftransform(.data = wlddev, Size = .)

Error in ftransform_core(.data, e) : 
  Lengths of replacements must be equal to nrow(.data) or 1, or NULL to delete columns

Any hints, please.

MYaseen208
  • I guess you need a join here: `wlddev %>% fgroup_by(income) %>% fselect(country) %>% fnobs() %>% rename(n = country) %>% left_join(wlddev, .)`. `add_count` creates a column whereas `fnobs` summarises, so you can't `ftransform` when the datasets have different sizes – akrun Feb 04 '22 at 20:34

3 Answers

Found a very simple solution:

wlddev %>% 
  fmutate(Size = fnobs(income, income, TRA = "replace_fill"))  %>% 
  fselect(country, income, Size) %>% 
  funique()
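
The `TRA = "replace_fill"` argument is what turns the grouped summary into a full-length column: instead of returning one count per group, `fnobs` replaces every element with the count of its group. A minimal sketch on hypothetical toy vectors (`v` and `g` are made up for illustration and are not part of `wlddev`):

```r
library(collapse)

# Hypothetical toy data: five values in two groups
v <- c(1.5, 2.0, 3.5, 4.0, 5.5)
g <- c("a", "a", "a", "b", "b")

# Without TRA: one count per group (a summary)
per_group <- fnobs(v, g)
print(per_group)

# With TRA = "replace_fill": each element is replaced by its group's count,
# so the result has the same length as v
per_row <- fnobs(v, g, TRA = "replace_fill")
print(per_row)
```

Because the second result has one element per row, it can be assigned as a new column inside `fmutate` or `ftransform` without triggering the length error from the question.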
MYaseen208

Unlike add_count, which adds a column to the original data, fnobs returns summarised data (one row per group), which we can join back to the original:

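library(collapse)
library(dplyr)  # rename(), left_join() and distinct() come from dplyr

wlddev %>%
  fgroup_by(income) %>%
  fselect(country) %>%
  fnobs() %>%                      # one row per income group
  rename(size = country) %>%
  left_join(wlddev %>% slt(country, income), ., by = "income") %>%
  distinct()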
akrun
  • Thanks @akrun for very useful answer. Wondering if there are functions in `collapse` `R` package for joining `data.frame`s to improve speed. – MYaseen208 Feb 04 '22 at 20:46
  • @MYaseen208 join functions are not there. You may use `data.table` join if you want to do this efficiently – akrun Feb 04 '22 at 23:45
  • Here is a very simple solution: `wlddev %>% fmutate(Size = fnobs(income, income, TRA = "replace_fill")) %>% fselect(country, income, Size) %>% funique()` – MYaseen208 Feb 05 '22 at 02:32
  • @MYaseen208 yes, that is a good one. I was about to ask about the `TRA` or `fmutate` – akrun Feb 05 '22 at 15:44

In principle, fnobs counts the number of non-missing values; it does not offer an option to add the group count (I also wonder why that would be necessary, as I have never needed it). Nevertheless, the count is stored in the grouping object, which can be retrieved using GRP(.). So you could create a function:

gcount <- function(x) {
   # Just turning some unnecessary things off in case we pass a plain vector
   g <- GRP(x, sort = FALSE, return.groups = FALSE, call = FALSE) 
   g$group.sizes[g$group.id]
}

Then we can do

wlddev %>% 
  ftransform(Size = gcount(income)) %>%
  fselect(country, income, Size) %>% 
  funique(cols = 1) # Observations are uniquely identified by country

# or 

wlddev %>% 
  fgroup_by(income) %>%
  ftransform(Size = gcount(.)) %>%
  fselect(country, income, Size) %>% 
  fungroup() %>%
  funique(cols = 1) 

Of course we can also use fnobs:

wlddev %>% 
  fgroup_by(income) %>%
  fmutate(Size = fnobs(income)) %>%
  fselect(country, income, Size) %>% 
  fungroup() %>%
  funique(cols = 1) 

but that could be misleading if income contained missing values. Note (as stated in the documentation) that ftransform is a faster version of base::transform that ignores groupings, and fmutate is a faster dplyr::mutate that respects groupings.
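
The caveat comes from fnobs counting non-missing values rather than rows. A small sketch on hypothetical toy vectors (`g` and `v` are invented for illustration) shows how the two counts diverge once missing values appear, with base R's `ave()` as a cross-check for the `gcount` helper defined above:

```r
library(collapse)

# Group-size helper from this answer, repeated so the sketch is self-contained
gcount <- function(x) {
  g <- GRP(x, sort = FALSE, return.groups = FALSE, call = FALSE)
  g$group.sizes[g$group.id]
}

# Hypothetical toy data: a complete grouping vector and a value vector with NAs
g <- c("a", "a", "a", "b", "b")
v <- c(1, NA, 3, 4, NA)

size_grp  <- gcount(g)                           # rows per group
size_base <- ave(seq_along(g), g, FUN = length)  # base R cross-check
nobs_v    <- fnobs(v, g, TRA = "replace_fill")   # non-missing values of v per group

print(data.frame(g, v, size_grp, size_base, nobs_v))
```

Here the group sizes stay at 3 and 2, while the fnobs-based column drops to 2 and 1 because one value in each group is missing.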

If you tell me why the group count would be required as a variable in a data frame, I can consider adding gcount to the next collapse release.

Sebastian
  • Thanks @Sebastian for the very useful answer and especially for developing `collapse`. `collapse` is simply awesome when speed matters. It would be very nice if you would consider including the `gcount` function in the next release of `collapse`. This function would be very handy for the analysis of survey data. – MYaseen208 Feb 08 '22 at 14:47
  • Thanks @MYaseen208, I have nothing against adding it, but it will have to wait for a month or so before another version can be submitted to CRAN. And it will likely be called `GRPcount` because there is already `GRPnames`. By the way, regarding `NA`s, the safest option is probably `fmutate(Size = length(income))`, which is not vectorized but still decently fast. – Sebastian Feb 08 '22 at 21:39