When I call distinct() on a data.table with dtplyr loaded, the result comes back as a tibble with an extra logical column named ".keep_all". I reinstalled the packages I'm using with install.packages(), ran update.packages() for good measure, and updated both R and RStudio. None of that helped, and searching the web hasn't turned up a solution. Any help would be appreciated!

Here's an example of what I'm doing with some reproducible code:

library(tidyr)
library(plyr)
library(dplyr)
library(data.table)
library(dtplyr)

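# Small example table; C is random noise and is dropped before distinct().
dt <- data.table(A = c("a", "a", "b", "b", "b"),
                 B = c(1, 2, 1, 2, 2),
                 C = rnorm(5, 0, 1))
# Expect the four unique (A, B) pairs.
dt %>% select(-C) %>% group_by(A, B) %>% distinct()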

Source: local data table [4 x 3]
Groups: A, B

# A tibble: 4 x 3
      A     B .keep_all
  <chr> <dbl>     <lgl>
1     a     1     FALSE
2     a     2     FALSE
3     b     1     FALSE
4     b     2     FALSE

If I don't load dtplyr, the same code will return what I want:

Source: local data table [4 x 2]
Groups: A, B

      A     B
  (chr) (dbl)
1     a     1
2     a     2
3     b     1
4     b     2

Here's my sessionInfo():

R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] data.table_1.10.4 dplyr_0.5.0       plyr_1.8.3       
[4] tidyr_0.6.3       dtplyr_0.0.2      MASS_7.3-44      

loaded via a namespace (and not attached):
 [1] compiler_3.4.0  lazyeval_0.1.10 magrittr_1.5   
 [4] R6_2.1.1        assertthat_0.1  DBI_0.6-1      
 [7] tools_3.4.0     tibble_1.3.1    Rcpp_0.12.11   
[10] rlang_0.1.1 
DataSmith
  • This is a known issue with a fix. Looks like you should get the development version. See here: https://github.com/hadley/dtplyr/pull/31 – Gopala May 26 '17 at 00:57
  • Thank you for the suggestion @Gopala. I downloaded the development version with `devtools::install_github("hadley/dtplyr")` and I'm still having the problem. sessionInfo now shows `dtplyr_0.0.2.9000`. Did I miss something? – DataSmith May 26 '17 at 02:38
  • Not sure. Perhaps you want to open an issue on Github with your code above and showing the dev version... – Gopala May 26 '17 at 02:47
  • Also, a temporary workaround is to drop the `.keep_all` column and convert the result back to a data.table. I understand this is not efficient or ideal. – Gopala May 26 '17 at 14:16
  • Thanks for your help @Gopala. I did the temporary workaround for now so I can move forward with my work. I'll see about opening an issue on Github. – DataSmith May 26 '17 at 21:58
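
For anyone landing here, the temporary workaround mentioned in the comments could look roughly like the sketch below. It is only an illustration: `drop_keep_all()` is a hypothetical helper, not part of dtplyr or data.table, and the `affected` data frame is a hand-built stand-in for the output shown in the question rather than something produced by the affected dtplyr version. The sketch also shows a different route that sidesteps the dplyr verbs, using data.table's own `unique()`.

```r
library(data.table)

# Hypothetical helper (not part of dtplyr or data.table): drop the stray
# ".keep_all" column if it is present and return a plain data.table.
drop_keep_all <- function(x) {
  out <- as.data.table(x)
  keep <- setdiff(names(out), ".keep_all")
  out[, keep, with = FALSE]
}

dt <- data.table(A = c("a", "a", "b", "b", "b"),
                 B = c(1, 2, 1, 2, 2),
                 C = rnorm(5, 0, 1))

# Alternative that avoids the dplyr verbs entirely: unique rows of A and B
# using data.table's own unique() method.
distinct_ab <- unique(dt[, .(A, B)])

# Hand-built stand-in for the affected output shown in the question
# (an assumption for illustration, not real dtplyr output).
affected <- data.frame(A = c("a", "a", "b", "b"),
                       B = c(1, 2, 1, 2),
                       .keep_all = FALSE,
                       stringsAsFactors = FALSE)
fixed <- drop_keep_all(affected)
```

In a real pipeline the helper would be applied to the result of `distinct()`; whether it behaves identically on the grouped object returned by dtplyr 0.0.2 is untested here, so an `ungroup()` beforehand may be needed.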
