Questions tagged [dtplyr]

An R package that implements the data.table back-end for 'dplyr'.

45 questions
0 votes, 0 answers

dtplyr throws an 'invalid 'type' (closure) of argument' using group_by() %>% sample_n() in dplyr and [[sample()], by=id] in data.table

I've been using dtplyr to speed up some overly complex dplyr code, and so far it's been excellent, apart from one issue I can't seem to resolve. The problem is pretty straightforward to solve in both dplyr and data.table, but I can't see a way of…
0 votes, 1 answer

Assign multiple columns when using mutate in dtplyr

Is there a way of getting my data table to look like my target table when using dtplyr and mutate? A dummy table: library(data.table) library(dtplyr) library(dplyr) id <- rep(c("A","B"),each=3) x1 <- rnorm(6) x2 <- rnorm(6) dat <-…
0 votes, 1 answer

dplyr to data.table to speed up execution time

I am currently dealing with a moderately large data frame called d.mkt (> 2M rows and 12 columns). As dplyr is too slow when applying the summarise() function combined with group_by_at, I am trying to write an equivalent statement using data.table to…
user177196
0 votes, 1 answer

How to apply a custom recursive function with data.table and loop over each index group-wise?

Since I can't find an answer in the questions below ("Apply a recursive function over groups and rows without explicit for loop" and "How do I mimic the drag functionality for new rows such as in Excel but for R?"), I'll try asking a new question related to…
0 votes, 0 answers

R Combine use of new readr/vroom lazy loading + dplyr AND dtplyr/data.table?

I am loading a large dataset from which I need to filter approximately 1/20th of the rows, then group_by 5 columns and summarize the 3 remaining ones. This page https://vroom.r-lib.org/articles/benchmarks.html says sampling, filtering, and grouped…
Arthur Yip
0 votes, 0 answers

Left_join too big to handle

I am trying to perform some matching that rests on the implementation of a one-to-many left_join. The issue is that, even when running the whole thing using cluster computing, the basic match produces a dataset too big to handle. I get this error: #Error…
MCS
0 votes, 1 answer

How to ggplot using dtplyr / data.table without converting it into a data frame or tibble?

I am trying dtplyr & data.table for the first time to do some time optimization in my existing dplyr code. Issue: if I use a data.table / dtplyr data object, then I am unable to plot with ggplot. And before plotting in pipe/chain commands if I just convert…
ViSa
0 votes, 0 answers

How to use sf package with dtplyr lazy_dt() data to create geom_sf plot in r?

(Post has been reframed for clarity, and all links have been updated.) I recently got to know about the dtplyr package, so I was trying to use it to make my existing code work faster by using lazy_dt() instead of just dplyr. But I am getting an error when I…
ViSa
0 votes, 0 answers

Combine dtplyr and multidplyr to deal with large mutate operation

I am combining the dtplyr and multidplyr libraries to handle some basic mutate/summarise operations carried out on a very large db. final_db_partition, after merging, is sometimes 30m lines long. I cannot figure out if I am doing something wrong but the…
MCS
0 votes, 1 answer

semi_join and anti_join functions creating dtplyr objects instead of data frames

I'm working on a project which requires me to combine data frames with semi_join and anti_join from dplyr. However, instead of creating a data.frame as output, I get a dtplyr_step_subset object which I am unable to use and I have no idea how it…
AntPalmer
0 votes, 2 answers

Create tables by using data.table and a for loop for multiple columns

I need to speed up code using data.table. I am getting stuck on how to reference variables that are being indexed from a vector. data: df <- data.frame( id=c(1,1,1,2,2,2,3,3,3), year=as.character(c(2014, 2015, 2016, 2015, 2015, 2016, NA, NA,…
EML
0 votes, 2 answers

Creating new column based on repeated consecutive row entries

Imagine a snippet of the following data frame: ID ActivityName Time Type Shape 1 1 Request 0.000 Type_1 767 2 1 Request 600.000 Type_1 767 …
user12928769
0 votes, 1 answer

Incorrect translation of group-filter-select with dtplyr

A group-filter-select is easy to perform with dplyr. In the example below, we have some data on companies for different quarters this year. I now want to filter to the first quarter of companies which don't have data for the fourth quarter (in this…
Wasabi
0 votes, 0 answers

dtplyr and count conflict on a data.table object

It seems that when dtplyr is installed with dplyr and data.table, I get this issue: library(dplyr) library(data.table) library(dtplyr) dt1 = mtcars # works as expected dt1 %>% count(cyl) # # A tibble: 3 x 2 # cyl n # # 1 …
AntoniosK
-2 votes, 1 answer

Average price for every combination of categorical variables - R

I am working with the diamonds data set. > dput(diamonds_2[1:100,]) structure(list(carat = structure(c(4L, 2L, 4L, 10L, 12L, 5L, 5L, 7L, 3L, 4L, 11L, 4L, 3L, 12L, 1L, 13L, 11L, 11L, 11L, 11L, 11L, 4L, 4L, 12L, 12L, 4L, 5L, 11L, 4L, 4L, 4L, 4L, 4L,…
Nneka