An R package to implement the data table back-end for 'dplyr'.
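As a rough illustration of what the tag description means, here is a minimal sketch assuming the `lazy_dt()` workflow from current dtplyr releases. The built-in `mtcars` data and the names `dt`, `res`, and `out` are illustrative choices, not taken from any question listed below.

```r
library(data.table)
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)

# Wrap a data frame so that dplyr verbs are recorded lazily
# (mtcars is used only as a stand-in dataset)
dt <- lazy_dt(mtcars)

# Nothing is computed yet; the verbs build up a data.table translation
res <- dt %>%
  filter(cyl > 4) %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

# Inspect the data.table expression that dtplyr generated
show_query(res)

# Force evaluation and collect the result as a tibble
out <- as_tibble(res)
```

Until `as_tibble()`, `as.data.table()`, or `as.data.frame()` is called, `res` stays a lazy `dtplyr_step` object rather than a data frame.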
Questions tagged [dtplyr]
45 questions
0
votes
0 answers
dtplyr throws an 'invalid 'type' (closure) of argument' using group_by() %>% sample_n() in dplyr and [[sample()], by=id] in data.table
I've been using dtplyr to speed up some overly complex dplyr code, and so far it's been excellent, apart from one issue I can't seem to resolve.
The problem is pretty straightforward to solve in both dplyr and data.table, but I can't see a way of…

marine-ecologist
0
votes
1 answer
Assign multiple columns when using mutate in dtplyr
Is there a way of getting my data table to look like my target table when using dtplyr and mutate?
A dummy table:
library(data.table)
library(dtplyr)
library(dplyr)
id <- rep(c("A","B"),each=3)
x1 <- rnorm(6)
x2 <- rnorm(6)
dat <-…

user7816601
0
votes
1 answer
dplyr to data.table for speed up execution time
I am currently dealing with a moderately large data frame called d.mkt (> 2M rows and 12 columns). As dplyr is too slow when applying the summarise() function combined with group_by_at(), I am trying to write an equivalent statement using data.table to…

user177196
0
votes
1 answer
How to apply a custom recursive function with data.table and loop over each index group-wise?
Since I can't find an answer in the questions below:
Apply a recursive function over groups and rows without explicit for loop
How do I mimic the drag functionality for new rows such as in Excel but for R?
I'll try asking a new question related to…

geometricfreedom
0
votes
0 answers
R Combine use of new readr/vroom lazy loading + dplyr AND dtplyr/data.table?
I am loading a large dataset in which I need to filter approximately 1/20th of the rows, then group by 5 columns and summarize the 3 remaining ones.
This page https://vroom.r-lib.org/articles/benchmarks.html
says sampling, filtering, and grouped…

Arthur Yip
0
votes
0 answers
Left_join too big to handle
I am trying to perform some matching that rests on the implementation of a one-to-many left_join.
The issue is that, even when running the whole thing using cluster computing, the basic match produces a dataset too big to handle.
I get this error:
#Error…

MCS
0
votes
1 answer
How to ggplot using dtplyr / data.table without converting it into dataframe or tibble?
I am trying dtplyr and data.table for the first time to optimize the run time of my existing dplyr code.
Issue: if I use a data.table / dtplyr data object, I am unable to plot it with ggplot. And if, before plotting in the pipe/chain of commands, I just convert…

ViSa
0
votes
0 answers
How to use sf package with dtplyr lazy_dt() data to create geom_sf plot in r?
(Post has been reframed for clarity, and all links have been updated)
I recently learned about the dtplyr package and was trying to use it to make my existing code run faster by using lazy_dt() instead of plain dplyr.
But I am getting an error when I…

ViSa
0
votes
0 answers
Combine dtplyr and multidplyr to deal with large mutate operation
I am combining the dtplyr and multidplyr libraries to handle some basic mutate/summarise operations carried out on a very large db.
final_db_partition, after merging, is sometimes 30m lines long.
I cannot figure out if I am doing something wrong, but the…

MCS
0
votes
1 answer
semi_join and anti_join functions creating dtplyr objects instead of data frames
So I'm working on a project which requires me to combine dataframes with semi_join and anti_join from dplyr. However, instead of creating a data.frame as output, I get a dtplyr_step_subset object which I am unable to use and I have no idea how it…

AntPalmer
0
votes
2 answers
Create tables by using data.table and a for loop for multiple columns
I need to speed up code using data.table. I am getting stuck on how to reference variables that are being indexed from a vector.
data:
df <- data.frame(
id=c(1,1,1,2,2,2,3,3,3),
year=as.character(c(2014, 2015, 2016, 2015, 2015, 2016, NA, NA,…

EML
0
votes
2 answers
Creating new column based on repeated consecutive row entries
Imagine a snippet of the following data frame:
ID ActivityName Time Type Shape
1 1 Request 0.000 Type_1 767
2 1 Request 600.000 Type_1 767 …
user12928769
0
votes
1 answer
Incorrect translation of group-filter-select with dtplyr
A group-filter-select is easy to perform with dplyr. In the example below, we have some data on companies for different quarters this year. I now want to filter to the first quarter of companies which don't have data for the fourth quarter (in this…

Wasabi
0
votes
0 answers
dtplyr and count conflict on a data.table object
It seems that when dtplyr is installed with dplyr and data.table, I get this issue:
library(dplyr)
library(data.table)
library(dtplyr)
dt1 = mtcars
# works as expected
dt1 %>% count(cyl)
# # A tibble: 3 x 2
# cyl n
#
# 1 …

AntoniosK
-2
votes
1 answer
average price for every combination of categorical variable - R
I am working with the diamonds data set.
> dput(diamonds_2[1:100,])
structure(list(carat = structure(c(4L, 2L, 4L, 10L, 12L, 5L,
5L, 7L, 3L, 4L, 11L, 4L, 3L, 12L, 1L, 13L, 11L, 11L, 11L, 11L,
11L, 4L, 4L, 12L, 12L, 4L, 5L, 11L, 4L, 4L, 4L, 4L, 4L,…

Nneka