Questions tagged [multidplyr]

multidplyr is an R package by Hadley Wickham that enables parallel processing on partitioned data.frames. It is a complement to his popular dplyr package and part of the extended tidyverse ecosystem of packages. This tag should not be used for dplyr-only questions.
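A minimal sketch of the partition, compute, collect workflow described above. It assumes multidplyr 0.1.0 or later (the `new_cluster()` / `partition(data, cluster)` interface) and dplyr 1.0.0 or later. The data, column names and worker count are made up for illustration.

```r
# Minimal sketch, assuming multidplyr >= 0.1.0 and dplyr >= 1.0.0.
library(dplyr)
library(multidplyr)

# Start two background R worker sessions (the count is arbitrary here).
cluster <- new_cluster(2)

# Made-up example data: 10 groups (id) of 100 rows each.
df <- tibble(
  id = rep(1:10, each = 100),
  x  = rnorm(1000)
)

# Grouping before partition() keeps all rows of a group on one worker,
# so per-group summaries computed on the workers are complete.
parted <- df %>%
  group_by(id) %>%
  partition(cluster)

# summarise() runs on each worker; collect() pulls the pieces back
# into a single local tibble.
result <- parted %>%
  summarise(mean_x = mean(x), rows = n()) %>%
  collect() %>%
  arrange(id)

print(result)
```

Grouping before `partition()` is what keeps each group's rows together on a single worker. Without that grouping, a per-group summary could be split across workers and come back incomplete.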

51 questions
2 votes, 0 answers

"Can't convert an environment to function" error when using multidplyr

This is an example of a multidplyr call in my code, which I run on my institute's cluster: #create data set.seed(1) library(dplyr) df <- do.call(rbind,lapply(1:100,function(i){ id.df <-…
dan
2 votes, 1 answer

Run breakpoint (lm) detection in parallel in R

I am running about 80,000 time-series breakpoint detection calculations in R. The time series are so different from one another that I cannot apply ARIMA models, so I fit a linear model per time series, then extract the breakpoints and use…
Jonathan
2 votes, 1 answer

multidplyr error with pmap_dfr: Error: Element 5 is not a vector (environment)

[This is also reported on the multidplyr GitHub page.] I'm trying to use multidplyr_0.0.0.9000 with dplyr_0.7.4.9000 and pmap_dfr from purrr_0.2.4.9000. The following code (without using multidplyr) works fine: grid1 = as_tibble(expand.grid(m1 =…
kartik_subbarao
1 vote, 1 answer

Error with rep using multidplyr: cannot find function "n"

I'm trying to expand a data frame based on the value of a column, using parallel cores with multidplyr (and dplyr). Since uncount() does not work with multidplyr, I am using the base rep function instead, and I get an error. Below is an MWE, where I…
luchonacho
1 vote, 1 answer

How to merge two data frames by rows of x and y, with the columns placed side by side (df1$x next to df2$y)?

I have two data frames with the same column and row names. I would like to merge them by rows, but with the columns side by side, as df$x and df$y. What I have tried so far does not give the required output: merge(df.test1, df.test2, by.x = "V1", by.y =…
RKK
1 vote, 1 answer

Merge multiple tables with different lengths into a single table in R

I am using the plumber package to build an API. I have multiple sub-tables, all connected by their primary key (study_id), and I want to merge them on that key to form a single table. Some tables have different lengths. For…
1 vote, 1 answer

R multidplyr: workaround for summarise_at?

I want to use multidplyr, but it does not yet have anything for summarise_at. I have hundreds, if not thousands, so summarise_at is necessary, but unfortunately it is not available in multidplyr. I am looking for an alternative to work around…
Choc_waffles
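In dplyr 1.0.0, summarise_at() was superseded by across() used inside summarise(). The sketch below shows that route on a partitioned data frame. It assumes multidplyr 0.1.0 or later. The table `wide`, its columns v1 to v5 and the grouping column g are hypothetical stand-ins for a wide table.

```r
# Minimal sketch, assuming multidplyr >= 0.1.0 and dplyr >= 1.0.0.
library(dplyr)
library(multidplyr)

cluster <- new_cluster(2)
# Attach dplyr on every worker so its helpers resolve there as well.
cluster_library(cluster, "dplyr")

# Hypothetical wide table: one grouping column g plus numeric columns
# v1..v5 (a real table might have hundreds of such columns).
wide <- tibble(g = rep(c("a", "b", "c", "d"), each = 25))
for (i in 1:5) wide[[paste0("v", i)]] <- seq_len(100) * i

result <- wide %>%
  group_by(g) %>%
  partition(cluster) %>%
  # across() applies mean() to every column whose name starts with "v",
  # the job summarise_at(vars(starts_with("v")), mean) used to do.
  summarise(across(starts_with("v"), mean)) %>%
  collect() %>%
  arrange(g)

print(result)
```

Because the column selection happens inside across(), the same call scales to any number of columns matched by the selection helper.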
1 vote, 0 answers

How do you deal with errors in partition?

I am attempting to partition my data set so that all members of a group are sent to the same core. I am following online tutorials verbatim, but there seems to be an issue. The error is: Error in partition(., group, cluster = clust) : unused…
Dominic Naimool
1 vote, 1 answer

Multiply columns in different dataframes

I am writing code to analyze a set of data with dplyr. My table_1 has columns A, B and C, with rows (5, 2, 3), (9, 4, 1), (6, 3, 8) and (3, 7, 3). My table_2 has columns D, E and F, with a single row (2, 9, 3). Based on column "A" of table_1, if A > 6, I would like to create a…
Bomber Gay
1 vote, 0 answers

checkpoint cannot find multidplyr in R Markdown

I'm trying to create an R Markdown document that runs multidplyr. To ensure reproducibility, I decided to use the checkpoint package. MWE: --- title: "A great title" author: "A great author" date: "February 19, 2019" output:…
Baraliuh
1 vote, 2 answers

Vectorizing with multidplyr does not give the correct output

I tried to parallelize ape::dist_topo(), a function to compute distances between unrooted trees. Normally the function works like this (reprex: 4 random trees with 5 leaves each): library(tidyverse) #…
abichat
1 vote, 2 answers

Grouping a data frame into 12 groups with the same column values

I have a large dataset with about 15 columns and more than 3 million rows. Because the dataset is so big, I would like to use multidplyr on it. Because of the data, it would be impossible to just split my data frame into 12 parts. Let's say that there…
Ravonrip
1 vote, 0 answers

Groupwise identification of peaks in a moving average using findpeaks() from the pracma package gives the error "missing value where TRUE/FALSE needed"

Reproducible data, as shown below: library(pracma); library(zoo) library(dplyr); library(tidyverse) Tag <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5,…
Harvey
1 vote, 1 answer

Restructuring and formatting data frame columns

dfin <-
ID SEQ GRP C1 C2 C3 T1 T2 T3
 1   1   1  0  5  8  0  1  2
 1   2   1  5 10 15  5  6  7
 2   1   2 20 25 30  0  1  2
C1 is the concentration (CONC) at T1 (TIME), and so on. This…
daragh
1 vote, 1 answer

multidplyr: trying out a custom function

I'm trying to learn how to run a custom function through multidplyr::do() on a cluster. Consider this simple, self-contained example. For example's sake, I'm trying to apply my custom function myWxTest to each common_dest (destinations with more than 50…
user189035
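Worker sessions start as fresh R processes, so a custom function defined in the main session must be copied to them before it can be called there. The sketch below does this with cluster_copy() and assumes multidplyr 0.1.0 or later. It uses summarise() rather than do(). The helper range_width() and the toy data are hypothetical stand-ins for myWxTest and the question's data.

```r
# Minimal sketch, assuming multidplyr >= 0.1.0 and dplyr >= 1.0.0.
library(dplyr)
library(multidplyr)

cluster <- new_cluster(2)
cluster_library(cluster, "dplyr")

# Hypothetical custom function standing in for myWxTest:
# the spread between the largest and smallest value.
range_width <- function(x) max(x) - min(x)

# Without this, the workers would not know about range_width().
cluster_copy(cluster, "range_width")

# Made-up data: three destinations with four delays each.
df <- tibble(
  dest  = rep(c("A", "B", "C"), each = 4),
  delay = c(1, 5, 3, 2, 10, 20, 15, 12, 0, 0, 7, 1)
)

result <- df %>%
  group_by(dest) %>%
  partition(cluster) %>%
  # The custom function is called on each worker, once per group.
  summarise(width = range_width(delay)) %>%
  collect() %>%
  arrange(dest)

print(result)
```

The same pattern applies to any helper or object a worker needs. Copy it once with cluster_copy() (or load packages with cluster_library()) before the partitioned pipeline runs.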