Questions tagged [multidplyr]

multidplyr is an R package by Hadley Wickham that enables parallel processing on partitioned data.frames. This tag should not be used for dplyr-only questions.

It complements his popular dplyr package and is part of the extended tidyverse ecosystem of packages.
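
As a rough illustration of what "parallel processing on partitioned data frames" looks like in practice, the sketch below spreads a grouped data frame across worker processes, summarises each partition, and brings the result back. The cluster size of 2 and the use of the built-in mtcars data are assumptions made for the example only; it assumes a recent multidplyr release from CRAN.

```r
library(dplyr)
library(multidplyr)

# Start two worker R sessions (the count is arbitrary for this sketch).
cluster <- new_cluster(2)

# Group first so that all rows of a group land on the same worker,
# then split the data across the cluster.
cars_part <- mtcars %>%
  group_by(cyl) %>%
  partition(cluster)

# dplyr verbs run on each worker; collect() returns an ordinary tibble.
result <- cars_part %>%
  summarise(mean_mpg = mean(mpg)) %>%
  collect() %>%
  arrange(cyl)

print(result)
```

For data this small the cost of starting workers and copying data will likely outweigh any gain; the pattern tends to pay off only when each group involves substantial computation.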

51 questions
0 votes, 0 answers

How to predict values using random forest model and multidplyr packages in parallel processing mode?

The multidplyr package is not parallelizing the prediction process for a random forest model. It runs when not processed in parallel. Here [["finalModel"]] is output from the caret package using the randomForest R package. pred_dat <- dat_met_morph %>% …
0 votes, 0 answers

multidplyr within a for loop - warning: closing unused connections

I'm running a nested loop that subsets a large, sparse dataset and calculates the mean and median for each lab variable within 2-, 3-, and 5-year search windows. I'm getting the message "warning: closing unused connections". Is it because multidplyr is…
Wojty
0 votes, 0 answers

How do I amend missing values to 0 when using full_join function in R?

I have an issue with the code below and the error it produces. Can someone help? I am trying to run the following on matrices that are already defined: ESL <- full_join(ES_Liquidity, ES_Liquidity_prev, by = "Institution"); ESL_SMBPN <-…
aaliad
0 votes, 1 answer

How can I use apply/lapply on each row of a data frame where calculation requires fetching data from another data frame

I am trying to calculate the Jaccard coefficient for two-mode network data. My data looks like this: df <- data.frame(patent = c("A", "B", "B", "C", "C", "C"), class = c("X", "Y", "Z", "X", "Y", "Z")) node_list <- df %>% …
0 votes, 1 answer

How can I add additional years to my dataset using multidplyr or parallel processing?

I have a dataset (MN_Census) that has information for all census tracts for the following years: 1990, 2000, 2010, and 2020. The variable ID that identifies the census tract is "GISJOIN". My dataset looks like…
0 votes, 0 answers

Error: function 'Rcpp_precious_remove' not provided by package 'Rcpp'

I am trying to implement the example given here: https://cran.r-project.org/web/packages/multidplyr/vignettes/multidplyr.html However, I get the following error when I reach the point where I need to partition the data using either method 1 or 2. I…
Kishron
0 votes, 0 answers

Combine dtplyr and multidplyr to deal with large mutate operation

I am combining the dtplyr and multidplyr libraries to handle some basic mutate/summarise operations carried out on a very large database. final_db_partition, after merging, is sometimes 30m lines long. I cannot figure out if I am doing something wrong, but the…
MCS
0 votes, 0 answers

Why is left_join creating NAs when values seem to match in the by="x" argument?

I am trying to carry out a left_join between two dataframes, called multi_scenario and production_targets. I am trying to carry out the join based on the following code, using a left_join based on the matched column "mean_needed"…
0 votes, 1 answer

How to set a timeout in multidplyr

I inconsistently get the following error when using multidplyr (i.e., for the same data I sometimes get the error and sometimes not): Error in rs_init(self, private, super, options, wait, wait_timeout) : Could not start R session, timed out My setting…
ava
0 votes, 0 answers

Parallel processing with a function that uses parallel processing?

I am using the multidplyr package (with my dataset, map, and MyFnc) for parallel processing in dplyr syntax. However, MyFnc also uses parallel processing via the parallel and doSNOW libraries. In this case, can I use parallel processing efficiently?…
Enes
0 votes, 1 answer

Error in is.data.frame(.l) : object 'group' not found

Not sure if you all will be able to help me without reproducible example data, but I have a problem with running the code below. I am attempting to use the multidplyr package, but it doesn't seem to find my columns. I am running the code below: cl…
ChessGuy
0 votes, 1 answer

Send different dplyr::mutate cols to different cores with multidplyr?

I have a function that I'm applying to different sets of coordinates to create four new columns in my tibble. This function has a pretty long start-up time (loads the genome into RAM, converts tibble to GRanges, and retrieves sequences) but is…
GenesRus
0 votes, 1 answer

R spread dataframe

In R, how do I convert data1 into data2? data1 = fread(" id year cost pf loss A 2019-02 155 10 41 B 2019-03 165 14 22 B 2019-01 185 34 56 C 2019-02 350 50 0 A 2019-01 310 40 99") data2 = fread(" id item 2019-01 2019-02 2019-03 A cost…
0 votes, 3 answers

Create a new variable in the data frame based on multiple criteria in R

I have a data set which has COl1 COl2 Col3 1 0 0 0 1 0 0 0 1 1 0 0. Based on these three columns, I need to add a new variable in the same table. Expected output: COl1 COl2 Col3 New_variable 1 0 0 c1 0…
0 votes, 0 answers

RStudio fatal error with partition in multidplyr

RStudio keeps crashing when I try to partition my data. I managed to reproduce the same problem using the nycflights13 data that is used in the multidplyr vignette. This works: library(multidplyr) library(dplyr,…
user572549