
I am using the map function from the purrr package to apply the segmented function (from the segmented package) as follows:

library(purrr)
library(dplyr)
library(segmented)

# Data frame is nested to create list column
by_veh28_101 <- df101 %>% 
  filter(LCType=="CFonly", Lane %in% c(1,2,3)) %>% 
  group_by(Vehicle.ID2) %>% 
  nest() %>% 
  ungroup()

# Functions:
segf2 <- function(df){
  try(segmented(lm(svel ~ Time, data=df), seg.Z = ~Time,
                psi = list(Time = df$Time[which(df$dssvel != 0)]),
                control = seg.control(seed=2)),
      silent=TRUE)
}


segf2p <- function(df){
  try(segmented(lm(PrecVehVel ~ Time, data=df), seg.Z = ~Time,
                psi = list(Time = df$Time[which(df$dspsvel != 0)]),
                control = seg.control(seed=2)),
      silent=TRUE)
}  

# map function:
models8_101 <- by_veh28_101 %>% 
  mutate(segs = map(data, segf2),
         segsp = map(data, segf2p))  

The object by_veh28_101 contains 2457 tibbles, and the last step, where the map function is used, takes 16 minutes to complete. Is there any way to make this faster?

umair durrani
    It's not really `purrr` that's slow here, it's `segmented`. You're running thousands of models, which takes a while. Profile your code to see exactly what the bottlenecks are. – alistaire Dec 06 '16 at 22:27
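
To act on the profiling suggestion, one option is to time the fitting function on a small sample before running all 2457 groups. The following is only a minimal sketch on synthetic data: `make_group` and `fit_one` are hypothetical stand-ins for the nested tibbles and for `segf2`, and the starting breakpoint `psi = 8` is an assumption chosen to suit the fake data.

```r
library(purrr)
library(segmented)

# Hypothetical stand-in for one nested tibble: a noisy piecewise-linear
# speed profile with a single breakpoint at Time = 10.
set.seed(1)
make_group <- function(n = 200) {
  Time <- seq(0, 20, length.out = n)
  svel <- ifelse(Time < 10, 2 * Time, 20 - 0.5 * (Time - 10)) + rnorm(n, sd = 0.5)
  data.frame(Time = Time, svel = svel)
}
groups <- map(1:20, ~ make_group())

# Same shape as segf2 in the question, but with an assumed starting breakpoint.
fit_one <- function(df) {
  try(segmented(lm(svel ~ Time, data = df), seg.Z = ~Time, psi = 8,
                control = seg.control(seed = 2)),
      silent = TRUE)
}

# Time the sample, then scale up to the number of groups in the question.
elapsed <- system.time(fits <- map(groups, fit_one))[["elapsed"]]
estimate <- elapsed / length(groups) * 2457  # rough seconds for one model column
estimate
```

If the per-group time multiplied by the number of groups roughly accounts for the full run, the cost sits in the model fitting itself and not in `map`.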

2 Answers


You can use the function future_map instead of map.

This function comes from the furrr package and is a parallel drop-in for the map family; see the package README for details.

Because the code in your question is not reproducible, I can't prepare a benchmark comparing map and future_map.

Your code with future_map would look like this:

library(tidyverse)
library(segmented)
library(furrr)


# Data frame stuff....

# Your functions....

# future_map function

# This distributes the work over the cores of your computer.
# You set a "plan" for how the code should run. `multiprocess` used to be the
# easiest choice (multicore on Mac, multisession on Windows), but it is deprecated
# in recent versions of future; `multisession` works on every platform.

plan(multisession)

models8_101 <- by_veh28_101 %>% 
  mutate(segs = future_map(data, segf2),
         segsp = future_map(data, segf2p)) 
    Just to note that 'multicore' does not work from within RStudio any more. Forked processing is considered unstable when running R from certain environments, such as the RStudio environment. Because of this, 'multicore' futures have been disabled in those cases since future 1.13.0. [https://cran.case.edu/web/packages/future/NEWS](https://cran.case.edu/web/packages/future/NEWS) – cengel Nov 11 '21 at 19:12
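
Given the note above that forked 'multicore' futures are disabled inside RStudio, a minimal sketch using background R sessions instead might look like the following. It runs on synthetic data: `make_group` and `fit_one` are hypothetical stand-ins for the nested tibbles and `segf2`, and `workers = 2` and `psi = 8` are assumptions to adjust for your machine and data.

```r
library(purrr)
library(future)
library(furrr)
library(segmented)

# Hypothetical stand-in for one nested tibble: a noisy piecewise-linear
# speed profile with a single breakpoint at Time = 10.
set.seed(1)
make_group <- function(n = 200) {
  Time <- seq(0, 20, length.out = n)
  svel <- ifelse(Time < 10, 2 * Time, 20 - 0.5 * (Time - 10)) + rnorm(n, sd = 0.5)
  data.frame(Time = Time, svel = svel)
}
groups <- map(1:20, ~ make_group())

fit_one <- function(df) {
  try(segmented(lm(svel ~ Time, data = df), seg.Z = ~Time, psi = 8,
                control = seg.control(seed = 2)),
      silent = TRUE)
}

# Background R sessions; this does not rely on forking, so it also works
# from RStudio and on Windows.
plan(multisession, workers = 2)

# seed = TRUE requests parallel-safe random number streams, and
# packages = "segmented" makes sure the package is attached on each worker.
fits <- future_map(groups, fit_one,
                   .options = furrr_options(seed = TRUE, packages = "segmented"))

# Return to ordinary sequential evaluation and shut the workers down.
plan(sequential)

length(fits)
```

Starting the sessions and shipping data to them has a cost, so the gain is easiest to see when each fit is slow relative to that overhead.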

I recently rewrote a purrr::map loop (filtering more than 40,000 vector elements in a list with a few logical tests) in simple Rcpp. Previously it did not finish within 2 minutes; now it runs in about 2 to 3 seconds. Just for reference.

zxr6