
I'm attempting to use Dask, specifically dask.delayed, to generate time series forecasts in parallel using rpy2 and the forecast package in R. My process works when only using 1 core, but I get a

NotImplementedError: Conversion 'py2ri' not defined for objects of type '<class 'pandas.core.series.Series'>'

when using dask.delayed with more than 1 core. The code below reproduces the issue:

from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
import rpy2.robjects as robjects
# get R's ts() function as a Python callable
ts=robjects.r('ts')
pandas2ri.activate()

import pandas as pd
import numpy as np
from dask.distributed import Client, LocalCluster
import dask

#start cluster:

cluster = LocalCluster()
client = Client(cluster)

# define a Python function that builds an R ts object from a pandas Series
def r_vecs(time_series):

    rdata=ts(time_series,frequency=12)

    return rdata

#Generate DataFrame of time series
rows = 24
ncolumns = 5
column_names = ['ts1','ts2','ts3','ts4','ts5']
df = pd.DataFrame(np.random.randint(0,10000,size=(rows, ncolumns)), columns=column_names)
df_date_index = pd.date_range(end='2018-04-01', periods=rows, freq='MS')
df.index = df_date_index 

Use dask.delayed to loop through each column (time series) of the DataFrame and convert it to an R time series:

Works:

output_fc_R = []
for i in df:
    forecasted_series = r_vecs(df[i])
    output_fc_R.append(forecasted_series)

output_fc_R

Doesn't work:

#Try to forecast in parallel with Dask
output_fc_R = []
for i in df:
    forecasted_series = dask.delayed(r_vecs)(df[i])
    output_fc_R.append(forecasted_series)

total = dask.delayed(output_fc_R).compute()
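A possible explanation, offered as an assumption rather than something confirmed in this thread: pandas2ri.activate() changes rpy2's conversion rules only in the process that calls it, while LocalCluster() starts separate worker processes by default, so r_vecs may run on a worker where the pandas-to-R converter was never activated. The sketch below reproduces that effect with only the standard library. The mimetypes registry is a stand-in for rpy2's converter, and the extension .rpy2demo and MIME type text/x-rpy2-demo are made up for the demo.

```python
import mimetypes
import multiprocessing as mp

# Made-up extension, assumed to be unknown to the system's MIME database.
EXT = ".rpy2demo"


def run_demo():
    # Stand-in for pandas2ri.activate(): add_type() mutates a module-level
    # registry, but only inside the process that calls it.
    mimetypes.add_type("text/x-rpy2-demo", EXT)
    filename = "series" + EXT
    in_parent = mimetypes.guess_type(filename)

    # A spawned worker starts a fresh interpreter and imports mimetypes
    # from scratch, so it never sees the registration made above.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=1) as pool:
        in_worker = pool.apply(mimetypes.guess_type, (filename,))
    return in_parent, in_worker


if __name__ == "__main__":
    parent, worker = run_demo()
    print("parent:", parent)  # ('text/x-rpy2-demo', None)
    print("worker:", worker)  # (None, None)
```

If this is what happens here, one way to check the hypothesis is to move the activation (or an explicit conversion, as in the answer below) inside r_vecs, so it runs in whichever process executes the task. The add_type() call sits inside run_demo() rather than at module level because a spawned child re-imports the main module, and module-level setup would run there as well.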
Davis
  • Both of these examples work for me in a new virtualenv with python 3.6.2 and the latest pypi versions of dask, rpy2, pandas, etc. Can you give more specifics on which version of python (which version of jupyter?) etc you're using and how you're running things? – Noah Apr 30 '18 at 17:02
  • @Noah I'm not going into the full explanation here because I simplified the example for SO. In the original post I was using the R forecast package and a custom function, and I ended up finding a workaround anyway. The error referenced here was a separate issue, but I think it'd be too much to post here. You can find the thread [here](http://github.com/dask/distributed/issues/1939). I'll also be posting a full walkthrough on my [site](http://davistownsend.github.io) soon – Davis Apr 30 '18 at 19:58
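Noah's request for version details can be answered with a few lines of standard-library Python. This is a sketch that assumes Python 3.8 or newer (for importlib.metadata), so it would not run as-is on a 2018-era Python 3.6 setup; the default package list is only a guess at what matters here, and R's own version is not covered.

```python
import platform
from importlib import metadata


def report_versions(packages=("dask", "distributed", "rpy2", "pandas", "numpy")):
    """Map 'python' and each package name to its installed version (None if absent)."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version}")
```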

1 Answer


I'm still not sure exactly what causes the issue, but when I first explicitly convert the time series to an R IntVector object, things seem to work correctly.

def r_vecs(time_series):

    # explicitly convert the pandas Series to an R integer vector first
    time_series = robjects.IntVector(time_series)
    rdata=ts(time_series,frequency=12)

    return rdata
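The explicit IntVector conversion means ts() receives an R vector instead of a pandas Series, which sidesteps the Series conversion named in the error. A related option, sketched here as an assumption, is to turn each column into a plain list of built-in Python ints before it reaches R (or before it is sent to a worker). to_plain_ints is a hypothetical helper, not part of rpy2, pandas, or Dask.

```python
from numbers import Integral
from typing import Iterable, List


def to_plain_ints(values: Iterable) -> List[int]:
    """Convert an iterable of numbers (e.g. a pandas Series) to built-in ints.

    Hypothetical helper: the resulting plain list needs no pandas-specific
    conversion. Values that are not whole numbers (including NaN and
    infinity) raise ValueError instead of being silently truncated.
    """
    result: List[int] = []
    for value in values:
        if isinstance(value, Integral):
            # Python ints, bools and NumPy integer scalars
            result.append(int(value))
            continue
        as_float = float(value)
        if not as_float.is_integer():
            raise ValueError(f"not a whole number: {value!r}")
        result.append(int(as_float))
    return result


if __name__ == "__main__":
    print(to_plain_ints([3, 7.0, 10]))  # [3, 7, 10]
```

Inside r_vecs this could be used as robjects.IntVector(to_plain_ints(time_series)), or to_plain_ints(df[i]) could be called in the parent before wrapping r_vecs in dask.delayed, so only a plain list crosses to the worker. Rejecting non-integral values is a deliberate choice: truncating 2.5 to 2 would quietly corrupt the series.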

In my original post, there were also separate issues related to fitting an R model from the forecast package by evaluating a Python string. If you want to follow the full thread, see: https://github.com/dask/distributed/issues/1939

Davis