I'm preparing a large multivariate time series dataset for a supervised learning task, and I would like to create time-shifted versions of my input features so that my model can also learn from past values. In pandas, the shift(n) method lets you shift a column by n rows. Is there something similar in vaex?

I could not find anything comparable in the vaex documentation.

sobek

2 Answers

No, we do not support that yet (https://github.com/vaexio/vaex/issues/660). Because vaex is extensible (see http://docs.vaex.io/en/latest/tutorial.html#Adding-DataFrame-accessors), I thought I would give you a solution in the form of a DataFrame accessor:

import vaex
import numpy as np

@vaex.register_dataframe_accessor('mytool', override=True)
class mytool:
    def __init__(self, df):
        self.df = df

    def shift(self, column, n, inplace=False):
        # make a copy without column
        df = self.df.copy().drop(column)
        # make a copy with just the column
        df_column = self.df[[column]]
        # slice off the head and tail
        df_head = df_column[-n:]
        df_tail = df_column[:-n]
        # stitch them together
        df_shifted = df_head.concat(df_tail)
        # and join (based on row number)
        return df.join(df_shifted, inplace=inplace)

x = np.arange(10)
y = x**2
df = vaex.from_arrays(x=x, y=y)
df['shifted_y'] = df.y
df2 = df.mytool.shift('shifted_y', 2)
df2

It generates a single-column DataFrame, slices it up, concatenates the pieces, and joins the result back. All without a single memory copy.

I am assuming here a cyclic shift/rotate.
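
To make the slicing logic concrete, here is a minimal sketch of the same head/tail rotation on a plain Python list rather than a vaex DataFrame. The helper name `rotate` is hypothetical and only for illustration, and it assumes 0 < n <= len(values).

```python
def rotate(values, n):
    """Cyclic shift: the last n items wrap around to the front."""
    head = values[-n:]   # the last n rows, which move to the top
    tail = values[:-n]   # everything else, pushed down by n rows
    return head + tail   # stitch them together, head first


y = [i**2 for i in range(10)]
shifted_y = rotate(y, 2)
print(shifted_y)  # [64, 81, 0, 1, 4, 9, 16, 25, 36, 49]
```

The first two entries come from the end of the column, which is what makes this a rotation rather than a pandas-style shift.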

Maarten Breddels
  • Very cool! I should get more into the guts of vaex. :-> I need a non-cyclic shift but that's trivial to achieve with your example. Thanks! – sobek Apr 07 '20 at 12:45
  • @sobek would you mind sharing your alteration to make the shift non-cyclic? Thanks! – Joe Jun 10 '20 at 00:12
  • @Joe Have a look at this issue on GitHub: https://github.com/vaexio/vaex/issues/661. Maarten wrote some code that can do both cyclic and non-cyclic shifts. – sobek Jun 12 '20 at 14:04

The function needs to be modified slightly to work in the latest release (vaex 4.0.0ax); see this thread.

Maarten's code should be updated as follows:

import vaex
import numpy as np

@vaex.register_dataframe_accessor('mytool', override=True)
class mytool:
    def __init__(self, df):
        self.df = df

    # mytool.shift is analogous to pandas' shift(), but it returns the shifted
    # column as a single-column DataFrame named new_column, to be joined back
    # onto the original df
    def shift(self, column, new_column, n, cyclic=True):
        # work on a single-column DataFrame; the caller joins the result back
        df_column = self.df[[column]]
        if cyclic:
            df_head = df_column[-n:]
        else:
            # non-cyclic: pad the top with n zeros (pandas would pad with NaN)
            df_head = vaex.from_dict({column: np.ma.filled(np.ma.masked_all(n, dtype=float), 0)})
        df_tail = df_column[:-n]

        df_shifted = df_head.concat(df_tail)
        df_shifted.rename(column, new_column)

        return df_shifted

x = np.arange(10)
y = x**2
df = vaex.from_arrays(x=x, y=y)
df2 = df.join(df.mytool.shift('y', 'shifted_y', 2))
df2
  • It is worth mentioning that, to obtain a fully faithful analog of pandas' `.shift()`, two points need to change: 1) control the dtype when using the masked array, and 2) control the fill value (in the code above, the missing values are filled with zero). – Artem Alexandrov Dec 26 '20 at 20:21
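
The two points in the comment above (dtype and fill value) can be sketched on a plain Python list, independent of vaex. This is only an illustration of the pandas-style semantics, not vaex API: `shift_like_pandas`, `fill_value`, and `cast` are hypothetical names. It assumes that a NaN fill goes together with casting the data to float, which mirrors how pandas promotes integer columns when it introduces NaN.

```python
import math


def shift_like_pandas(values, n, fill_value=math.nan, cast=float):
    """Non-cyclic shift: vacated slots get fill_value instead of wrapped data."""
    data = [cast(v) for v in values]   # control the dtype explicitly
    size = len(data)
    if n == 0:
        return data
    if abs(n) >= size:
        return [fill_value] * size     # everything was shifted out
    if n > 0:
        # shift down: pad the top, drop the last n items
        return [fill_value] * n + data[:-n]
    # shift up (negative n): drop the first |n| items, pad the bottom
    return data[-n:] + [fill_value] * (-n)


y = [i**2 for i in range(10)]
print(shift_like_pandas(y, 2))                          # NaN-padded, float values
print(shift_like_pandas(y, 2, fill_value=0, cast=int))  # zero-padded, int values
```

Passing `fill_value=0, cast=int` corresponds to the zero-filled behavior of the answer's non-cyclic branch, except that the values stay integers.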