
Hello, I have a pandas column with n values

2018-01-03 01:30:00  14873.46  
2018-01-03 01:45:00  14848.01  
2018-01-03 02:00:00  14888.90  
2018-01-03 02:15:00  14834.59  
2018-01-03 02:30:00  14895.08  
2018-01-03 02:45:00  15049.98  
2018-01-03 03:00:00  15041.00  
2018-01-03 03:15:00  15198.00 

I want to get arrays of the percent change of each value relative to the first value of its set, over rolling sets of n values. For example, if I choose sets of three:

2018-01-03 01:30:00  14873.46  
2018-01-03 01:45:00  14848.01  
2018-01-03 02:00:00  14888.90 

2018-01-03 01:45:00  14848.01  
2018-01-03 02:00:00  14888.90  
2018-01-03 02:15:00  14834.59  

For these sets I want to be returned:

  # This is the percentage change between each value of the set and the first value
  # of the same set, e.g. (14848.01 vs 14873.46) and (14888.90 vs 14873.46)
     [-0.17,0.10]
     [0.27,-0.09]

How can I get this done using pandas?

hopieman
  • See also [this answer](https://stackoverflow.com/a/4924433/456550) for a way to get overlapping chunks from a series, using [ndarray strides](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.strides.html) – Christian Long Jan 11 '18 at 03:15

2 Answers


Depends on how you want it presented. But here is one way:

pd.concat([
    df.pct_change().shift(-1),
    df.pct_change(2).shift(-2)
], axis=1, keys=['Shift1', 'Shift2'])

                       Shift1    Shift2
                         ColA      ColA
Date                                   
2018-01-03 01:30:00 -0.001711  0.001038
2018-01-03 01:45:00  0.002754 -0.000904
2018-01-03 02:00:00 -0.003648  0.000415
2018-01-03 02:15:00  0.004078  0.014519
2018-01-03 02:30:00  0.010399  0.009797
2018-01-03 02:45:00 -0.000597  0.009835
2018-01-03 03:00:00  0.010438       NaN
2018-01-03 03:15:00       NaN       NaN
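To match the bracketed arrays in the question (percentages rounded to two decimals), the same expression can be scaled by 100 and read off row by row. A minimal sketch, assuming the column is named `ColA` (a name not given in the question):

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame(
    {"ColA": [14873.46, 14848.01, 14888.90, 14834.59,
              14895.08, 15049.98, 15041.00, 15198.00]},
    index=pd.date_range("2018-01-03 01:30", periods=8, freq="15min"),
)

# Each row holds the percent change of the next two values
# relative to that row's value.
out = pd.concat([
    df.pct_change().shift(-1),
    df.pct_change(2).shift(-2)
], axis=1, keys=["Shift1", "Shift2"]) * 100

print(out.round(2).head(2))
```

The first row then contains -0.17 and 0.10, the pair requested in the question.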
piRSquared
  • What if I don't want overlapping windows? – ifly6 May 09 '19 at 19:19
  • @ifly6 it depends. There are way too many ways to interpret what you need. Probably want `resample` or `groupby` in some form. You should ask a question to make clear what you're looking for. – piRSquared May 09 '19 at 19:28
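As a sketch of the `groupby` route mentioned in the comment above, non-overlapping windows can be built by labelling every run of `window` rows and computing the change relative to each group's first value (the column name `ColA` is an assumption):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ColA": [14873.46, 14848.01, 14888.90,
                            14834.59, 14895.08, 15049.98]})

window = 3
labels = np.arange(len(df)) // window  # 0,0,0,1,1,1,...

# Percent change of each value relative to the first value of its chunk.
rel = df.groupby(labels)["ColA"].transform(lambda s: (s / s.iloc[0] - 1) * 100)
print(rel.round(2))
```

Each chunk's first entry is 0.0 by construction, since it is compared with itself.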

@piRSquared's answer is the right one in this situation.

However, this question also got me thinking about how to split a series or array in to overlapping chunks.

import numpy as np
import pandas as pd

def rolling(a, window):
    shape = (a.size - window + 1, window)
    strides = (a.itemsize, a.itemsize)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

a = np.array(["14873.46", "14848.01", "14888.90", "14834.59", "14895.08",
              "15049.98", "15041.00", "15198.00"])
df = pd.DataFrame(rolling(a, 3))
df.columns = ['first', 'second', 'third']
print(df)

      first    second     third
0  14873.46  14848.01  14888.90
1  14848.01  14888.90  14834.59
2  14888.90  14834.59  14895.08
3  14834.59  14895.08  15049.98
4  14895.08  15049.98  15041.00
5  15049.98  15041.00  15198.00

See this answer for much more detail about splitting an array in to overlapping chunks.

Pandas' pct_change() function calculates based on the preceding value, not the first value, which is another reason that the accepted answer from @piRSquared (using shift()) is the right answer.
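That said, once the overlapping chunks are in a DataFrame, the per-set percent change relative to the first value is a single vectorised expression. A sketch combining the `rolling()` helper above with a numeric array:

```python
import numpy as np
import pandas as pd

def rolling(a, window):
    # View the 1-D array as overlapping windows without copying data.
    shape = (a.size - window + 1, window)
    strides = (a.itemsize, a.itemsize)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

a = np.array([14873.46, 14848.01, 14888.90, 14834.59, 14895.08,
              15049.98, 15041.00, 15198.00])
chunks = pd.DataFrame(rolling(a, 3))

# Percent change of every value relative to the first value of its chunk;
# column 0 is always 0.0, columns 1 and 2 hold the pairs from the question.
pct = (chunks.div(chunks[0], axis=0) - 1) * 100
print(pct[[1, 2]].round(2).head(2))
```

The first row comes out as -0.17 and 0.10, matching the first set in the question.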

Christian Long
  • It works kind of the same way, but you create a column for each pattern. I'm wondering if both methods are efficient in terms of memory usage, because I have files with about 10k patterns of 30 values each. – hopieman Jan 11 '18 at 18:32
  • numpy's `as_strided` is memory efficient, because it does not duplicate the array values. In the answer above, I convert the 2d numpy array that I get from `as_strided` into a Pandas DataFrame. I don't know if converting it into a DataFrame expands the memory usage. – Christian Long Jan 11 '18 at 19:07