Rescale price list from a longer length to a smaller length

Question

Given the following pandas DataFrame with 60 elements:

import pandas as pd
data = [60,62.75,73.28,75.77,70.28
    ,67.85,74.58,72.91,68.33,78.59
    ,75.58,78.93,74.61,85.3,84.63
    ,84.61,87.76,95.02,98.83,92.44
    ,84.8,89.51,90.25,93.82,86.64
    ,77.84,76.06,77.75,72.13,80.2
    ,79.05,76.11,80.28,76.38,73.3
    ,72.28,77,69.28,71.31,79.25
    ,75.11,73.16,78.91,84.78,85.17
    ,91.53,94.85,87.79,97.92,92.88
    ,91.92,88.32,81.49,88.67,91.46
    ,91.71,82.17,93.05,103.98,105]

data_pd = pd.DataFrame(data, columns=["price"])

Is there a formula to rescale this so that each window longer than 20 elements, running from index 0 to index i+1, is rescaled down to 20 elements?

Here is a loop that creates the windows with the data for rescaling. I just do not know how to do the rescaling itself. Any suggestions on how this might be done?

miniLenght = 20
rescaledData = []
for i in range(len(data_pd)):
    if(i >= miniLenght):
        dataForScaling = data_pd[0:i]
        scaledDataToMinLenght = dataForScaling #do the scaling here so that the length of the rescaled data is always equal to miniLenght
        rescaledData.append(scaledDataToMinLenght)

Basically, after the rescaling, rescaledData should contain 40 arrays, each holding 20 prices.

Take a look at [`df.rolling`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html). May be of some use. — cs95, Jul 14 '17 at 21:14
That is for rolling a window. I have already created the windows with data in the loop. The question is not really about how to roll a window; it is about how to rescale the data from longer lengths to smaller lengths — RaduS, Jul 14 '17 at 21:16

score 3 · Accepted Answer · answered Jul 16 '17 at 22:22

From reading the paper, it looks like you are resizing the list back to 20 indices, then interpolating the data at your 20 indices.

We'll build the indices the way they do, i.e. np.arange(0, len(large), step=len(large)/miniLenght), then use NumPy's interp. There are many ways of interpolating data; np.interp uses linear interpolation, so if you ask for, e.g., index 1.5, you get the mean of the points at indices 1 and 2, and so on.
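To make the linear interpolation concrete, here is a minimal sketch using only the first three prices from the question. The variable names (prices, positions, halfway, near_one) are illustrative and not part of the original code.

```python
import numpy as np

# First three prices from the question, sitting at integer indices 0, 1, 2
prices = np.array([60.0, 62.75, 73.28])
positions = np.arange(len(prices))

# Index 1.5 lies halfway between indices 1 and 2,
# so the result is the mean of 62.75 and 73.28 (approximately 68.015)
halfway = np.interp(1.5, positions, prices)

# Index 1.05 lies 5% of the way from index 1 to index 2,
# so the result is 62.75 + 0.05 * (73.28 - 62.75) (approximately 63.2765)
near_one = np.interp(1.05, positions, prices)

print(halfway, near_one)
```

A fractional index simply blends its two neighbouring prices in proportion to how close it is to each.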

So, here's a quick modification of your code to do it (NB: we could probably fully vectorize this using 'rolling'):

import numpy as np
miniLenght = 20
rescaledData = []

for i in range(len(data_pd)):
    if(i >= miniLenght):
        dataForScaling = data_pd['price'][0:i]
        #figure out how many 'steps' we have
        steps = len(dataForScaling)
        #make indices where the data needs to be sliced to get 20 points
        indices = np.arange(0,steps, step = steps/miniLenght)
        #use np.interp at those points, with the original values as given
        rescaledData.append(np.interp(indices, np.arange(steps), dataForScaling))

And the output is as expected:

[array([ 60.  ,  62.75,  73.28,  75.77,  70.28,  67.85,  74.58,  72.91,
         68.33,  78.59,  75.58,  78.93,  74.61,  85.3 ,  84.63,  84.61,
         87.76,  95.02,  98.83,  92.44]),
 array([ 60.    ,  63.2765,  73.529 ,  74.9465,  69.794 ,  69.5325,
         74.079 ,  71.307 ,  72.434 ,  77.2355,  77.255 ,  76.554 ,
         81.024 ,  84.8645,  84.616 ,  86.9725,  93.568 ,  98.2585,
         93.079 ,  85.182 ]),.....

Thank you @jeremycg for the answer. That was it :) I will award the bounty to this answer in 15h, when it allows me ;) — RaduS, Jul 17 '17 at 05:47
