Given the following pandas DataFrame with 60 elements:

import pandas as pd
data = [60,62.75,73.28,75.77,70.28
    ,67.85,74.58,72.91,68.33,78.59
    ,75.58,78.93,74.61,85.3,84.63
    ,84.61,87.76,95.02,98.83,92.44
    ,84.8,89.51,90.25,93.82,86.64
    ,77.84,76.06,77.75,72.13,80.2
    ,79.05,76.11,80.28,76.38,73.3
    ,72.28,77,69.28,71.31,79.25
    ,75.11,73.16,78.91,84.78,85.17
    ,91.53,94.85,87.79,97.92,92.88
    ,91.92,88.32,81.49,88.67,91.46
    ,91.71,82.17,93.05,103.98,105]

data_pd = pd.DataFrame(data, columns=["price"])

Is there a formula to rescale this so that every window longer than 20 elements, running from index 0 to index i+1, is rescaled down to 20 elements?

Here is a loop that creates the windows of data to be rescaled. I just do not know how to do the rescaling itself. Any suggestions on how this might be done?

miniLenght = 20
rescaledData = []
for i in range(len(data_pd)):
    if(i >= miniLenght):
        dataForScaling = data_pd[0:i]
        scaledDataToMinLenght = dataForScaling #do the scaling here so that the length of the rescaled data is always equal to miniLenght
        rescaledData.append(scaledDataToMinLenght)

After the rescaling, rescaledData should contain 40 arrays, each holding 20 prices.

RaduS
  • What do you do to rescale? – cs95 Jul 14 '17 at 21:09
  • Take a look at [`df.rolling`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html). May be of some use. – cs95 Jul 14 '17 at 21:14
  • That is for rolling windows. I already create the windows of data in the loop. The question is not really about how to roll a window; it is about how to rescale the data from longer lengths to shorter lengths – RaduS Jul 14 '17 at 21:16

1 Answer

From reading the paper, it looks like you are resizing the list back to 20 index positions spread evenly across the window, then interpolating the data at those 20 positions.

We'll build the indices the way they do, as evenly spaced positions from 0 up to len(large) in steps of len(large)/miniLenght, then use numpy's np.interp to read off the values at those positions. There are many ways of interpolating data; np.interp uses linear interpolation, so if you ask for index 1.5, for example, you get the mean of points 1 and 2, and so on.
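To make the linear-interpolation behaviour concrete, here is a minimal sketch on a made-up four-point series (the values in `ys` are illustrative and not taken from the question):

```python
import numpy as np

# Hypothetical sample values at integer positions 0, 1, 2, 3.
ys = np.array([10.0, 20.0, 40.0, 30.0])
xs = np.arange(len(ys))

# Position 1.5 lies halfway between points 1 and 2, so the result is their mean.
halfway = np.interp(1.5, xs, ys)

# Position 2.25 lies a quarter of the way from point 2 to point 3.
quarter = np.interp(2.25, xs, ys)

print(halfway)  # 30.0
print(quarter)  # 37.5
```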

So, here's a quick modification of your code to do it (nb, we could probably fully vectorize this using 'rolling'):

import numpy as np

miniLenght = 20
rescaledData = []

for i in range(len(data_pd)):
    if i >= miniLenght:
        dataForScaling = data_pd['price'][0:i]
        # number of points in the current window
        steps = len(dataForScaling)
        # 20 evenly spaced (fractional) positions: 0, steps/20, 2*steps/20, ...
        # linspace with endpoint=False gives the same positions as
        # np.arange(0, steps, steps/miniLenght) but always returns exactly
        # miniLenght values, even when the step is not an integer
        indices = np.linspace(0, steps, num=miniLenght, endpoint=False)
        # linearly interpolate the original values at those positions
        rescaledData.append(np.interp(indices, np.arange(steps), dataForScaling))

And the output is as expected:

[array([ 60.  ,  62.75,  73.28,  75.77,  70.28,  67.85,  74.58,  72.91,
         68.33,  78.59,  75.58,  78.93,  74.61,  85.3 ,  84.63,  84.61,
         87.76,  95.02,  98.83,  92.44]),
 array([ 60.    ,  63.2765,  73.529 ,  74.9465,  69.794 ,  69.5325,
         74.079 ,  71.307 ,  72.434 ,  77.2355,  77.255 ,  76.554 ,
         81.024 ,  84.8645,  84.616 ,  86.9725,  93.568 ,  98.2585,
         93.079 ,  85.182 ]),.....
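If the same rescaling is needed in more than one place, the steps can be wrapped in a helper. The sketch below is one possible arrangement and not part of the original code: the function name `rescale_to_length` and the variable `min_len` are made up for illustration, only the first 25 prices from the question are used to keep it short, and `np.linspace(..., endpoint=False)` is assumed as the way to generate the sample positions.

```python
import numpy as np

def rescale_to_length(values, target_len=20):
    """Linearly resample `values` to `target_len` points (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    steps = len(values)
    # Sample positions 0, steps/target_len, 2*steps/target_len, ...
    # endpoint=False keeps every position strictly below `steps`.
    positions = np.linspace(0, steps, num=target_len, endpoint=False)
    return np.interp(positions, np.arange(steps), values)

# Illustrative data: the first 25 prices from the question.
prices = [60, 62.75, 73.28, 75.77, 70.28, 67.85, 74.58, 72.91, 68.33, 78.59,
          75.58, 78.93, 74.61, 85.3, 84.63, 84.61, 87.76, 95.02, 98.83, 92.44,
          84.8, 89.51, 90.25, 93.82, 86.64]

min_len = 20
# One rescaled array per window prices[0:i], for i = 20 .. 24.
windows = [rescale_to_length(prices[:i], min_len)
           for i in range(min_len, len(prices))]

print(len(windows))                    # 5
print({len(w) for w in windows})       # {20}
```

With these sample positions the last interpolated point falls slightly short of the window's final price. Using `np.linspace(0, steps - 1, num=target_len)` instead would pin the last sample to the final element, though the results would then no longer match the output shown above.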
jeremycg
  • Thank you @jeremycg for the answer. That was it :) I will award the bounty to this answer in 15h, when it allows me ;) – RaduS Jul 17 '17 at 05:47