5

I am trying to apply the Hurst Exponent on SPY closing prices on a rolling window. The below code (which I got from here: https://www.quantstart.com/articles/Basics-of-Statistical-Mean-Reversion-Testing) works well if I apply it on the closing prices column. However this gives me a static value. I would like to apply the Hurst Exponent on a rolling window considering the last 200 closing prices. My objective is to get a column in which the Hurst Exponent is updated in each row considering the last 200 closing prices.

from numpy import cumsum, log, polyfit, sqrt, std, subtract
from numpy.random import randn
import pandas_datareader as dr
from datetime import date

df = dr.data.get_data_yahoo('SPY',start='23-01-1991',end=date.today())

def hurst(ts):
    """Returns the Hurst Exponent of the time series vector ts"""
    # Create the range of lag values
    lags = range(2, 100)

    # Calculate the array of the variances of the lagged differences
    tau = [sqrt(std(subtract(ts[lag:], ts[:-lag]))) for lag in lags]

    # Use a linear fit to estimate the Hurst Exponent
    poly = polyfit(log(lags), log(tau), 1)

    # Return the Hurst exponent from the polyfit output
    return poly[0]*2.0

print ("Hurst(SPY): %s" % hurst(df['Close']))

## I've tried the next lines of code but unfortunately they are not working:
df['Hurst_Column']= [0]
for aRowINDEX in range( 1, 200 ):
    df['Hurst_Column'][-aRowINDEX] = hurst (df[u'Close'][:-aRowINDEX]) 

I am very new at Python, and I have tried different things with no luck. Anyone could help me with it, please? Any help would be more than welcome. Thank you!

Martingale
  • 511
  • 1
  • 6
  • 15

3 Answers3

6

Let me offer you a two step way forwards:

Step 1: a bit more robust Hurst Exponent implementation with test data

Step 2: a simple way to produce a "sliding-window"-alike calculation

Step 3: a bit more complicated way - if a ROLLING WINDOW is a MUST ...

Bonus: What should I write under the code of my question to have it done?


Step 1: a bit more robust Hurst Exponent implementation with test data :

Here, I will post a function implementation, taken from QuantFX module, as-is ( Py2.7 will not make troubles in most places, yet any xrange() ought be replaced with range() in Py3.x ).

This code contains a few improvements and some sort of self-healing, if tests show, that there are problems with data-segment ( QuantFX uses a convention of a natural flow of the time, where data[0] is the "oldest" time-series cell and data[-1] being the "most recent" one ).

Calling the HurstEXP() without any parameter will yield a demo-run, showing some tests and explanations of the subject matter.

Also the print( HurstEXP.__doc__ ) is self-explanatory:

def HurstEXP( ts = [ None, ] ):                                         # TESTED: HurstEXP()                Hurst exponent ( Browninan Motion & other observations measure ) 100+ BARs back(!)
            """                                                         __doc__
            USAGE:
                        HurstEXP( ts = [ None, ] )

                        Returns the Hurst Exponent of the time series vector ts[]

            PARAMETERS:
                        ts[,]   a time-series, with 100+ elements
                                ( or [ None, ] that produces a demo run )

            RETURNS:
                        float - a Hurst Exponent approximation,
                                as a real value
                                or
                                an explanatory string on an empty call
            THROWS:
                        n/a
            EXAMPLE:
                        >>> HurstEXP()                                        # actual numbers will vary, as per np.random.randn() generator used
                        HurstEXP( Geometric Browian Motion ):    0.49447454
                        HurstEXP(    Mean-Reverting Series ):   -0.00016013
                        HurstEXP(          Trending Series ):    0.95748937
                        'SYNTH series demo ( on HurstEXP( ts == [ None, ] ) ) # actual numbers vary, as per np.random.randn() generator'

                        >>> HurstEXP( rolling_window( aDSEG[:,idxC], 100 ) )
            REF.s:
                        >>> www.quantstart.com/articles/Basics-of-Statistical-Mean-Reversion-Testing
            """
            #---------------------------------------------------------------------------------------------------------------------------<self-reflective>
            if ( ts[0] == None ):                                       # DEMO: Create a SYNTH Geometric Brownian Motion, Mean-Reverting and Trending Series:

                 gbm = np.log( 1000 + np.cumsum(     np.random.randn( 100000 ) ) )  # a Geometric Brownian Motion[log(1000 + rand), log(1000 + rand + rand ), log(1000 + rand + rand + rand ),... log(  1000 + rand + ... )]
                 mr  = np.log( 1000 +                np.random.randn( 100000 )   )  # a Mean-Reverting Series    [log(1000 + rand), log(1000 + rand        ), log(1000 + rand               ),... log(  1000 + rand       )]
                 tr  = np.log( 1000 + np.cumsum( 1 + np.random.randn( 100000 ) ) )  # a Trending Series          [log(1001 + rand), log(1002 + rand + rand ), log(1003 + rand + rand + rand ),... log(101000 + rand + ... )]

                                                                        # Output the Hurst Exponent for each of the above SYNTH series
                 print ( "HurstEXP( Geometric Browian Motion ):   {0: > 12.8f}".format( HurstEXP( gbm ) ) )
                 print ( "HurstEXP(    Mean-Reverting Series ):   {0: > 12.8f}".format( HurstEXP( mr  ) ) )
                 print ( "HurstEXP(          Trending Series ):   {0: > 12.8f}".format( HurstEXP( tr  ) ) )

                 return ( "SYNTH series demo ( on HurstEXP( ts == [ None, ] ) ) # actual numbers vary, as per np.random.randn() generator" )
            """                                                         # FIX:
            ===================================================================================================================
            |
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ :1000,QuantFX.idxH].tolist() )
            0.47537688039105963
            |
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ :101,QuantFX.idxH].tolist() )
            -0.31081076640420308
            |
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ :100,QuantFX.idxH].tolist() )
            nan
            |
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ :99,QuantFX.idxH].tolist() )

            Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
            C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
            warnings.warn(msg, RankWarning)
            0.026867491053098096
            """
            pass;     too_short_list = 101 - len( ts )                  # MUST HAVE 101+ ELEMENTS
            if ( 0 <  too_short_list ):                                 # IF NOT:
                 ts = too_short_list * ts[:1] + ts                      #    PRE-PEND SUFFICIENT NUMBER of [ts[0],]-as-list REPLICAS TO THE LIST-HEAD
            #---------------------------------------------------------------------------------------------------------------------------
            lags = range( 2, 100 )                                                              # Create the range of lag values
            tau  = [ np.sqrt( np.std( np.subtract( ts[lag:], ts[:-lag] ) ) ) for lag in lags ]  # Calculate the array of the variances of the lagged differences
            #oly = np.polyfit( np.log( lags ), np.log( tau ), 1 )                               # Use a linear fit to estimate the Hurst Exponent
            #eturn ( 2.0 * poly[0] )                                                            # Return the Hurst exponent from the polyfit output
            """ ********************************************************************************************************************************************************************* DONE:[MS]:ISSUE / FIXED ABOVE
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH] )
            C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:82: RuntimeWarning: Degrees of freedom <= 0 for slice
              warnings.warn("Degrees of freedom <= 0 for slice", RuntimeWarning)
            C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:94: RuntimeWarning: invalid value encountered in true_divide
              arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
            C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:114: RuntimeWarning: invalid value encountered in true_divide
              ret, rcount, out=ret, casting='unsafe', subok=False)
            QuantFX.py:23034: RuntimeWarning: divide by zero encountered in log
              return ( 2.0 * np.polyfit( np.log( lags ), np.log( tau ), 1 )[0] )                  # Return the Hurst exponent from the polyfit output ( a linear fit to estimate the Hurst Exponent )

            Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
            C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
              warnings.warn(msg, RankWarning)
            0.028471879418359915
            |
            |
            |# DATA:
            |
            |>>> QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH]
            memmap([ 1763.31005859,  1765.01000977,  1765.44995117,  1764.80004883,
                     1765.83996582,  1768.91003418,  1771.04003906,  1769.43994141,
                     1771.4699707 ,  1771.61999512,  1774.76000977,  1769.55004883,
                     1773.4699707 ,  1773.32995605,  1770.08996582,  1770.20996094,
                     1768.34997559,  1768.02001953,  1767.59997559,  1767.23999023,
                     1768.41003418,  1769.06994629,  1769.56994629,  1770.7800293 ,
                     1770.56994629,  1769.7800293 ,  1769.90002441,  1770.44995117,
                     1770.9699707 ,  1771.04003906,  1771.16003418,  1769.81005859,
                     1768.76000977,  1769.39001465,  1773.23999023,  1771.91003418,
                     1766.92004395,  1765.56994629,  1762.65002441,  1760.18005371,
                     1755.        ,  1756.67004395,  1753.48999023,  1753.7199707 ,
                     1751.92004395,  1745.44995117,  1745.44995117,  1744.54003906,
                     1744.54003906,  1744.84997559,  1744.84997559,  1744.34997559,
                     1744.34997559,  1743.75      ,  1743.75      ,  1745.23999023,
                     1745.23999023,  1745.15002441,  1745.31005859,  1745.47998047,
                     1745.47998047,  1749.06994629,  1749.06994629,  1748.29003906,
                     1748.29003906,  1747.42004395,  1747.42004395,  1746.98999023,
                     1747.61999512,  1748.79003906,  1748.79003906,  1748.38000488,
                     1748.38000488,  1744.81005859,  1744.81005859,  1736.80004883,
                     1736.80004883,  1735.43005371,  1735.43005371,  1737.9699707
                     ], dtype=float32
                    )
            |
            |
            | # CONVERTED .tolist() to avoid .memmap-type artifacts:
            |
            |>>> QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH].tolist()
            [1763.31005859375, 1765.010009765625, 1765.449951171875, 1764.800048828125, 1765.8399658203125, 1768.9100341796875, 1771.0400390625, 1769.43994140625, 1771.469970703125, 1771.6199951171875, 1774.760
            859375, 1743.75, 1743.75, 1745.239990234375, 1745.239990234375, 1745.1500244140625, 1745.31005859375, 1745.47998046875, 1745.47998046875, 1749.0699462890625, 1749.0699462890625, 1748.2900390625, 174
            |
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH].tolist() )
            C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:116: RuntimeWarning: invalid value encountered in double_scalars
              ret = ret.dtype.type(ret / rcount)

            Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
            C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
              warnings.warn(msg, RankWarning)
            0.028471876494884543
            ===================================================================================================================
            |
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ :1000,QuantFX.idxH].tolist() )
            0.47537688039105963
            |
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ :101,QuantFX.idxH].tolist() )
            -0.31081076640420308
            |
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ :100,QuantFX.idxH].tolist() )
            nan
            |
            |>>> QuantFX.HurstEXP( QuantFX.DATA[ :99,QuantFX.idxH].tolist() )

            Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
            C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
            warnings.warn(msg, RankWarning)
            0.026867491053098096
            """
            return ( 2.0 * np.polyfit( np.log( lags ), np.log( tau ), 1 )[0] )                  # Return the Hurst exponent from the polyfit output ( a linear fit to estimate the Hurst Exponent )

Step 2: a simple way to produce a "sliding-window" calculation :

 [ ( -i, HurstEXP( ts = df['Close'][:-i] ) ) for i in range( 1, 200 ) ] # should call the HurstEXP for the last 200 days

TEST-ME:

>>> df[u'Close']
Date
1993-01-29     43.937500
1993-02-01     44.250000
...
2019-07-17    297.739990
2019-07-18    297.429993
Name: Close, Length: 6665, dtype: float64
>>> 

>>> [ (                          -i,
         HurstEXP( df[u'Close'][:-i] )
         )                   for  i in range( 1, 10 )
         ]
[ ( -1, 0.4489364467179827  ),
  ( -2, 0.4489306967683502  ),
  ( -3, 0.44892205577752986 ),
  ( -4, 0.448931424819551   ),
  ( -5, 0.44895272101162326 ),
  ( -6, 0.44896713741862954 ),
  ( -7, 0.44898211557287204 ),
  ( -8, 0.4489941656580211  ),
  ( -9, 0.4490116318052649  )
  ]

Step 3: a bit more complicated way - if a ROLLING WINDOW is a MUST ... :

While not much memory / processing efficient, the "rolling window" trick may get injected into the game, whereas there is no memory, the less a processing efficiency benefit from doing so ( you spend a lot on syntactically plausible code, yet the processing efficiency does not get here any plus from doing it right this way, as convolved nature of HurstEXP() cannot help, without an attempt to re-vectorise also its internal code (why and what for ever?) any better from this ... just if professor or boss still wants you to do so ... ):

def rolling_window( aMatrix, aRollingWindowLENGTH ):                    #
            """                                                                 __doc__
            USAGE:   rolling_window( aMatrix, aRollingWindowLENGTH )

            PARAMS:  aMatrix                a numpy array
                     aRollingWindowLENGTH   a LENGTH of a rolling window

            RETURNS: a stride_trick'ed numpy array with rolling windows

            THROWS:  n/a

            EXAMPLE: >>> x = np.arange( 10 ).reshape( ( 2, 5 ) )

                     >>> rolling_window( x, 3 )
                     array([[[0, 1, 2], [1, 2, 3], [2, 3, 4]],
                            [[5, 6, 7], [6, 7, 8], [7, 8, 9]]])

                     >>> np.mean( rolling_window( x, 3 ), -1 )
                     array([[ 1.,  2.,  3.],
                            [ 6.,  7.,  8.]])
            """
            new_shape   = aMatrix.shape[:-1] + ( aMatrix.shape[-1] - aRollingWindowLENGTH + 1, aRollingWindowLENGTH )
            new_strides = aMatrix.strides    + ( aMatrix.strides[-1], )
            return np.lib.stride_tricks.as_strided( aMatrix,
                                                    shape   = new_shape,
                                                    strides = new_strides
                                                    )

>>> rolling_window( df[u'Close'], 100 ).shape
(6566, 100)

>>> rolling_window( df[u'Close'], 100 ).flags
    C_CONTIGUOUS    : False
    F_CONTIGUOUS    : False
    OWNDATA         : False <---------------- a VIEW, not a replica
    WRITEABLE       : True
    ALIGNED         : True
    WRITEBACKIFCOPY : False
    UPDATEIFCOPY    : False

You get an array of 6566 vectors with "rolling_window"-ed 100-day blocks of of SPY[Close]-s

>>> rolling_window( df[u'Close'], 100 )
array([[ 43.9375    ,  44.25      ,  44.34375   , ...,  44.5       ,  44.59375   ,  44.625     ],
       [ 44.25      ,  44.34375   ,  44.8125    , ...,  44.59375   ,  44.625     ,  44.21875   ],
       [ 44.34375   ,  44.8125    ,  45.        , ...,  44.625     ,  44.21875   ,  44.8125    ],
       ...,
       [279.14001465, 279.51998901, 279.32000732, ..., 300.6499939 , 300.75      , 299.77999878],
       [279.51998901, 279.32000732, 279.20001221, ..., 300.75      , 299.77999878, 297.73999023],
       [279.32000732, 279.20001221, 278.67999268, ..., 299.77999878, 297.73999023, 297.42999268]])

Q: What should I write under the code of my question to have it done?

for                         aRowINDEX in range( 1, 200 ):
    df[u'HurstEXP_COLUMN'][-aRowINDEX] = HurstEXP( df[u'Close'][:-aRowINDEX] )
    print( "[{0:>4d}]: DIFF( hurst() - HurstEXP() ) == {1:}".format( aRowINDEX,
                           ( hurst(    df[u'Close'][:-aRowINDEX] )
                           - HurstEXP( df[u'Close'][:-aRowINDEX] )
                             )
            )
user3666197
  • 1
  • 6
  • 50
  • 92
  • Thank you very much for such complete answer. About the second option (which looks easier for me) I added the next line of code: df['H'] = [(-i, hurst( ts = df['Close'][:-i] ) ) for i in range( 1, 200 ) ] in order to create that new column. But unfortunately I receive this message: ValueError: Length of values does not match length of index. Am I doing it correctly? – Martingale Jul 18 '19 at 16:32
  • You try to store one single poor scalar ( a float value ) into a whole columnar vector ... try this to see the root-cause `df[u'H'].shape` ~ **`Name: H, Length: 6665, dtype: float64`** which obviously must crash. Rather match also the value-sizes by **`df[ u'H' ][-i] = HurstEXP( ts = df[ 'Close' ][:-i] )`** in an attempt to assign one float value into a one, float-sized cell ( inside the `DataFrame` of interest ) – user3666197 Jul 18 '19 at 16:48
  • Thank you again! I appreciate your help here. I've been playing around with the code but unfortunately I couldn't solve it yet. As I am very new at Python, and I have different options, I don't know very well how to solve it. For sure, this is happening to me because I am very new at this. I tried to put the next code under the code of the question: df[u'H'][-i] = [ ( -i, hurst( ts = df['Close'][:-i] ) ) for i in range( 1, 200 ) ]. But I couldn't make it work either. What should I write under the code of my question to have it done? Thank you so much for your patience! – Martingale Jul 18 '19 at 18:14
  • Thanks, I included my code also above your lines of code. Do you mean that way? Notice that I have changed the function "HurstEXP" to "hurst" because of my code. The problem is that code also produces an error: LinAlgError: SVD did not converge in Linear Least Squares. Am I wrong again? :( – Martingale Jul 18 '19 at 18:53
  • You might have noticed already in the examples above, that the range()-iterator has to be setup so as it starts from a value of **1**, not from a default value of 0. – user3666197 Jul 18 '19 at 19:24
  • Thanks, I tried again with your last code but it is not working. I changed the code of question adding your lines of code (so that you can see the entire code that I am using). The error message says just:KeyError: 'Hurst_Column'. Can you make it work using the entire code of my question? – Martingale Jul 18 '19 at 20:38
  • If your DataFrame does not contain a column, named "Hurst_Column", the reported exception is obvious. Check your own definitions. The code above works as a charm and storing values ---- be it into a column named **df[ 'H' ]** as you have used already above, in the commented remarks, or any other ( but present ) name ---- is a step not related to either the computation or the iteration logic, that was demonstrated to be correct already. – user3666197 Jul 18 '19 at 20:46
  • Thanks, I’ve been trying to figure out in the last days how to finish the code and put your code and my code together. I have even done a new course in Python in order to help me with some gaps that I have in Python but unfortunately I couldn’t fix this issue yet. I tried to make the code work using the above code (I modified the question) with no luck. Please, could you tell me how should be the entire code (adding your code + my code) to make it work? I would be eternally grateful if you could post the entire code adding yours and mine. Thanks!! – Martingale Jul 23 '19 at 08:40
  • A big compliment for the detailed answer! One thing is not clear to me: Why always the range (2, 100) used as lags? H is therefore not independent of the length of the ts. But ts can has any length, would not it be better to use eg a quarter of the ts length (or any other ratio) as the upper limit instead of an fixed value? BTW: Then H could be calculated for ts shorter than 100 too. – SkatEddy Oct 06 '19 at 17:37
2

A bit late but there doesn't seem to be a clean solution online. This seems to work for me - you need to install the hurst package first though.

from hurst import compute_Hc

H = lambda x: compute_Hc(x)[0]

window = 200
df['Hurst_Columns'] = df['Close'].rolling(window).apply(H)

Neb
  • 21
  • 1
-1

A bit later...

    start = 200
    # Get df length
    samples = len(df)
    
    for i in range(start,samples):
        end = i + start
        if end <= samples:
            ts = df.iloc[i:end]
            hurst = HurstEXP(ts['Close'].values)
            print(ts['Date'].loc[end-1], ",", ts['Time'].loc[end-1], ",", hurst, file=h_file)