0

In a nutshell: I need to calculate the Hurst Exponent (HE) across a rolling window inside a pandas dataframe and assign the values to its own column.

The HE function I use was lifted from here as it seemed more robust. For convenience it's posted below:

def HurstEXP( ts = [ None, ] ):                                         
# TESTED: HurstEXP()                Hurst exponent ( Browninan Motion & other observations measure ) 100+ BARs back(!)
        """                                                         __doc__
        USAGE:
                    HurstEXP( ts = [ None, ] )

                    Returns the Hurst Exponent of the time series vector ts[]

        PARAMETERS:
                    ts[,]   a time-series, with 100+ elements
                            ( or [ None, ] that produces a demo run )

        RETURNS:
                    float - a Hurst Exponent approximation,
                            as a real value
                            or
                            an explanatory string on an empty call
        THROWS:
                    n/a
        EXAMPLE:
                    >>> HurstEXP()                                        # actual numbers will vary, as per np.random.randn() generator used
                    HurstEXP( Geometric Browian Motion ):    0.49447454
                    HurstEXP(    Mean-Reverting Series ):   -0.00016013
                    HurstEXP(          Trending Series ):    0.95748937
                    'SYNTH series demo ( on HurstEXP( ts == [ None, ] ) ) # actual numbers vary, as per np.random.randn() generator'

                    >>> HurstEXP( rolling_window( aDSEG[:,idxC], 100 ) )
        REF.s:
                    >>> www.quantstart.com/articles/Basics-of-Statistical-Mean-Reversion-Testing
        """
        #---------------------------------------------------------------------------------------------------------------------------<self-reflective>
        if ( ts[0] == None ):                                       # DEMO: Create a SYNTH Geometric Brownian Motion, Mean-Reverting and Trending Series:

             gbm = np.log( 1000 + np.cumsum(     np.random.randn( 100000 ) ) )  # a Geometric Brownian Motion[log(1000 + rand), log(1000 + rand + rand ), log(1000 + rand + rand + rand ),... log(  1000 + rand + ... )]
             mr  = np.log( 1000 +                np.random.randn( 100000 )   )  # a Mean-Reverting Series    [log(1000 + rand), log(1000 + rand        ), log(1000 + rand               ),... log(  1000 + rand       )]
             tr  = np.log( 1000 + np.cumsum( 1 + np.random.randn( 100000 ) ) )  # a Trending Series          [log(1001 + rand), log(1002 + rand + rand ), log(1003 + rand + rand + rand ),... log(101000 + rand + ... )]

                                                                    # Output the Hurst Exponent for each of the above SYNTH series
             print ( "HurstEXP( Geometric Browian Motion ):   {0: > 12.8f}".format( HurstEXP( gbm ) ) )
             print ( "HurstEXP(    Mean-Reverting Series ):   {0: > 12.8f}".format( HurstEXP( mr  ) ) )
             print ( "HurstEXP(          Trending Series ):   {0: > 12.8f}".format( HurstEXP( tr  ) ) )

             return ( "SYNTH series demo ( on HurstEXP( ts == [ None, ] ) ) # actual numbers vary, as per np.random.randn() generator" )
        """                                                         # FIX:
        ===================================================================================================================
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ :1000,QuantFX.idxH].tolist() )
        0.47537688039105963
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ :101,QuantFX.idxH].tolist() )
        -0.31081076640420308
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ :100,QuantFX.idxH].tolist() )
        nan
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ :99,QuantFX.idxH].tolist() )

        Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
        C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
        warnings.warn(msg, RankWarning)
        0.026867491053098096
        """
        pass;     too_short_list = 101 - len( ts )                  # MUST HAVE 101+ ELEMENTS
        if ( 0 <  too_short_list ):                                 # IF NOT:
             ts = too_short_list * ts[:1] + ts                      #    PRE-PEND SUFFICIENT NUMBER of [ts[0],]-as-list REPLICAS TO THE LIST-HEAD
        #---------------------------------------------------------------------------------------------------------------------------
        lags = range( 2, 100 )                                                              # Create the range of lag values
        tau  = [ np.sqrt( np.std( np.subtract( ts[lag:], ts[:-lag] ) ) ) for lag in lags ]  # Calculate the array of the variances of the lagged differences
        #oly = np.polyfit( np.log( lags ), np.log( tau ), 1 )                               # Use a linear fit to estimate the Hurst Exponent
        #eturn ( 2.0 * poly[0] )                                                            # Return the Hurst exponent from the polyfit output
        """ ********************************************************************************************************************************************************************* DONE:[MS]:ISSUE / FIXED ABOVE
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH] )
        C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:82: RuntimeWarning: Degrees of freedom <= 0 for slice
          warnings.warn("Degrees of freedom <= 0 for slice", RuntimeWarning)
        C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:94: RuntimeWarning: invalid value encountered in true_divide
          arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
        C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:114: RuntimeWarning: invalid value encountered in true_divide
          ret, rcount, out=ret, casting='unsafe', subok=False)
        QuantFX.py:23034: RuntimeWarning: divide by zero encountered in log
          return ( 2.0 * np.polyfit( np.log( lags ), np.log( tau ), 1 )[0] )                  # Return the Hurst exponent from the polyfit output ( a linear fit to estimate the Hurst Exponent )

        Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
        C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
          warnings.warn(msg, RankWarning)
        0.028471879418359915
        |
        |
        |# DATA:
        |
        |>>> QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH]
        memmap([ 1763.31005859,  1765.01000977,  1765.44995117,  1764.80004883,
                 1765.83996582,  1768.91003418,  1771.04003906,  1769.43994141,
                 1771.4699707 ,  1771.61999512,  1774.76000977,  1769.55004883,
                 1773.4699707 ,  1773.32995605,  1770.08996582,  1770.20996094,
                 1768.34997559,  1768.02001953,  1767.59997559,  1767.23999023,
                 1768.41003418,  1769.06994629,  1769.56994629,  1770.7800293 ,
                 1770.56994629,  1769.7800293 ,  1769.90002441,  1770.44995117,
                 1770.9699707 ,  1771.04003906,  1771.16003418,  1769.81005859,
                 1768.76000977,  1769.39001465,  1773.23999023,  1771.91003418,
                 1766.92004395,  1765.56994629,  1762.65002441,  1760.18005371,
                 1755.        ,  1756.67004395,  1753.48999023,  1753.7199707 ,
                 1751.92004395,  1745.44995117,  1745.44995117,  1744.54003906,
                 1744.54003906,  1744.84997559,  1744.84997559,  1744.34997559,
                 1744.34997559,  1743.75      ,  1743.75      ,  1745.23999023,
                 1745.23999023,  1745.15002441,  1745.31005859,  1745.47998047,
                 1745.47998047,  1749.06994629,  1749.06994629,  1748.29003906,
                 1748.29003906,  1747.42004395,  1747.42004395,  1746.98999023,
                 1747.61999512,  1748.79003906,  1748.79003906,  1748.38000488,
                 1748.38000488,  1744.81005859,  1744.81005859,  1736.80004883,
                 1736.80004883,  1735.43005371,  1735.43005371,  1737.9699707
                 ], dtype=float32
                )
        |
        |
        | # CONVERTED .tolist() to avoid .memmap-type artifacts:
        |
        |>>> QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH].tolist()
        [1763.31005859375, 1765.010009765625, 1765.449951171875, 1764.800048828125, 1765.8399658203125, 1768.9100341796875, 1771.0400390625, 1769.43994140625, 1771.469970703125, 1771.6199951171875, 1774.760
        859375, 1743.75, 1743.75, 1745.239990234375, 1745.239990234375, 1745.1500244140625, 1745.31005859375, 1745.47998046875, 1745.47998046875, 1749.0699462890625, 1749.0699462890625, 1748.2900390625, 174
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ : QuantFX.aMinPTR,QuantFX.idxH].tolist() )
        C:\Python27.anaconda\lib\site-packages\numpy\core\_methods.py:116: RuntimeWarning: invalid value encountered in double_scalars
          ret = ret.dtype.type(ret / rcount)

        Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
        C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
          warnings.warn(msg, RankWarning)
        0.028471876494884543
        ===================================================================================================================
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ :1000,QuantFX.idxH].tolist() )
        0.47537688039105963
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ :101,QuantFX.idxH].tolist() )
        -0.31081076640420308
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ :100,QuantFX.idxH].tolist() )
        nan
        |
        |>>> QuantFX.HurstEXP( QuantFX.DATA[ :99,QuantFX.idxH].tolist() )

        Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
        C:\Python27.anaconda\lib\site-packages\numpy\lib\polynomial.py:594: RankWarning: Polyfit may be poorly conditioned
        warnings.warn(msg, RankWarning)
        0.026867491053098096
        """
        return ( 2.0 * np.polyfit( np.log( lags ), np.log( tau ), 1 )[0] )  

Now in order to test the function let's grab some TSLA data from Yahoo finance:

import pandas as pd
import yfinance as yf
from datetime import datetime
from dateutil.relativedelta import relativedelta

years = 5
today = datetime.today().strftime('%Y-%m-%d')
lastyeartoday = (datetime.today() - relativedelta(years=years)).strftime('%Y-%m-%d')

df = yf.download('TSLA', 
                      start=lastyeartoday, 
                      end=today, 
                      progress=False)
df = df.dropna()
df = df[[u'Close']]
df

Output:

Date        Close
2016-02-16  31.034000
2016-02-17  33.736000
2016-02-18  33.354000
2016-02-19  33.316002
2016-02-22  35.548000
... ...
2021-02-08  863.419983
2021-02-09  849.460022
2021-02-10  804.820007
2021-02-11  811.659973
2021-02-12  816.119995
1259 rows × 1 columns

So far so good. We have the function and the data to test it with. Now let's do a sanity test, i.e. run the function against a subsample of the data:

import numpy as np
window = 20

hurst = lambda x: (HurstEXP(ts = df[u'Close'][:-x].to_numpy()))
hurst(window)

Output:

0.5163981260143369

Excellent.

Now to the meaty part. Applying the lambda across a rolling window and assigning the result to its own column. I've pretty much tried every trick I was able to dig up but cannot make it work.

The vanilla approach:

df.Close.rolling(window).apply(hurst, engine='cython', raw=True)

Gives me the following error:

TypeError: Cannot convert input [[-31.0340004  -33.73600006 -33.35400009 -33.31600189 -35.54800034
 -35.44200134 -35.79999924 -37.48600006 -38.06800079 -38.38600159
 -37.27000046 -37.66799927 -39.14799881 -40.20800018 -41.05799866
 -40.52000046 -41.74399948 -41.0359993  -41.5        -43.02999878]] of type <class 'numpy.ndarray'> to Timestamp

Then I tried to get clever:

hurst = df.apply(lambda x: pd.Series(x.index).rolling(window).agg({'Hurst': lambda window: HurstEXP(x.loc[window])})).reset_index()['Hurst']
df.assign(Hurst=hurst) 

Also failed ignominiously. So at this point - half a day later - I'm pretty much stumped. Do any of you hardcore python aficionados know of a way to do this?

Thanks a lot in advance for any insights and pointers.

Molecool
  • 534
  • 1
  • 4
  • 14

1 Answers1

2

I think your problem is that your window is too short. It says in the docstring that the window length has to be 100+ elements, and the Hurst code isn't handling it properly, resulting in a failure of the SVD.

Separately, your test is actually slicing everything but the last 20 elements, so is actually a long array, which is why it didn't fail:

tmp = df[u'Close'][:-20].to_numpy()

print(tmp.shape, HurstEXP(ts = tmp))
(1239,) 0.5163981260143368

If you test a window < 100 length, it throws a LinAlg exception:

tmp = df[u'Close'][:20].to_numpy()

print(tmp.shape, HurstEXP(ts = tmp))
(fails)

It should work if increase your rolling window length or repair the code in the Hurst function to pad out the array if it's too short.

window = 500
df.Close.rolling(window).apply(lambda x: HurstEXP(ts = x), raw=True)

The code in the HurstEXP function for handling lists shorter than 100 elements won't work for values of ts that are np.ndarray objects like those being provided from the .rolling(raw=True).

You could modify the function to start with the following, and it will work for windows under 100 elements:

def HurstEXP( ts= [ None, ] ):   
        if isinstance(ts, np.ndarray):
            ts = ts.tolist()
        

...alternatively, if you're always going to have numpy arrays, you could change the line that fixes it:

     ts = too_short_list * ts[:1] + ts                      #    PRE-PEND SUFFICIENT NUMBER of [ts[0],]-as-list REPLICAS TO THE LIST-HEAD

to

     ts = np.pad(ts, pad_width=(too_short_list,0), mode='edge')
Rick M
  • 1,012
  • 1
  • 7
  • 9
  • Hi Rick - it says in the docstring that the 'series' needs to be 100 elements plus, not that the window needs to be that wide. After all I did test a sample with a window of 20 and it worked just fine (again, see in my 'sanity test' above). Anyway, just to make sure I increased the window size to 100 and it still failed when applied via a rolling window. So ergo - the Hurst function is working fine it seems, I just can't figure out how to apply it via a rolling window. – Molecool Feb 15 '21 at 19:11
  • The window length in the `rolling` determines the length of the series that gets sent to the Hurst function. Your sanity test is *not* sending a window of 20 elements, it's sending in 1239 elements. Also, your original attempt applies `hurst`, not `HurstEXP`. Did you try the exact code I put above? – Rick M Feb 15 '21 at 19:23
  • Thanks for the quick response. Turns out you were right and I just ran this: `window = 200` `df2 = df.Close.rolling(window).apply(lambda x: HurstEXP(ts = x), engine='cython', raw=True)` `df2 = df2.dropna()` `df2` Sorry, I can't figure out how to properly format my code in a response. Anyway, it's working but only if the window length is 101 or larger. Doesn't work for a window of 100. FYI - I didn't write the Hurst function and would have a hard time fixing it. – Molecool Feb 15 '21 at 20:44
  • Great. I added how you could go about fixing the Hurst function. If the answer solved your problem, please mark it as 'accepted' by clicking the green check mark. This helps keep the focus on older SO questions which still don't have answers. Thanks, and take care. – Rick M Feb 15 '21 at 21:22