Hello, I have a huge list of values, and I want to find every pattern of n consecutive values, like list[0:30], list[1:31], and so on. Within each pattern, I want to compute the percentage change of each value relative to the first one, like percentage_change(array[0], array[1]), percentage_change(array[0], array[2]), all the way to the end of the pattern. After this, I want to store all the 30-value patterns in an array of patterns, to compare against other values in the future.
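To make the definition concrete, here is a minimal sketch of what a pattern is, on a toy list. It assumes that no base value is zero and that a pattern of `number_of_entries` changes is taken over `number_of_entries + 1` consecutive values (the first one being the base); `build_patterns` and `number_of_entries` are illustrative names only.

```python
def percentage_change(base, value):
    """Percent change from base to value; assumes base is non-zero."""
    return (value - base) / abs(base) * 100.0


def build_patterns(values, number_of_entries):
    """Return one pattern per run of number_of_entries + 1 consecutive values.

    Each pattern holds the percent change of the values that follow the
    base value, measured against that base value.
    """
    patterns = []
    for start in range(len(values) - number_of_entries):
        base = values[start]
        following = values[start + 1:start + 1 + number_of_entries]
        patterns.append([percentage_change(base, v) for v in following])
    return patterns


toy = [100.0, 110.0, 99.0, 120.0, 60.0]
patterns = build_patterns(toy, 3)
for p in patterns:
    print([round(x, 2) for x in p])
```

With five toy values and three entries per pattern, this yields two patterns, one starting at each of the first two values.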
To do so, I have built a function. The 30 values can be changed to any number I choose through the variable numberOfEntries. For each pattern, I take the mean of the next 10 outcomes and store it in an array of outcomes at the same index:
# end point is the end of the array
# inputs: (array, numberOfEntries)
# output: (list of patterns, list of outcomes)
y = 0
condition = numberOfEntries + 1
# the pattern currently being built
pattern = []
# list of patterns
Patterns = []
# outcomes array
outcomes = []
while y < len(array):
    i = 1
    while i < condition:
        # this is the percentage change function; I inlined it to gain speed.
        # try is used because of a possible division by zero
        try:
            x = ((float(array[y-(numberOfEntries-i)])-array[y-numberOfEntries])/abs(array[y-numberOfEntries]))*100.00
            if x == 0.0:
                x = 0.000000001
        except ZeroDivisionError:
            x = 0.00000001
        i += 1
        pattern.append(x)
    # here are the outcomes
    outcomeRange = array[y+5:y+15]
    outcomes.append(outcomeRange)
    Patterns.append(pattern)
    # reset the pattern list
    pattern = []
    y += 1
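The description mentions the mean of the next 10 values, whereas the loop stores the raw slice array[y+5:y+15]. Below is a minimal sketch of reducing that slice to its mean, assuming the same offset of 5 and length of 10 as the slice in the loop. `mean_outcome` and its parameters are illustrative names, and returning None for an empty slice near the end of the array is an assumption, not something the original code does.

```python
from statistics import fmean


def mean_outcome(values, y, offset=5, length=10):
    """Mean of values[y+offset : y+offset+length], or None if that slice is empty."""
    outcome_range = values[y + offset:y + offset + length]
    # Near the end of the data the slice can be shorter than `length`,
    # or empty; an empty slice has no mean.
    if len(outcome_range) == 0:
        return None
    return fmean(outcome_range)


values = [float(v) for v in range(20)]
print(mean_outcome(values, 0))   # mean of values[5:15]
print(mean_outcome(values, 18))  # slice is empty here
```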
Running this on an array of 8559 values, which is small compared to the amount of data I have, took 229.6792.
Is there a way to adapt this to multithreading, or another way to improve its speed?
EDIT:
To explain better, I have this OHLC data:
                         open      high       low     close     volume
TimeStamp
2016-08-20 15:50:00  0.003008  0.003008  0.002995  0.003000   6.351215
2016-08-20 15:55:00  0.003000  0.003008  0.003000  0.003008   6.692174
2016-08-20 16:00:00  0.003008  0.003009  0.002996  0.003001  10.813029
2016-08-20 16:05:00  0.003001  0.003000  0.002991  0.002991   4.368509
2016-08-20 16:10:00  0.002991  0.002993  0.002989  0.002990   6.662944
2016-08-20 16:15:00  0.002990  0.003015  0.002989  0.003015   8.495640
I extract this as
array=df['close'].values
Then I pass this array to the function, and it returns a list of lists. For this particular set of values, each inner list looks like this:
[0.26, 0.03, -0.03, -0.04, 0.005]
These are the percent changes of each row relative to the beginning of the sample, and this is what I call a pattern. I can choose how many entries a pattern has.
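For reference, here is a minimal sketch of how such patterns might be computed for the whole array at once with NumPy instead of Python loops. It assumes NumPy 1.20 or later (for `sliding_window_view`), that no base value is zero (a zero base would give inf or nan rather than raise an exception), and that a pattern of `number_of_entries` changes is taken over `number_of_entries + 1` consecutive values. `patterns_vectorized` is an illustrative name, and the sketch has not been timed on the full data set.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def patterns_vectorized(array, number_of_entries):
    """Percent change of each value in a window relative to the window's first value.

    Returns a 2-D array with one row per window and number_of_entries columns.
    """
    array = np.asarray(array, dtype=float)
    # Each row is a read-only view of number_of_entries + 1 consecutive values.
    windows = sliding_window_view(array, number_of_entries + 1)
    base = windows[:, :1]  # first value of each window, kept 2-D for broadcasting
    return (windows[:, 1:] - base) / np.abs(base) * 100.0


# Close prices copied from the sample rows above.
closes = np.array([0.003000, 0.003008, 0.003001, 0.002991, 0.002990, 0.003015])
result = patterns_vectorized(closes, 5)
print(np.round(result, 2))
```

Because the windows are views rather than copies, only the output array of percent changes is newly allocated.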
I hope this is clearer now.