
This is a bit hard to explain. I have a list of integers. So, for example, [1, 2, 4, 5, 8, 7, 6, 4, 1] - which, when plotted against element number, would resemble a convex graph. How do I somehow extract this 'shape' characteristic from the list? It doesn't have to be particularly accurate - just the general shape (convex w/ one hump, concave w/ two, straight line, etc.) would be fine.

I could use conditionals for every possible shape: for example, if the slope is positive up to a certain index and negative after, it's a single hump, with the skewness depending on index/list_size.

Is there some cleverer, generalised way? I suppose this could be a classification problem - but is it possible without ML?

Cheers.

smci
vinit_ivar
  • This seems more like a math question than a programming question... but I think what you want to do is fit them to an nth degree polynomial and use the derivates to determine shape (second derivative gives concavity, number of critical points gives "humps", etc.) – en_Knight Sep 06 '14 at 19:14
  • Perhaps you can iterate through the array and check (exactly or approximately) some math properties. E.g.: convex => `v[i+1] + v[i-1] >= 2*v[i]` for every interior `i`. – ROMANIA_engineer Sep 06 '14 at 19:14
  • 2
    You can find the rough shape by doing the differences between each consecutive term, and then the differences between them - that will approximate (and the key here is approximate to the 2nd derivative. Even the first set of differences you can see changes in sign which will show you peaks/troughs. – Tony Suffolk 66 Sep 06 '14 at 19:53
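The pairwise check suggested in the comments can be sketched as a short function (the function name is mine, and this uses the standard discrete-convexity condition that each interior point lies at or below the average of its neighbours):

```python
def is_convex(v):
    # A sequence is (discretely) convex if, for every interior index i,
    # v[i-1] + v[i+1] >= 2 * v[i], i.e. the second differences are >= 0.
    return all(v[i - 1] + v[i + 1] >= 2 * v[i] for i in range(1, len(v) - 1))

print(is_convex([1, 2, 4, 5, 8, 7, 6, 4, 1]))  # False - the sample data is a hump
print(is_convex([9, 4, 1, 0, 1, 4, 9]))        # True - a parabola-like dip
```

The same idea with the inequality flipped (`<=`) tests for concavity.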

2 Answers

numpy.diff

The first order difference is given by `out[n] = a[n+1] - a[n]`.

https://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.diff.html

import numpy as np

data = [1, 2, 4, 5, 8, 7, 6, 4, 1]
data = np.array(data, dtype=float)
velocity = np.diff(data)
acceleration = np.diff(velocity)
jerk = np.diff(acceleration)
jounce = np.diff(jerk)

print(data)
print(velocity)
print(acceleration)
print(jerk)
print(jounce)

>>>
[ 1.  2.  4.  5.  8.  7.  6.  4.  1.]

# positive numbers = rising
[ 1.  2.  1.  3. -1. -1. -2. -3.]

# positive numbers = concave up
[ 1. -1.  2. -4.  0. -1. -1.]

# positive numbers = curling up
[-2.  3. -6.  4. -1.  0.]

# positive numbers = snapping up
[  5.  -9.  10.  -5.   1.]

https://en.wikipedia.org/wiki/Velocity

https://en.wikipedia.org/wiki/Acceleration

https://en.wikipedia.org/wiki/Jerk_(physics)

https://en.wikipedia.org/wiki/Jounce

My tendency is then to divide the 1st derivative (velocity) by a moving average and multiply by 100, converting it to a percent rate of change (%ROC). Sometimes acceleration - the concavity - is also important. The further you go toward jerk/jounce, the more stochastic/noisy the data becomes.
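A minimal sketch of that %ROC idea (the 3-point window size is an arbitrary choice here, and the alignment between velocity and the moving average is only approximate):

```python
import numpy as np

data = np.array([1, 2, 4, 5, 8, 7, 6, 4, 1], dtype=float)
velocity = np.diff(data)

# 3-point simple moving average of the data (window size is illustrative)
window = 3
moving_avg = np.convolve(data, np.ones(window) / window, mode='valid')

# percent rate of change: velocity relative to the local level of the series;
# truncate both arrays to the shorter length so they line up
n = min(len(velocity), len(moving_avg))
roc_percent = velocity[:n] / moving_avg[:n] * 100

print(roc_percent)
```

Dividing by the local level makes a change of 1 mean something different near data values of 1 than near data values of 8.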

you can also calculate the mean of each:

print(np.mean(data))
print(np.mean(velocity))
print(np.mean(acceleration))

to make generalizations about the shape, for this sample set:

>>>
4.22222222222     # average value
0.0               # generally sideways; no trend
-0.571428571429   # concave mostly down

and the relative standard deviation (coefficient of variation):

import numpy as np
data = [1, 2, 4, 5, 8, 7, 6, 4, 1]
coef_variance = np.std(data) / np.mean(data)
print(coef_variance)

>>>
0.566859453383

which I'd call "fairly volatile", but not extreme by orders of magnitude; typically a value > 1 is considered "highly variant".

https://en.wikipedia.org/wiki/Coefficient_of_variation

and if we plot:

import matplotlib.pyplot as plt
import numpy as np

data = [1, 2, 4, 5, 8, 7, 6, 4, 1]
x = range(9)

plt.plot(x, data, c='red', ms=2)

plt.show()

we can see that is a generally good description of what we find:

[plot of the raw data]

no overall up/down trend, fairly volatile, concave down; mean just over 4

you can also polyfit:

import matplotlib.pyplot as plt
import numpy as np

data = [1, 2, 4, 5, 8, 7, 6, 4, 1]
x = range(9)
plt.plot(x, data, c='red', ms=2)

# fit a 2nd-degree polynomial and evaluate it at each x
poly = np.polyfit(x, data, 2)
z = [poly[0] * i * i + poly[1] * i + poly[2] for i in x]
plt.plot(x, z, c='blue', ms=2)
print(poly)
plt.show()

which returns:

[-0.37445887  3.195671   -0.07272727]

in other words:

-0.374x^2 +  3.195x - 0.072

which plots:

[plot of the data with the fitted parabola]

from there you can calculate sum of squares to see how accurate your model is

Sum of Square Differences (SSD) in numpy/scipy
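A minimal SSD computation for the quadratic fit above might look like this (using np.polyval to evaluate the fitted polynomial; what counts as "accurate enough" is up to you):

```python
import numpy as np

data = np.array([1, 2, 4, 5, 8, 7, 6, 4, 1], dtype=float)
x = np.arange(9)

poly = np.polyfit(x, data, 2)
fitted = np.polyval(poly, x)

# sum of squared differences between the model and the data
ssd = np.sum((fitted - data) ** 2)
print(ssd)
```

A lower SSD means the polynomial tracks the data more closely; a degree-8 fit would hit all 9 points exactly (SSD of 0) but would tell you nothing about the general shape.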

and you could iterate the polyfit process increasing the degree each time

np.polyfit(x,data,degree)

until you attain an adequately low SSD for your needs; which would tell you if your data is more x^2ish, x^3ish, x^4ish, etc.

degree = 1
ssd = float('inf')
while ssd > your_desire:
    poly = np.polyfit(x, data, degree)
    fitted = np.polyval(poly, x)
    ssd = np.sum((fitted - np.asarray(data)) ** 2)
    degree += 1
litepresence

How about if you difference the data (i.e., `x[i+1] - x[i]`) repeatedly until all the results have the same sign? For example, if you difference it twice and all the results are nonnegative, you know it's convex. Otherwise, difference again and check the signs. You could set a limit, say 10 or so, beyond which you figure the sequence is too complex to characterize. Otherwise, your shape is characterized by the number of times you difference and the final sign.
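This procedure can be sketched as follows (the function name and the max_depth default are mine):

```python
import numpy as np

def characterize(data, max_depth=10):
    # Repeatedly difference until every value at some level shares one sign.
    # Returns (depth, sign): e.g. (2, +1) means the second differences are
    # all nonnegative, i.e. the sequence is convex. Returns None if no level
    # within max_depth is sign-consistent.
    d = np.asarray(data, dtype=float)
    for depth in range(1, max_depth + 1):
        d = np.diff(d)
        if d.size == 0:
            break
        if np.all(d >= 0):
            return depth, +1
        if np.all(d <= 0):
            return depth, -1
    return None

print(characterize([9, 4, 1, 0, 1, 4, 9]))  # (2, 1): convex
print(characterize([3, 2, 1]))              # (1, -1): monotonically decreasing
```

Note that for noisy data the depth can climb until only one or two differences remain, at which point a consistent sign is trivial, which is one more reason to cap the depth.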

Russ Lenth