34

I have an array of the form:

x = np.array([ 1230.,  1230.,  1227.,  1235.,  1217.,  1153.,  1170.])

and I would like to produce another array whose values are the mean of each pair of consecutive values in my original array:

xm = np.array([ 1230.,  1228.5,  1231.,  1226.,  1185.,  1161.5])

Does anyone know the easiest and fastest way to do this without using loops?

iury simoes-sousa

6 Answers

75

Even shorter, slightly sweeter:

(x[1:] + x[:-1]) / 2

  • This is faster:

    >>> python -m timeit -s "import numpy; x = numpy.random.random(1000000)" "x[:-1] + numpy.diff(x)/2"
    100 loops, best of 3: 6.03 msec per loop
    
    >>> python -m timeit -s "import numpy; x = numpy.random.random(1000000)" "(x[1:] + x[:-1]) / 2"
    100 loops, best of 3: 4.07 msec per loop
    
  • This is perfectly accurate:

    Consider each element in x[1:] + x[:-1]. So consider x₀ and x₁, the first and second elements.

    x₀ + x₁ is calculated to perfect precision and then rounded, in accordance with IEEE 754. It would therefore be the correct answer if that were all that was needed.

    (x₀ + x₁) / 2 is just half of that value. This can almost always be done by reducing the exponent by one, except in two cases:

    • x₀ + x₁ overflows. This will result in an infinity (of either sign). That's not what is wanted, so the calculation will be wrong.

    • x₀ + x₁ underflows. As the size is reduced, rounding will be perfect and thus the calculation will be correct.

    In all other cases, the calculation will be correct.


    Now consider x[:-1] + numpy.diff(x) / 2. This, by inspection of the source, evaluates directly to

    x[:-1] + (x[1:] - x[:-1]) / 2
    

    and so consider again x₀ and x₁.

    x₁ - x₀ will have severe "problems" with underflow for many values, and it also loses precision to large cancellations. It may not be immediately obvious, but this on its own doesn't matter when the signs are the same, as the error effectively cancels out on the later addition. What does matter is that rounding occurs.

    (x₁ - x₀) / 2 will be no less rounded, but then x₀ + (x₁ - x₀) / 2 involves another rounding. This means that errors will creep in. Proof:

    import numpy

    wins = draws = losses = 0

    for _ in range(100000):
        a = numpy.random.random()
        b = numpy.random.random() / 0.146

        x = (a+b)/2          # my version: one rounding
        y = a + (b-a)/2      # the diff version: two roundings

        # Each error measures how far a result is from the exact midpoint of a and b.
        error_mine   = (a-x) - (x-b)
        error_theirs = (a-y) - (y-b)

        if x != y:
            if abs(error_mine) < abs(error_theirs):
                wins += 1
            elif abs(error_mine) == abs(error_theirs):
                draws += 1
            else:
                losses += 1
        else:
            draws += 1

    # Dividing the counts by 1000 expresses them as percentages of the 100000 trials.
    wins / 1000
    #>>> 12.44

    draws / 1000
    #>>> 87.56

    losses / 1000
    #>>> 0.0
    

    This shows that for the carefully chosen constant of 0.146, a full 12-13% of answers are wrong with the diff variant! As expected, my version is always right.

    Now consider values of opposite sign. Although my variant has overflow problems, those are far less of a concern than the cancellation problems the diff variant runs into; it should be clear from the logic above why its double rounding is so damaging here. Proof:

    ...
        a = numpy.random.random()
        b = -numpy.random.random()
    ...
    
    wins / 1000
    #>>> 25.149
    
    draws / 1000
    #>>> 74.851
    
    losses / 1000
    #>>> 0.0
    

    Yeah, it gets 25% wrong!

    In fact, it doesn't take much narrowing of the inputs to push this up to 50%:

    ...
        a = numpy.random.random()
        b = -a + numpy.random.random()/256
    ...
    
    wins / 1000
    #>>> 49.188
    
    draws / 1000
    #>>> 50.812
    
    losses / 1000
    #>>> 0.0
    

    Well, it's not that bad. It's only ever 1 least-significant-bit off as long as the signs are the same, I think.
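
    A rough way to test that claim empirically (a sketch of mine, not part of the original proof) is to measure how far apart the two formulas land, in units of the last place of the exact result:

    import numpy

    rng = numpy.random.default_rng(0)
    a = rng.random(100000)       # same-sign inputs, to match the claim above
    b = rng.random(100000)

    exact = (a + b) / 2          # single rounding
    via_diff = a + (b - a) / 2   # double rounding

    # Fraction of pairs where the two formulas disagree at all.
    print(numpy.mean(exact != via_diff))
    # Largest disagreement, measured in ulps of the exact result; if the
    # one-bit claim holds, this should come out at (or below) about 1.
    print(numpy.max(numpy.abs(exact - via_diff) / numpy.spacing(exact)))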


So there you have it. My answer is the best unless you're finding the average of two values whose sum exceeds 1.7976931348623157e+308 or is smaller than -1.7976931348623157e+308.
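
As a small illustration of that overflow corner case (a sketch, not from the benchmark above), halving each slice before adding is one way around it:

    import numpy as np

    # The intermediate sum exceeds the float64 maximum (~1.798e308), so the
    # straightforward formula overflows to inf; halving each slice first does not.
    x = np.array([1.7e308, 1.7e308])

    with np.errstate(over='ignore'):
        print((x[1:] + x[:-1]) / 2)    # [inf]
        print(x[1:] / 2 + x[:-1] / 2)  # [1.7e+308]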

Veedrac
  • It's probably not too relevant in practical applications, but I think this is both slightly faster than @JohnZwinck's answer, and also slightly more prone to loss of precision. – Jaime May 25 '14 at 16:41
  • 1
    @Jaime You're wrong. Mine *is* faster, but it's also perfectly accurate, whereas the `diff` version is not. I'll add some evidence to the question. – Veedrac May 25 '14 at 17:26
  • 1
    @Jaime Evidence added. – Veedrac May 25 '14 at 18:05
  • 1
    If it is efficiency you are looking for, I would even suggest `.5 * (x[1:] + x[:-1])`, since the float product is faster than the float division. – iago-lito Nov 27 '15 at 14:20
  • @Iago-lito Marginal gains; this is only a 4% increase in throughput for a noticeable loss of clarity. Much less than the 50% increase in throughput and improvement of clarity my original suggestion gave. I guess if you're in a rush, though. – Veedrac Nov 27 '15 at 16:27
  • @Veedrac I guess it also depends on your machine and on the length of `x` compared to the length of the loop. I reduced the time by 30% in my tests (long `x` and short loop). Maybe it's worth checking in any specific case whether or not readability beats efficiency as far as `.5 * x` vs `x / 2` is concerned ;) – iago-lito Nov 27 '15 at 16:30
  • Just a suggestion: add the decimal point to the `2` in the division so that it also works for integer arrays `x`. We see that problem A LOT when tutoring Python beginners... – obachtos Jan 17 '17 at 16:41
  • @obachtos A better suggestion: use a modern Python version. – Veedrac Jan 17 '17 at 19:50
  • @Veedrac Valid point. But the reality is that Python 2.x is still widely used. – obachtos Jan 18 '17 at 08:55
11

Short and sweet:

x[:-1] + np.diff(x)/2

That is, take each element of x except the last, and add one-half of the difference between it and the subsequent element.
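
Applied to the array from the question, this produces exactly the xm asked for:

>>> import numpy as np
>>> x = np.array([ 1230.,  1230.,  1227.,  1235.,  1217.,  1153.,  1170.])
>>> x[:-1] + np.diff(x)/2
array([ 1230. ,  1228.5,  1231. ,  1226. ,  1185. ,  1161.5])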

John Zwinck
  • I posted 9 seconds before you did, though I must admit you gave a better explanation! –  May 25 '14 at 14:02
  • I would have posted 9 seconds before you if I had just clicked the button as soon as I had typed the code! :) It's pretty cool that we wrote the exact same solution, meanwhile a couple other people came up with pretty different ideas, and I have to say each one of them has its own special appeal. I still like ours though! – John Zwinck May 25 '14 at 14:04
7

Try this:

midpoints = x[:-1] + np.diff(x)/2

It's pretty easy and should be fast.

2

If speed matters, use multiplication instead of division, following Veedrac's answer:

    0.5 * (x[:-1] + x[1:])

Profiling results:

    >>> python -m timeit -s "import numpy; x = numpy.random.random(1000000)" "0.5 * (x[:-1] + x[1:])"
    100 loops, best of 3: 4.20 msec per loop

    >>> python -m timeit -s "import numpy; x = numpy.random.random(1000000)" "(x[:-1] + x[1:]) / 2"
    100 loops, best of 3: 5.10 msec per loop
ali
0
>>> x = np.array([ 1230., 1230., 1227., 1235., 1217., 1153., 1170.])

>>> (x+np.concatenate((x[1:], np.array([0]))))/2
array([ 1230. ,  1228.5,  1231. ,  1226. ,  1185. ,  1161.5,   585. ])

Now you can just strip the last element, if you want.
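
For example, slicing off that spurious last entry recovers the xm from the question:

>>> ((x + np.concatenate((x[1:], np.array([0])))) / 2)[:-1]
array([ 1230. ,  1228.5,  1231. ,  1226. ,  1185. ,  1161.5])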

Pavel
0

I end up using this operation a lot on multi-dimensional arrays, so I'll post my solution (inspired by the source code for np.diff()).

import numpy as np

def zcen(a, axis=0):
    a = np.asarray(a)
    nd = a.ndim
    slice1 = [slice(None)]*nd
    slice2 = [slice(None)]*nd
    slice1[axis] = slice(1, None)
    slice2[axis] = slice(None, -1)
    # Index with tuples: indexing with a list of slices is an error in newer NumPy.
    return (a[tuple(slice1)] + a[tuple(slice2)])/2

>>> a = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]]
>>> zcen(a)
array([[  5.5,  11. ,  16.5,  22. ,  27.5]])
>>> zcen(a, axis=1)
array([[  1.5,   2.5,   3.5,   4.5],
       [ 15. ,  25. ,  35. ,  45. ]])
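
On NumPy 1.20+ the same pairwise mean along an arbitrary axis can also be sketched with sliding_window_view (just an alternative; the name zcen_swv below is only for illustration):

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def zcen_swv(a, axis=0):
    # A length-2 window along `axis` is appended as a trailing axis;
    # averaging over it gives the pairwise midpoints, like zcen above.
    return sliding_window_view(np.asarray(a), 2, axis=axis).mean(axis=-1)

a = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]]
print(zcen_swv(a))          # same values as zcen(a)
print(zcen_swv(a, axis=1))  # same values as zcen(a, axis=1)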
Ben