
tl;dr

On the same numpy array, calculating np.cos takes 3.2 seconds, whereas np.sin runs for 548 seconds (nine minutes) on Linux Mint.

See this repo for full code.


I've got a pulse signal (see image below) which I need to modulate onto an HF carrier, simulating a Laser Doppler Vibrometer. Therefore, the signal and its time basis need to be resampled to match the carrier's higher sampling rate.

pulse signal to be modulated onto HF-carrier
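For reference, the resampling step itself could look roughly like the sketch below, using scipy.signal.resample with the time basis passed via its t argument; this is only a sketch (the sample counts follow from the numbers further down, and an 80-million-point inverse FFT is itself expensive).

import numpy as np
from scipy.signal import resample

pulse = np.load('data/pulse.npy')        # 768 samples at 960 Hz
pulse_time = np.linspace(0, 0.8, len(pulse), endpoint=False)

# upsample to the carrier sample rate (0.8 s * 100 MHz = 80 million samples)
pulse_hf, time_hf = resample(pulse, 80000000, t=pulse_time)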

In the following demodulation process, both the in-phase carrier cos(omega * t) and the phase-shifted carrier sin(omega * t) are needed. Oddly, the time it takes to evaluate these functions depends heavily on how the time vector has been calculated.
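For context, quadrature (I/Q) demodulation in general mixes the measured signal with both carriers and low-pass filters the products; here is a minimal sketch, not necessarily the exact processing chain used here, with the cutoff frequency only a placeholder:

import numpy as np
from scipy.signal import butter, filtfilt

def iq_demodulate(signal, t, carrier_freq, samplerate, cutoff=1e6):
    omega_t = 2 * np.pi * carrier_freq * t
    i = signal * np.cos(omega_t)   # mix with in-phase carrier
    q = signal * np.sin(omega_t)   # mix with phase-shifted carrier
    b, a = butter(4, cutoff / (samplerate / 2))  # 4th-order low-pass
    return filtfilt(b, a, i), filtfilt(b, a, q)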

The time vector t1 is calculated using np.linspace directly, while t2 uses the method implemented in scipy.signal.resample.

import numpy as np

pulse = np.load('data/pulse.npy')  # 768 samples

pulse_samples = len(pulse)
pulse_samplerate = 960  # 960 Hz
pulse_duration = pulse_samples / pulse_samplerate  # here: 0.8 s
pulse_time = np.linspace(0, pulse_duration, pulse_samples,
                         endpoint=False)

carrier_freq = 40e6  # 40 MHz
carrier_samplerate = 100e6  # 100 MHz
carrier_samples = int(pulse_duration * carrier_samplerate)  # 80 million

t1 = np.linspace(0, pulse_duration, carrier_samples)

# method used in scipy.signal.resample
# https://github.com/scipy/scipy/blob/v0.17.0/scipy/signal/signaltools.py#L1754
t2 = np.arange(0, carrier_samples) * (pulse_time[1] - pulse_time[0]) \
        * pulse_samples / float(carrier_samples) + pulse_time[0]

As can be seen in the picture below, the time vectors are not identical. At 80 million samples the difference t1 - t2 reaches 1e-8.

difference between time vectors `t1` and `t2`
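A quick way to quantify that drift (the exact value will depend on the platform):

print(np.abs(t1 - t2).max())  # on the order of 1e-8 for the last samples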

Calculating the in-phase and phase-shifted carrier from t1 takes 3.2 seconds each on my machine.
With t2, however, calculating the phase-shifted carrier takes 540 seconds. Nine minutes. For nearly the same 80 million values.

omega_t1 = 2 * np.pi * carrier_freq * t1
np.cos(omega_t1)  # 3.2 seconds
np.sin(omega_t1)  # 3.3 seconds

omega_t2 = 2 * np.pi * carrier_freq * t2
np.cos(omega_t2)  # 3.2 seconds
np.sin(omega_t2)  # 9 minutes
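A rough way to reproduce these numbers without waiting minutes is to time a slice from the end of each array, where the values are largest; absolute timings will of course vary between machines:

import time

for name, arr in [('omega_t1', omega_t1[-100000:]),
                  ('omega_t2', omega_t2[-100000:])]:
    start = time.perf_counter()
    np.sin(arr)
    print(name, 'sin:', time.perf_counter() - start, 's')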

I can reproduce this bug on both my 32-bit laptop and my 64-bit tower, both running Linux Mint 17. On my flatmate's MacBook, however, the "slow sine" takes as little time as the other three calculations.


I run Linux Mint 17.3 on a 64-bit AMD processor and Linux Mint 17.2 on a 32-bit Intel processor.

Finwood

2 Answers


I don't think numpy has anything to do with this: I think you're tripping across a performance bug in the C math library on your system, one which affects sin near large multiples of pi. (I'm using "bug" in a pretty broad sense here -- for all I know, since the sine of large floats is poorly defined, the "bug" is actually the library behaving correctly to handle corner cases!)
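(For reference, one way to check which C library, and hence which libm, the interpreter is running against on Linux; this is just a diagnostic sketch, not proof of the cause:)

import platform
print(platform.libc_ver())  # e.g. ('glibc', '2.19') on Linux Mint 17.x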

On Linux, I get:

>>> %timeit -n 10000 math.sin(6e7*math.pi)
10000 loops, best of 3: 191 µs per loop
>>> %timeit -n 10000 math.sin(6e7*math.pi+0.12)
10000 loops, best of 3: 428 ns per loop

and other Linux-using types from the Python chatroom report

10000 loops, best of 3: 49.4 µs per loop 
10000 loops, best of 3: 206 ns per loop

and

In [3]: %timeit -n 10000 math.sin(6e7*math.pi)
10000 loops, best of 3: 116 µs per loop

In [4]: %timeit -n 10000 math.sin(6e7*math.pi+0.12)
10000 loops, best of 3: 428 ns per loop

but a Mac user reported

In [3]: timeit -n 10000 math.sin(6e7*math.pi)
10000 loops, best of 3: 300 ns per loop

In [4]: %timeit -n 10000 math.sin(6e7*math.pi+0.12)
10000 loops, best of 3: 361 ns per loop

for no order-of-magnitude difference. As a workaround, you might try taking things mod 2 pi first:

>>> new = np.sin(omega_t2[-1000:] % (2*np.pi))
>>> old = np.sin(omega_t2[-1000:])
>>> abs(new - old).max()
7.83773902468434e-09

which has better performance:

>>> %timeit -n 1000 new = np.sin(omega_t2[-1000:] % (2*np.pi))
1000 loops, best of 3: 63.8 µs per loop
>>> %timeit -n 1000 old = np.sin(omega_t2[-1000:])
1000 loops, best of 3: 6.82 ms per loop

Note that as expected, a similar effect happens for cos, just shifted:

>>> %timeit -n 1000 np.cos(6e7*np.pi + np.pi/2)
1000 loops, best of 3: 37.6 µs per loop
>>> %timeit -n 1000 np.cos(6e7*np.pi + np.pi/2 + 0.12)
1000 loops, best of 3: 2.46 µs per loop
DSM
  • just for completeness: I get ``%timeit -n 1000000 math.sin(6e7*math.pi+0.12)``: ``1000000 loops, best of 3: 461 ns per loop`` and ``%timeit -n 1000000 math.sin(6e7*math.pi)``: ``1000000 loops, best of 3: 425 ns per loop`` with Windows. – MSeifert Mar 05 '16 at 19:17
  • 1
    Might this have to do with [denormal numbers](https://en.wikipedia.org/wiki/Denormal_number)? I remember writing some floating point code that got extremely slow when very small, but nonzero numbers were involved. – cfh Mar 05 '16 at 23:22
  • but this "bug" comes into effect with _big_ numbers, not the close-to-zero ones... – Finwood Mar 06 '16 at 10:24
  • @Finwood Not an explanation, but if the issue is big numbers, can you just take it mod 2pi? – Paul Mar 06 '16 at 22:28
  • @Paul yes, this is what I'm doing to circumvent the issue – Finwood Mar 07 '16 at 10:12

One possible cause of these huge performance differences might be how the math library creates or handles IEEE floating-point underflow (or denormals), which can be produced by differences in some of the tinier mantissa bits during transcendental function approximation. Your t1 and t2 vectors might differ in these smaller mantissa bits, and the behaviour may also depend on the algorithm used to compute the transcendental function in whichever library you linked against, as well as on the IEEE denormal or underflow handling on each particular OS.
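For illustration, here is a small sketch of the kind of slowdown subnormal (denormal) operands can cause on some CPUs; whether this is actually the mechanism behind the slow sin here is speculation:

import timeit
import numpy as np

normal = np.full(1000000, 1e-300)    # normal doubles
subnorm = np.full(1000000, 1e-320)   # subnormal (denormal) doubles

print(timeit.timeit(lambda: normal * 1.0001, number=100))   # fast
print(timeit.timeit(lambda: subnorm * 1.0001, number=100))  # often much slower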

hotpaw2