Quite often I have to work with a bunch of noisy, somewhat correlated time series. Sometimes I need some mock data to test my code, or to provide some sample data for a question on Stack Overflow. I usually end up either loading some similar dataset from a different project, or just adding a few sine functions and noise and spending some time to tweak it.

What's your approach? How do you generate noisy signals with certain specs? Have I just overlooked some blatantly obvious standard package that does exactly this?

The features I would generally like to get in my mock data:

  • Varying noise levels over time
  • Some history in the signal (like a random walk?)
  • Periodicity in the signal
  • Being able to produce another time series with similar (but not exactly the same) features
  • Maybe a bunch of weird dips/peaks/plateaus
  • Being able to reproduce it (some seed and a few parameters?)
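
For illustration, here is a minimal sketch of how the features listed above could be combined in one seeded function. The name `mock_series`, the AR(1) coefficient, and every other parameter value are assumptions chosen for the example, not taken from an existing package:

```python
import numpy as np

def mock_series(seed, n=1000, low=0.0, high=1.0):
    """Return one reproducible mock series of length n, rescaled to [low, high]."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n)

    # History: an AR(1) process, i.e. a random walk with a mild pull back to zero.
    walk = np.zeros(n)
    steps = rng.normal(0.0, 0.1, n)
    for i in range(1, n):
        walk[i] = 0.98 * walk[i - 1] + steps[i]

    # Periodicity: a squared sine with a randomly drawn phase.
    periodic = np.sin(2 * np.pi * 3 * t + rng.uniform(0, 2 * np.pi)) ** 2

    # Varying noise level: white noise scaled by a slowly changing envelope.
    envelope = 0.1 + 0.4 * np.abs(np.sin(np.pi * rng.uniform(1, 3) * t))
    noise = rng.normal(0.0, 1.0, n) * envelope

    data = walk + periodic + noise

    # Weird dips and plateaus: a few random segments are lowered or flattened.
    for _ in range(3):
        start = rng.integers(0, n - 50)
        length = rng.integers(10, 50)
        if rng.random() < 0.5:
            data[start:start + length] -= rng.uniform(0.5, 1.5)  # dip
        else:
            data[start:start + length] = data[start]             # plateau

    # Rescale to the requested range.
    data = (data - data.min()) / (data.max() - data.min())
    return low + data * (high - low)
```

Calling `mock_series(1)` twice gives the same array, while `mock_series(2)` gives a different series built from the same ingredients, so a seed and a couple of parameters are enough to share a sample.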

I would like to get a time series similar to the two below [A]:

[Figures: Real time series 1, Real time series 2]

I usually end up creating a time series with a bit of code like this:

import numpy as np

n = 1000
limit_low = 0
limit_high = 0.48

# White noise, plus rectified noise modulated by a slow sine,
# plus two squared sines for the periodic component
my_data = (
    np.random.normal(0, 0.5, n)
    + np.abs(np.random.normal(0, 2, n) * np.sin(np.linspace(0, 3 * np.pi, n)))
    + np.sin(np.linspace(0, 5 * np.pi, n)) ** 2
    + np.sin(np.linspace(1, 6 * np.pi, n)) ** 2
)

# Rescale to the interval [limit_low, limit_high]
scaling = (limit_high - limit_low) / (my_data.max() - my_data.min())
my_data = my_data * scaling
my_data = my_data + (limit_low - my_data.min())

Which results in a time series like this:

[Figure: Mock time series]

Which is something I can work with, but still not quite what I want. The problem here is mainly that:

  1. It doesn't have the history/random walk aspect.
  2. It's quite a bit of code and tweaking (this is especially a problem if I want to share a sample time series).
  3. I need to retweak the values (frequencies of the sines etc.) to produce another similar but not exactly the same time series.
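
One possible way around points 1 and 3 is to build every series from a shared random walk and let a single seed draw the small per-series differences. This is only a sketch under assumptions: `mock_family`, the `jitter` parameter, and the noise scales are hypothetical names and values picked for the example.

```python
import numpy as np

def mock_family(seed, n_series=2, n=1000, jitter=0.1):
    """Return an (n_series, n) array of similar, correlated mock series."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n)

    # Shared history: one random walk common to every series.
    shared = np.cumsum(rng.normal(0.0, 0.05, n))

    family = np.empty((n_series, n))
    for k in range(n_series):
        # Each series gets its own slightly perturbed frequency and phase ...
        freq = 3.0 * (1.0 + jitter * rng.normal())
        phase = jitter * rng.normal()
        # ... and its own independent noise on top of the shared walk.
        own_noise = rng.normal(0.0, 0.3, n)
        family[k] = shared + np.sin(2 * np.pi * freq * x + phase) ** 2 + own_noise
    return family
```

With this layout nothing has to be retweaked by hand: `mock_family(7)` always returns the same pair, and a larger `jitter` makes the series drift further apart.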

[A]: For those wondering, the time series depicted in the first two images is the traffic intensity at two points along one road over three days (midnight to 6 am is clipped), in cars per second (moving Hanning-window average over 2 min), resampled to 1000 points.

Swier
  • Have you considered taking an ideal data set and just adding some white noise to it? – Mad Physicist Mar 29 '16 at 14:03
  • A bit yeah, but then I'm still stuck with the problem that all the actual features (weird dips/peaks, periodicity etc.) are still _exactly_ the same – Swier Mar 29 '16 at 14:08
  • To change periodicity I guess it would be feasible to resample various parts to slightly more or fewer points. – Swier Mar 29 '16 at 14:24
  • Have you ever thought about using biological data? Check this out. You could download a large chromosome (e.g. chr1) or the smallest (chr21), then use a moving average where you calculate the %GC content. Nothing like biological data for random walks with plateaus, local dips and peaks... – mccurcio May 09 '17 at 06:00
  • Did you find a good time series generator? I am also looking for such a library in Java or Python. – user3352632 Feb 28 '18 at 14:43
  • @user3352632 https://docs.scipy.org/doc/scipy/reference/signal.html#waveforms allows you to make some time series, though I'm sure there are better generators out there. – Swier Mar 02 '18 at 14:03
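
The two ideas from the first comments (add white noise to an ideal data set, and resample parts of it to slightly more or fewer points) could be combined roughly as follows. This is a sketch, not a tested recipe: the function name `perturb` and its parameters are made up for the example, and it assumes `warp` stays below 1 so that the time axis remains monotonic.

```python
import numpy as np

def perturb(base, seed, noise_std=0.05, warp=0.2, n_knots=6):
    """Return a time-warped, noisy copy of `base` with the same length."""
    rng = np.random.default_rng(seed)
    base = np.asarray(base, dtype=float)
    n = base.size
    t = np.linspace(0.0, 1.0, n)

    # Random monotonic warp: positive segment lengths, normalised to end at 1.
    segments = 1.0 + warp * rng.uniform(-1.0, 1.0, n_knots)
    knots_out = np.concatenate(([0.0], np.cumsum(segments) / segments.sum()))
    knots_in = np.linspace(0.0, 1.0, n_knots + 1)

    # Map each output time to a stretched or squeezed position in the input.
    warped_t = np.interp(t, knots_in, knots_out)
    warped = np.interp(warped_t, t, base)

    # White noise on top of the warped signal.
    return warped + rng.normal(0.0, noise_std, n)
```

Because the warp shifts dips and peaks by a different amount in each segment, two calls with different seeds no longer share exactly the same features, which was the objection to plain white noise.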

1 Answer

Have you looked into TSimulus? By using Generators, you should be able to generate data with specific patterns, periodicity, and cycles.

The TSimulus project provides tools for specifying the shape of a time series (general patterns, cycles, importance of the added noise, etc.) and for converting this specification into time series values.


Otherwise, you can try "drawing" the data yourself and exporting those data points using Time Series Maker.

PeterWhy