I'm a beginner programmer looking for help with a Simple Moving Average (SMA). I'm working with two-column files, where the first column is the time and the second is the value. Both the time intervals and the values are random. The files are usually not big, but the process collects data for a long time. At the end, the files look similar to this:

+-----------+-------+
|   Time    | Value |
+-----------+-------+
| 10        |     3 |
| 1345      |    50 |
| 1390      |     4 |
| 2902      |    10 |
| 34057     |    13 |
| (...)     |       |
| 898975456 |    10 |
+-----------+-------+

After the whole process, the number of rows is around 60k-100k.

Then I try to "smooth" the data over some time window. For this purpose I use an SMA. [AWK_method]

awk -v size="$timewindow" '{mod=NR%size; if(NR<=size){count++}else{sum-=array[mod]}; sum+=$2; array[mod]=$2; print sum/count}' file.dat

To make the SMA work properly with a predefined $timewindow, I first create a linearly incrementing time column and fill the missing rows with zeros. Next, I run the script with different $timewindow values and observe the results.

+-----------+-------+
|   Time    | Value |
+-----------+-------+
| 1         |     0 |
| 2         |     0 |
| 3         |     0 |
| (...)     |       |
| 10        |     3 |
| 11        |     0 |
| 12        |     0 |
| (...)     |       |
| 1343      |     0 |
| (...)     |       |
| 898975456 |    10 |
+-----------+-------+

For small data sets this was relatively comfortable, but now it is quite time-consuming, and the created files are starting to get too big. I'm also familiar with Gnuplot, but SMA there is hell...

So here are my questions:

  • Is it possible to change the awk solution so that it does not need the data filled with zeros?
  • Do you recommend any other solution using bash?
  • I have also considered learning Python, because after 6 months of learning bash I have got to know its limitations. Will I be able to solve this in Python without creating big data files?

I'd be glad of any form of help or advice.

Best regards!

[AWK_method] http://www.commandlinefu.com/commands/view/2319/awk-perform-a-rolling-average-on-a-column-of-data

B.Krz

2 Answers

You included a python tag, so check out traces:

http://traces.readthedocs.io/en/latest/

Here are some other insights:

Moving average for time series with unequal intervals

http://www.eckner.com/research.html

https://stats.stackexchange.com/questions/28528/moving-average-of-irregular-time-series-data-using-r

https://en.wikipedia.org/wiki/Unevenly_spaced_time_series

key phrase in bold for more research:

In statistics, signal processing, and econometrics, an unevenly (or unequally or irregularly) spaced time series is a sequence of observation time and value pairs (tn, Xn) with strictly increasing observation times. As opposed to equally spaced time series, the spacing of observation times is not constant.
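
A minimal Python sketch of a moving average taken directly over unevenly spaced samples, using only the standard library rather than traces. The function name `time_window_mean`, the sample data (the first rows from the question) and the window of 100 time units are illustrative assumptions. It averages only the values actually observed in the last `window` time units, so no filler rows are needed. Note that this is not the same number as an average over zero-filled data.

```python
from collections import deque


def time_window_mean(samples, window):
    """Yield (time, mean) for each sample.

    The mean covers the values observed in the interval (time - window, time].
    Assumes samples are sorted by time and window > 0.
    """
    recent = deque()  # (time, value) pairs currently inside the window
    total = 0.0
    for t, v in samples:
        recent.append((t, v))
        total += v
        # Drop observations that have fallen out of the window.
        while recent[0][0] <= t - window:
            total -= recent.popleft()[1]
        yield t, total / len(recent)


# Hypothetical usage with the first rows shown in the question.
data = [(10, 3), (1345, 50), (1390, 4), (2902, 10), (34057, 13)]
for t, mean in time_window_mean(data, window=100):
    print(t, mean)
```

Memory use is bounded by the number of samples that fit into one window, not by the time span of the file, so 60k-100k rows should be unproblematic.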

litepresence
awk '{Q=$2-last; if(Q>0){i=last; while(Q>1){print "| "++i"        |     0 |"; Q--}}; last=$2; print}' Input_file
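
The filler rows do not have to be written out to get the same average. A zero row adds nothing to the sum and only counts in the divisor, so the zero-filled SMA at time t is the sum of the real values in the window divided by the window length. Below is a minimal Python sketch of that idea. It assumes integer times starting at 1 with a step of 1, at most one sample per time, and input sorted by time. The name `zero_filled_sma` and the sample data are illustrative.

```python
from collections import deque


def zero_filled_sma(samples, window):
    """Yield (time, sma) at each observation time.

    Gives the value an SMA over `window` rows would have on a series with
    one row per integer time step (starting at 1) and 0 wherever nothing
    was observed, without ever creating those rows.
    """
    recent = deque()  # real observations inside the current window
    total = 0
    for t, v in samples:
        recent.append((t, v))
        total += v
        # Keep only observations with time in (t - window, t].
        while recent[0][0] <= t - window:
            total -= recent.popleft()[1]
        # Zero rows only affect the divisor; min() covers the warm-up
        # phase where fewer than `window` rows exist.
        yield t, total / min(t, window)


# Hypothetical usage with the first rows shown in the question.
data = [(10, 3), (1345, 50), (1390, 4), (2902, 10), (34057, 13)]
for t, sma in zero_filled_sma(data, window=100):
    print(t, sma)
```

This prints one line per real sample instead of one per time step, so the output stays as small as the input file.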
RavinderSingh13