
I have a list of dates and events that is not necessarily sorted by date, e.g.:

# Date     Event
04.12.2018 -4
23.06.2018 5
04.10.2018 3
11.11.2018 -9
08.03.2018 -4
08.03.2018 2
11.11.2018 -3

I would like to sum up the events and extrapolate the sum (e.g. linearly) to find out when it will hit a certain threshold (e.g. zero).

It looks like smooth frequency and smooth cumulative were made for this, but I am struggling with the following:

a) How can I add a start value (offset), e.g. StartValue = 500?

plot $Data u (strftime("%d.%m.%Y",timecolumn(1,"%d.%m.%Y"))):($2+StartValue) smooth cumulative w l t "Cumulated Events"

doesn't do it.
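A minimal sketch of why the offset inside `using` has no single-offset effect, assuming gnuplot 5.2 or later (datablocks, `set table $name`, `stats`): smooth cumulative sums whatever the using specification returns, so ($2+StartValue) adds StartValue once per row. Adding the offset after the cumulation counts it only once, which is what the answer's script does with ($2+Level_Start) on the cumulative table. The datablock $Ev and its three rows, as well as $PerRow, $Cum, EndPerRow and EndOnce, are hypothetical names for illustration.

```gnuplot
# Sketch: offset added per row vs. offset added once after summing.
# $Ev is a hypothetical three-row datablock (events sum up to 4).
$Ev << EOD
01.01.2018 -4
02.01.2018 5
03.01.2018 3
EOD
StartValue = 500

# offset inside "using": every row gets +500 before the cumulative sum
set table $PerRow
    plot $Ev u (timecolumn(1,"%d.%m.%Y")):($2+StartValue) smooth cumulative
unset table

# plain cumulative sum; the offset is added afterwards, exactly once
set table $Cum
    plot $Ev u (timecolumn(1,"%d.%m.%Y")):2 smooth cumulative
unset table

# both running sums increase here, so the maximum is the final level
stats $PerRow u 2 nooutput
EndPerRow = STATS_max
stats $Cum u ($2+StartValue) nooutput
EndOnce = STATS_max
print sprintf("offset per row: %g, offset once: %g", EndPerRow, EndOnce)
```

With three rows, the per-row version ends at 4 + 3*500 = 1504, while the intended level is 4 + 500 = 504.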

b) How can I get the cumulative data, especially if the data is not sorted by date?

set table "DataCumulative.dat"
    plot $Data u (strftime("%d.%m.%Y",timecolumn(1,"%d.%m.%Y"))):2 smooth cumulative with table
unset table

This looks similar to this question (GNUPLOT: saving data from smooth cumulative), but I don't get the expected numbers. In the file "DataCumulative.dat" from my example below, I expected unique dates and basically the data from the lower plot. How can I get this?
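A small sketch for b), assuming gnuplot 5.2 or later. It runs the seven sample rows from the top of the question through smooth cumulative twice: once with `with table` (as in the question's code) and once without. Per the comments under the answer, `with table` bypasses style-dependent processing such as smoothing, so only the second table should hold one row per unique date (5 here) with the running sums -2, 3, 6, -6, -10. The x values are plain seconds because `set xdata time` is not used. The names $Raw, $Cum, RawRows and CumRows are made up for this example.

```gnuplot
# Sketch: cumulative sums of unsorted events with repeated dates.
# The seven rows are the sample data from the top of the question.
$Data << EOD
04.12.2018 -4
23.06.2018 5
04.10.2018 3
11.11.2018 -9
08.03.2018 -4
08.03.2018 2
11.11.2018 -3
EOD

# "with table" writes the using columns as they are: no sorting, no summing
set table $Raw
    plot $Data u (timecolumn(1,"%d.%m.%Y")):2 smooth cumulative with table
unset table

# without "with table": sorted by x, equal dates merged, running sum
set table $Cum
    plot $Data u (timecolumn(1,"%d.%m.%Y")):2 smooth cumulative
unset table
print $Cum

stats $Raw u 2 nooutput
RawRows = STATS_records
stats $Cum u 2 nooutput
CumRows = STATS_records
print sprintf("raw rows: %d, cumulative rows: %d, range %g to %g", RawRows, CumRows, STATS_min, STATS_max)
```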

The code:

### start code
reset session
set colorsequence classic

# function for creating a random date between two dates
t(date_str) = strptime("%d.%m.%Y", date_str)
date_random(d0,d1) = strftime("%d.%m.%Y",rand(0)*(t(d1)-t(d0)) + t(d0))

# create some random date data
date_start = "01.01.2018"
date_end = "30.06.2018"
set print $Data
do for [i=1:1000] {
    print sprintf("%s\t%g", date_random(date_start,date_end), floor(rand(0)*10-6))
}
set print

set xdata time
set timefmt "%d.%m.%Y"
set xtics format "%b"
set xrange[date_start:"31.12.2018"]

set multiplot layout 2,1
    plot $Data u (strftime("%d.%m.%Y",timecolumn(1,"%d.%m.%Y"))):2 smooth frequency with impulses t "Events"
    plot $Data u (strftime("%d.%m.%Y",timecolumn(1,"%d.%m.%Y"))):2 smooth cumulative w l t "Cumulated Events"
unset multiplot

# attempt to get cumulative data into a file
set table "DataCumulative.dat"
    plot $Data u (strftime("%d.%m.%Y",timecolumn(1,"%d.%m.%Y"))):2 smooth cumulative with table
unset table
### end of code

The plots: [image]

theozh

1 Answer


I think I've finally got it now. However, there are a few things I learned along the way that I still don't fully understand.

1. In order to get the cumulative data, you should not use

set table $DataCumulative
    plot $Data u (stringcolumn(1)):2 smooth cumulative with table
unset table

but instead:

set table $DataCumulative
    plot $Data u (stringcolumn(1)):2 smooth cumulative 
unset table

Note the missing "with table" in the second plot command. The first version gives you the original data, the second one the desired cumulative data. But I don't yet understand why.

2. The default datafile separator setting, which is

set datafile separator whitespace

doesn't seem to work. It gives an error message like: line xxx: No data to fit

Instead, you have to set

set datafile separator " \t"  # space and TAB

But I don't understand why.

3. Fitting time/date data

f_lin(x) = m*x + c

won't give a good fit at all. Apparently, you have to subtract the start date before doing the fit.

f_lin(x) = m*(x-strptime("%d.%m.%Y", Date_Start)) + c

I remember reading this a long time ago in the gnuplot documentation, but I can't find it anymore.
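A sketch of the start-date subtraction on synthetic data, assuming gnuplot 5.x. The data is hypothetical: 20 daily values falling by 2 per day, with an alternating ±0.5 so it is not exactly a line. Here (x - t0) is also divided by 86400 so that m comes out per day instead of per second as in the answer's f_lin. This is only a rescaling chosen for readability, and f_day, $Lin and t0 are made-up names. Without subtracting t0, x is around 1.5e9 seconds, which is the huge offset mentioned in the comments below.

```gnuplot
# Sketch: linear fit of time data with the start date subtracted.
# Synthetic, hypothetical data: one value per day, falling by 2 per day.
t0 = strptime("%d.%m.%Y", "01.01.2018")   # seconds since the epoch, about 1.5e9
set print $Lin
do for [d=0:19] {
    # alternate +/-0.5 so that the data is not exactly on a line
    print sprintf("%.0f %g", t0 + d*86400, -2*d + (d%2 ? 0.5 : -0.5))
}
set print

# x - t0 stays below 2e6 s; dividing by 86400 makes the slope "per day"
f_day(x) = m*(x - t0)/86400.0 + c
m = 1; c = 1
set fit quiet nolog
fit f_day(x) $Lin u 1:2 via m, c
print sprintf("slope: %.4f per day, intercept: %.4f", m, c)
```

The fitted slope should come out close to -2 per day and the intercept close to 0.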

For the time being, I'm happy with the following.

The modified code:

### start code
reset session

# function for creating a random date between two dates
t(date_str) = strptime("%d.%m.%Y", date_str)
date_random(d0,d1) = strftime("%d.%m.%Y",rand(0)*(t(d1)-t(d0)) + t(d0))

# create some random date data
Date_Start = "01.01.2018"
Date_End = "30.06.2018"
set print $Data
do for [i=1:100] {
    print sprintf("%s\t%g", date_random(Date_Start,Date_End), floor(rand(0)*10-6))
}
set print

set xdata time
set timefmt "%d.%m.%Y"

# get cumulative data into datablock
set xtics format "%d.%m.%Y"
set table $DataCumulative
    plot $Data u (stringcolumn(1)):2 smooth cumulative
unset table
set xtics format "%b"

set datafile separator " \t"  # space and TAB

# linear function and fitting
f_lin(x) = m*(x-strptime("%d.%m.%Y", Date_Start)) + c
set fit nolog quiet
fit f_lin(x) $DataCumulative u 1:2 via m,c

Level_Start = 500
Level_End = 0
x0 = (Level_End - Level_Start - c)/m  + strptime("%d.%m.%Y", Date_Start)

set multiplot layout 3,1
    # event plot & cumulative plot
    set xrange[Date_Start:"31.12.2018"]
    set xtics format ""
    set lmargin 7
    set bmargin 0
    plot $Data u (timecolumn(1,"%d.%m.%Y")):2 smooth frequency with impulses lc rgb "red" t "Events 2018"
    set xtics format "%b"
    set bmargin
    plot $Data u (timecolumn(1,"%d.%m.%Y")):2 smooth cumulative w l lc rgb "web-green" t "Cumulated Events 2018"

    # fit & extrapolation plot
    set label 1 at x0, graph 0.8 strftime("%d.%m.%Y",x0) center
    set arrow 1 from x0, graph 0.7 to x0, Level_End 
    set key at graph 0.30, graph 0.55
    set xrange[Date_Start:x0+3600*24*50] # end range = extrapolated date + 50 days
    set xtics format "%m.%y"
    set yrange [-90:] 
    plot $DataCumulative u (timecolumn(1,"%d.%m.%Y")):($2+Level_Start) w l lc rgb "blue" t "Cumulated Events",\
    Level_End w l lc rgb "red" not,\
    f_lin(x)+Level_Start w l ls 0 t "Fitting \\& Extrapolation"

unset multiplot
### end of code

will result in: [image]
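The extrapolated date x0 in the script comes from setting the fitted curve, shifted by Level_Start, equal to Level_End and solving for x. Here t_0 = strptime("%d.%m.%Y", Date_Start), and L_start and L_end stand for Level_Start and Level_End:

```latex
m\,(x_0 - t_0) + c + L_{\mathrm{start}} = L_{\mathrm{end}}
\quad\Longrightarrow\quad
x_0 = t_0 + \frac{L_{\mathrm{end}} - L_{\mathrm{start}} - c}{m}
```

This requires m ≠ 0. Because m is fitted per second, x0 is in seconds, which is why the script formats it with strftime("%d.%m.%Y",x0) for the label.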

theozh
  • Regarding 1): this is exactly what `with table` is meant for, to avoid style-dependent processing like smoothing (see the docs). 2) Don't know. 3) I guess this is because linear fits of time data involve a huge offset, which leads to numerical issues if you don't subtract a reasonable start date. – Christoph Dec 02 '18 at 13:31
  • 1) Right, you can find it with `help with table`. 2) Maybe a hint in `help time/date`: "If time/date information is to be plotted from a file, the using option _must_ be used on the plot or splot command. These commands simply use white space to separate columns, but white space may be embedded within the time/date string. If you use tabs as a separator, some trial-and-error may be necessary to discover how your system treats them." 3) Yes, I also guess so. Still can't find this remark in the docs. – theozh Dec 03 '18 at 07:40