-3

The following file has multiple columns with date, time and incomplete data set as shown using a simple file

# Matrix.txt
13.09.2016:23:44:10;;4.0
13.09.2016:23:44:20;10.0;
13.09.2016:23:44:30;;
13.09.2016:23:44:40;30.0;7.0

How can I do an linear interpolation on each column using awk to get the missing data:

# Output.txt
13.09.2016:23:44:10;0.0;4.0
13.09.2016:23:44:20;10.0;5.0
13.09.2016:23:44:30;20.0;6.0
13.09.2016:23:44:40;30.0;7.0
Micha Wiedenmann
  • 19,979
  • 21
  • 92
  • 137
  • 7
    I guess you didn't try anything this time either? – James Brown Sep 30 '16 at 13:11
  • You have written four very similar questions, none show what you have tried and what parts you are having problems with. Please read read the [help] pages about what sort of questions are allowed here on Stack Overflow also on how to ask a good question. – AdrianHHH Jan 15 '17 at 20:56

1 Answers1

0

Here is one solution in Gnu awk. It runs twice for the first given data file, remembers first and last data points (y1, y2) and their timestamps (x2, x2), computes slopes of the points (k=(y2-y1)/(x2-x1)) and inter- and extrapolates values for empty elements ((y=(x1-x)+y1).

It's not fool proof, it doesn't check for division by zeroes or if there are two points for the slopes or any other checks whatsoever.

$ cat inexpolator.awk
BEGIN {
    FS=OFS=";"
    ARGC=3; ARGV[2]=ARGV[1]        # run it twice for first file
}
BEGINFILE {                        # on the second round
        for(i in p)                # compute the slopes
            k[i]=(y2[i]-y1[i])/(x2[i]-x1[i])
}
{
    split($1,a,"[:.]")             # reformat the timestamp
    ts=mktime(a[3] " " a[2] " " a[1] " " a[4] " " a[5] " " a[6])
}
NR==FNR {                          # remember first and last points for slopes
    for(i=2;i<=NF;i++) {
        p[i]
        if(y1[i]=="") { y1[i]=$i; x1[i]=ts }
        if($i!="") { y2[i]=$i; x2[i]=ts }
    }
    next                           # only on the first round
}
{                                  # reformat ts again for output
    printf "%s", strftime("%d.%m.%Y:%H:%M:%S",ts) OFS  # print ts
    for(i=2;i<=NF;i++) {
        if($i=="") $i=k[i]*(ts-x1[i])+y1[i]            # compute missing points
        printf "%.1f%s", $i, (i<NF?OFS:ORS)            # print points
    }
}

Run it:

$ awk -f inexpolator.awk Matrix.txt
13.09.2016:23:44:10;0.0;4.0
13.09.2016:23:44:20;10.0;5.0
13.09.2016:23:44:30;20.0;6.0
13.09.2016:23:44:40;30.0;7.0
James Brown
  • 36,089
  • 7
  • 43
  • 59