
I've got some semicolon-separated data. The first column contains fixed time steps. The second and third columns contain data that is partially incomplete:

Input.txt

14.09.2016:00:00:00;;100
14.09.2016:00:00:01;-1;
14.09.2016:00:00:02;0;300
14.09.2016:00:00:03;;
14.09.2016:00:00:04;;
14.09.2016:00:00:05;;
14.09.2016:00:00:06;4;
14.09.2016:00:00:07;;
14.09.2016:00:00:08;;
14.09.2016:00:00:09;16;307

How can I use awk or gawk to do a local linear interpolation in each column, filling the empty values between the existing data points? Desired result:

Output.txt

14.09.2016:00:00:00;-2;100
14.09.2016:00:00:01;-1;200
14.09.2016:00:00:02;0;300
14.09.2016:00:00:03;1;301
14.09.2016:00:00:04;2;302
14.09.2016:00:00:05;3;303
14.09.2016:00:00:06;4;304
14.09.2016:00:00:07;8;305
14.09.2016:00:00:08;12;306
14.09.2016:00:00:09;16;307

There is already a gawk script, but it only does a global interpolation for each column between the first and the last data point. It is available here: Using awk to interpolate data column based in a data file with date and time

  • 3
    You got a nice answer there. Couldn't you use it? What about providing feedback to it? Please share your efforts or update the original question. – fedorqui Jan 13 '17 at 12:43
  • 2
    Possible duplicate of [Using awk to interpolate data column based in a data file with date and time](http://stackoverflow.com/questions/39792172/using-awk-to-interpolate-data-column-based-in-a-data-file-with-date-and-time) – Jose Ricardo Bustos M. Jan 13 '17 at 13:09
  • Why is it 8, 12 and then 16 after 4 in the second column? Here is one example: http://www.unix.com/unix-for-dummies-questions-and-answers/247167-interpolation-if-there-no-exact-match-value-2.html – Akshay Hegde Jan 13 '17 at 13:20

2 Answers

0

Assuming linear time, the values in your data do not appear to be linear. If you still want to use linear interpolation, you should chop your data into pieces, use for example this for each piece, and combine the pieces again. Finding the right pieces seems like a separate problem. Maybe just look for values in the data column: once you find the second value, cut after it and continue from that same line, like this (considering only the first data column, $2):

14.09.2016:00:00:00;;100
14.09.2016:00:00:01;-1;
14.09.2016:00:00:02;0;300

14.09.2016:00:00:02;0;300
14.09.2016:00:00:03;;
14.09.2016:00:00:04;;
14.09.2016:00:00:05;;
14.09.2016:00:00:06;4;

14.09.2016:00:00:06;4;
14.09.2016:00:00:07;;
14.09.2016:00:00:08;;
14.09.2016:00:00:09;16;307

When considering the second data column (last field, $3) you can (must) combine the second and third piece.
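The chop-and-combine idea above can also be done in a single awk pass per column. What follows is only a minimal sketch, not the script linked in this thread. It assumes the rows are equally spaced in time (the question says the time steps are fixed), so the row number is used as the x value. It also assumes that gaps before the first or after the last known value should be extrapolated from the two nearest known points, which is what the -2 in the desired output suggests. Columns with fewer than two values are left untouched, and the whole file is held in memory. The names `interpolate`, `cell` and `known` are made up for this example.

```shell
# interpolate: piecewise linear fill of empty fields in ";"-separated input.
# Hypothetical helper; reads stdin or the files given as arguments.
interpolate() {
  awk -F';' -v OFS=';' '
    {
      # keep every field so the gaps can be filled once all rows are known
      n = NR
      if (NF > nf) nf = NF
      for (c = 1; c <= NF; c++) cell[NR, c] = $c
    }
    END {
      for (c = 2; c <= nf; c++) {
        # collect the row numbers that hold a value in this column
        k = 0
        for (r = 1; r <= n; r++)
          if (cell[r, c] != "") known[++k] = r
        if (k < 2) continue   # not enough points to draw a line

        s = 1
        for (r = 1; r <= n; r++) {
          # move to the piece whose right end is at or after row r;
          # rows outside the known range reuse the first or last piece
          while (s < k - 1 && r > known[s + 1]) s++
          if (cell[r, c] != "") continue
          a = known[s]; b = known[s + 1]
          cell[r, c] = cell[a, c] + (cell[b, c] - cell[a, c]) * (r - a) / (b - a)
        }
      }
      for (r = 1; r <= n; r++) {
        line = cell[r, 1]
        for (c = 2; c <= nf; c++) line = line OFS cell[r, c]
        print line
      }
    }
  ' "$@"
}
```

It would be called as `interpolate Input.txt > Output.txt`. Because each column gets its own list of known rows, the second and third pieces are merged automatically for $3, as described above.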

Also, read this.

James Brown
0

That looks very complicated. Is there a simpler alternative that just fills the empty fields with the previous non-empty value of the same column?

Input.txt

14.09.2016:00:00:00;;100
14.09.2016:00:00:01;-1;
14.09.2016:00:00:02;0;300
14.09.2016:00:00:03;;
14.09.2016:00:00:04;;
14.09.2016:00:00:05;;
14.09.2016:00:00:06;4;
14.09.2016:00:00:07;;
14.09.2016:00:00:08;;
14.09.2016:00:00:09;16;307

Output.txt

14.09.2016:00:00:00;;100
14.09.2016:00:00:01;-1;100
14.09.2016:00:00:02;0;300
14.09.2016:00:00:03;0;300
14.09.2016:00:00:04;0;300
14.09.2016:00:00:05;0;300
14.09.2016:00:00:06;4;300
14.09.2016:00:00:07;4;300
14.09.2016:00:00:08;4;300
14.09.2016:00:00:09;16;307

I've only found a solution that works for fixed column widths (awk to Fill Empty Column value with Previous Non-Empty Column value), but not for this case of semicolon-separated files with date and time.
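For the semicolon-separated case, a forward fill might be sketched like this. It is only an illustration under a few assumptions: the first field is the timestamp and is never empty, an empty field means "missing", and a field with no earlier value in its column (such as the first row of column 2) stays empty. The names `fill_forward` and `last` are made up for this example.

```shell
# fill_forward: copy the last non-empty value of each column into empty fields.
# Hypothetical helper; reads stdin or the files given as arguments.
fill_forward() {
  awk -F';' -v OFS=';' '
    {
      for (c = 2; c <= NF; c++) {
        if ($c != "") last[c] = $c          # remember the newest value
        else if (c in last) $c = last[c]    # reuse it for an empty field
      }
      print
    }
  ' "$@"
}
```

It would be called as `fill_forward Input.txt > Output.txt`. The test is `$c != ""` rather than just `$c`, so a value of 0 counts as data and is carried forward.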
