
Sorry for the long question; there may be a quick solution, but I haven't been able to figure it out. I have 3 different datasets of x and y data points, like the following:

line1_x = [277.75, 287.68, 308.77, 322.23, 342.98, 363.99, 387.59, 396.83, 401.52, 405.78, 408.30,
           408.06, 406.94, 406.11, 403.46, 401.14, 397.28, 394.04, 390.62, 386.99, 384.31, 380.41,
           375.85, 372.25, 368.35, 364.86, 363.01, 361.84, 361.85, 363.79, 366.25, 368.94, 371.58,
           376.11, 380.97, 387.33, 391.61, 392.36, 373.35, 350.68]

line1_y = [806.00, 779.43, 713.71, 731.96, 701.13, 621.68, 525.04, 490.21, 467.48, 455.57, 445.59,  
           440.54, 436.74, 432.71, 434.32, 438.06, 449.00, 462.25, 476.56, 493.43, 514.02, 538.52,  
           570.87, 614.20, 664.00, 708.06, 750.61, 786.38, 814.16, 834.67, 853.45, 864.71, 872.50,  
           882.23, 893.28, 907.94, 922.18, 931.05, 950.03, 1085.45]

line2_x = [443.98, 442.71, 441.24, 437.57, 427.09, 417.82, 418.20, 418.14, 417.76, 414.84, 411.93, 
           408.14, 404.29, 402.48, 400.63, 398.01, 394.43, 392.05, 391.69, 391.06, 388.61, 384.59, 
           378.93, 374.67, 372.82, 371.07, 370.50, 370.96, 372.11, 374.10, 377.05, 381.65, 385.54, 
           388.72, 389.27, 389.00, 389.71, 391.49, 392.60, 385.89, 384.39, 361.87]

line2_y = [299.48, 317.04, 338.92, 360.55, 405.99, 451.64, 493.67, 516.66, 530.73, 540.62, 548.90,
           553.39, 555.65, 559.21, 564.41, 571.17, 577.56, 585.31, 592.54, 606.96, 626.18, 651.76,  
           679.82, 710.62, 744.39, 774.02, 802.14, 829.54, 849.98, 865.51, 879.94, 894.02, 903.30,  
           910.80, 918.08, 926.39, 935.80, 947.09, 955.08, 965.04, 1076.83, 1110.45]

line3_x = [302.52, 313.15, 340.68, 352.96, 364.85, 378.51, 407.05, 437.33, 453.73, 462.99, 467.79, 
           470.68, 472.10, 473.56, 473.49, 472.72, 471.14, 468.91, 467.76, 466.58, 463.90, 460.31,
           457.23, 453.48, 448.68, 443.65, 438.08, 433.63, 429.14, 424.09, 418.62, 415.17, 414.60, 
           417.55, 422.72, 429.17, 436.35, 443.75, 450.29, 455.66, 459.26, 461.39, 462.30, 463.23, 
           469.86, 435.97]

line3_y = [794.42, 809.18, 782.48, 761.93, 771.51, 776.51, 689.07, 560.78, 531.03, 524.44, 518.59, 
           516.73, 514.03, 511.97, 511.63, 518.31, 532.60, 549.91, 563.77, 582.81, 601.98, 617.95, 
           628.03, 640.11, 658.30, 679.82, 703.00, 730.40, 754.16, 776.00, 800.85, 824.39, 844.47, 
           856.76, 866.70, 875.72, 884.18, 892.66, 903.61, 912.93, 922.55, 928.60, 932.00, 952.27, 
           1029.70, 1065.12]

Each dataset is a sequence of points, so time and location are important, which is why I cannot simply add points to the smaller datasets and calculate the average. I've been trying to interpolate the data so that all the interpolated lines have the same spacing between points, and then use that to calculate the average at each step of the interpolated lines. The data looks like this:

[image: plot of the three datasets]

I interpolated each dataset over the range from the minimum y to the maximum y of the 3 datasets, using SciPy's interp1d function:

import numpy as np
from scipy.interpolate import interp1d
interpolated_y = np.linspace(minimum_y, maximum_y, num=100, endpoint=True)
interpolated_x = interp1d(line1_y, line1_x, fill_value="extrapolate")(interpolated_y)

And they look like this:

[image: plot of the three interpolated lines]

Interpolating onto a shared y grid is good because the spacing between interpolated steps is the same for all the lines and every line has exactly the same number of interpolated points. But if I use this to calculate the average at each interpolation step, I lose the information about the starting points, i.e. I am not able to calculate the average of the starting points.

My question is: how can I interpolate the 3 datasets so that each has the same number of interpolated points, with the same distance between points? The interpolation should start from each dataset's starting point and continue in the correct direction. Where a dataset is shorter, it should be extrapolated.

Or is there another way to calculate an average of those points, taking into account that they are time sequences?
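To make the requirement concrete, here is a minimal sketch of one possible approach, not a confirmed solution: parametrize each line by its normalized cumulative path length, so that 0 is the line's own starting point and 1 is its end, resample every line at the same number of equally spaced parameter values, and average point by point. The function names `resample_by_arc_length` and `average_lines` and the toy data are made up for illustration. Note the assumption that each line is stretched to its own full length, which replaces the extrapolation of shorter datasets with normalization.

```python
import numpy as np

def resample_by_arc_length(x, y, num=100):
    """Resample a polyline at `num` points equally spaced along its path."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Length of each segment between consecutive points.
    segment = np.hypot(np.diff(x), np.diff(y))
    # Cumulative distance travelled, 0 at the first point.
    s = np.concatenate(([0.0], np.cumsum(segment)))
    # Normalize so every line runs from 0 (its start) to 1 (its end).
    # Assumes the line has non-zero length and no repeated consecutive points.
    s = s / s[-1]
    t = np.linspace(0.0, 1.0, num)
    return np.interp(t, s, x), np.interp(t, s, y)

def average_lines(lines, num=100):
    """Average several (x, y) lines after resampling them to `num` points."""
    resampled = [resample_by_arc_length(x, y, num) for x, y in lines]
    mean_x = np.mean([rx for rx, _ in resampled], axis=0)
    mean_y = np.mean([ry for _, ry in resampled], axis=0)
    return mean_x, mean_y

# Toy data (made up): two horizontal lines with different lengths and point counts.
line_a = ([0.0, 1.0, 2.0], [0.0, 0.0, 0.0])
line_b = ([0.0, 4.0], [2.0, 2.0])
mean_x, mean_y = average_lines([line_a, line_b], num=5)
```

With the real data, the call would be `average_lines([(line1_x, line1_y), (line2_x, line2_y), (line3_x, line3_y)])`. Because the parameter follows the order of the points rather than the y value, the first averaged point is the average of the three starting points, and lines that double back in y are handled without sorting.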

  • Interpolation, in mathematics, is defined as the determination or estimation of the value of f(x), or a function of x, from certain known values of the function. Please explain how you interpolate across three different data sets. Are the three sets different estimates of the same function or of different functions? – itprorh66 Jul 09 '21 at 17:30
  • @itprorh66 The three datasets are just coordinates; each point occurs at a defined time step. I just need a way to average them, taking into account that each dataset can have a different length. Since I want the average, I created 100 equidistant points on the y axis from the minimum to the maximum value of y, then created a linear interpolation function per dataset with its corresponding values, and passed the created y points to those functions to get the interpolated lines, as in the code: `interpolated_x = interp1d(line1_y,line1_x, fill_value="extrapolate")(interpolated_y)` – Martin MeRu Jul 12 '21 at 07:53
  • @itprorh66 The problem is that if I average this way, I am not averaging the points in the correct order, because the lines start at different positions. I need some kind of equidistant interpolation with the same number of points that starts at the starting points and ends at the maximum value of y, so that I can average them in the order in which the points appear. – Martin MeRu Jul 12 '21 at 07:59
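Since the comments state that each point occurs at a fixed time step, averaging by order of appearance can also be sketched directly on the step index. This is an illustration under stated assumptions, not a confirmed answer: it assumes all datasets share the same time step and start at the same moment, and the helper name `average_by_step` is hypothetical. Shorter sequences are padded with NaN rather than extrapolated, so later steps are averaged only over the datasets that still have data.

```python
import numpy as np

def average_by_step(sequences):
    """Average sequences of different lengths step by step (index = time step)."""
    longest = max(len(seq) for seq in sequences)
    # One row per sequence; steps a sequence does not reach stay NaN.
    padded = np.full((len(sequences), longest), np.nan)
    for row, seq in enumerate(sequences):
        padded[row, :len(seq)] = seq
    # nanmean ignores the NaN padding when averaging each step.
    return np.nanmean(padded, axis=0)

# Toy data (made up): the second sequence is one step shorter.
mean_values = average_by_step([[1.0, 2.0, 3.0], [3.0, 4.0]])
```

Applied to the data in the question, the x and y coordinates would be averaged separately, for example `average_by_step([line1_x, line2_x, line3_x])` and `average_by_step([line1_y, line2_y, line3_y])`.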
