1

I have an excel sheet with values representing start and end_time of a time series data, as shown below. Times are in seconds.

+------------+---------+-------+
Start_Time   End_Time  Value
0            2      A
2            3      B
3            9      A
9            11     C

I want to extrapolate the values between start and end_time and display the values for each second.

+---------+------+ Time Value 0 A 1 A 2 A 3 B 4 A 5 A 6 A 7 A 8 A 9 A 10 C 11 c

Any help to implement it in Python will be appreciated. Thanks.

Ram
  • 41
  • 6

1 Answers1

2

Setup

You should find how to read your excel sheet with pandas easily, and options will depend on the file itself, so I won't cover this part. Below is the reproduction of your sample dataframe, used for the example.

import pandas as pd

df = pd.DataFrame({'Start_Time': [0, 2, 3, 9],
                   'End_Time': [2, 3, 9, 11],
                   'Value': ['A', 'B', 'A', 'C']})
>>> df
Out[]:
   End_Time  Start_Time Value
0         2           0     A
1         3           2     B
2         9           3     A
3        11           9     C

Solution

(pd.Series(range(df.End_Time.max() + 1), name='Value')  # Create a series on whole range
   .map(df.set_index('End_Time').Value)                 # Set values from "df"
   .bfill()                                             # Backward fill NaNs values
   .rename_axis('Time'))                                # Purely cosmetic axis rename
Out[]:
Time
0     A
1     A
2     A
3     B
4     A
5     A
6     A
7     A
8     A
9     A
10    C
11    C
Name: Value, dtype: object

Walkthrough

Create the whole "Time" range

s = pd.Series(range(df.End_Time.max() + 1))

>>> s
Out[]:
0      0
1      1
2      2
3      3
4      4
5      5
6      6
7      7
8      8
9      9
10    10
11    11
dtype: int32

Use "End_Time" as index for df

>>> df.set_index('End_Time')
Out[]:
          Start_Time Value
End_Time
2                  0     A
3                  2     B
9                  3     A
11                 9     C

Map df values to corresponding "End_Time" values from s

s = s.map(df.set_index('End_Time').Value)

>>> s
Out[]:
0     NaN
1     NaN
2       A
3       B
4     NaN
5     NaN
6     NaN
7     NaN
8     NaN
9       A
10    NaN
11      C
dtype: object

Backward-fill the NaN values

s = s.bfill()

>>> s
Out[]:
0     A
1     A
2     A
3     B
4     A
5     A
6     A
7     A
8     A
9     A
10    C
11    C
dtype: object

Then rename_axis('Time') only renames the series axis to match your desired output.

Note that this works here because you use excluding Start_Time. If you were using including Start_Time (where Value really starts at Start_Time, which is more common) you should change End_Time to Start_Time and bfill() to ffill() (forward-fill).

FabienP
  • 3,018
  • 1
  • 20
  • 25
  • Thanks for the answer. +1 for the walkthrough. Yes, I know how to use pandas to read excel. – Ram Oct 12 '17 at 05:19