I'm using/learning Pandas
to load a csv style dataset where I have a time column that can be used as index. The data is sampled roughly at 100Hz. Here is a simplified snippet of the data:
Time (sec) Col_A Col_B Col_C
0.0100 14.175 -29.97 -22.68
0.0200 13.905 -29.835 -22.68
0.0300 12.257 -29.32 -22.67
... ...
1259.98 -0.405 2.205 3.825
1259.99 -0.495 2.115 3.735
There are 20 min of data, resulting in about 120,000 rows at 100 Hz. My goal is to select those rows within a certain time range, say 100-200 sec.
Here is what I've figured out
import panda as pd
df = pd.DataFrame(my_data) # my_data is a numpy array
df.set_index(0, inplace=True)
df.columns = ['Col_A', 'Col_B', 'Col_C']
df.index = pd.to_datetime(df.index, unit='s', origin='1900-1-1') # the date in origin is just a space-holder
My dataset doesn't include the date. How to avoid setting a fake date like I did above? It feels wrong, and also is quite annoying when I plot the data against time.
I know there are ways to remove date from the datatime object like here.
But my goal is to select some rows that are in a certain time range, which means I need to use pd.date_range()
. This function does not seem to work without date.
It's not the end of the world if I just use a fake date throughout my project. But I'd like to know if there are more elegant ways around it.