2

tsfresh needs input data in a specific format. I initially assumed that column_id is just the row index, but I fear that's wrong.

I have sensor data: a pressure sensor, a temperature sensor and a humidity sensor, each captured at a 10-second interval. So it's a 4-column pandas DataFrame. How should the data be laid out? What is column_id?

The documentation is good, but I'm not able to understand what it means by an entity. Each sensor measures a distinct thing, and all are installed in a machine unit.

joel.wilson

2 Answers

1

The source code sheds some light on this ciphertext:

tsfresh/feature_extraction/extraction.py:76:

:param column_id: The name of the id column to group by.
:type column_id: str

So, this is a column that should have the same value for all points of a time series. If there are multiple values in this column in the dataframe, the lib will interpret it as multiple time series and analyze them all at the same time.
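To make this concrete for the sensor data in the question, here is a minimal sketch. The readings and the two machine units are made up for illustration, and the pandas `groupby` is only a stand-in for what tsfresh does: it shows that rows sharing an id are treated as one time series and collapse to one row of features.

```python
import pandas as pd

# Hypothetical readings: two machine units, three sensors, sampled every 10 s.
# "id" marks which machine unit (entity) each row belongs to.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "time": [0, 10, 20, 0, 10, 20],
    "pressure": [101.2, 101.4, 101.1, 99.8, 99.9, 100.3],
    "temperature": [20.1, 20.3, 20.2, 25.0, 25.4, 25.1],
    "humidity": [40.0, 41.0, 40.5, 55.0, 54.0, 54.5],
})

# Stand-in for feature extraction: a couple of simple aggregates per id.
# The result has one row per id, i.e. one row per machine unit.
sensor_columns = ["pressure", "temperature", "humidity"]
features = df.sort_values("time").groupby("id")[sensor_columns].agg(["mean", "max"])
print(features)

# With tsfresh installed, the corresponding call would be along these lines:
# from tsfresh import extract_features
# features = extract_features(df, column_id="id", column_sort="time")
```

If all rows carried the same id, the result would have a single row, since the whole frame would be read as one time series.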

ivan_pozdeev
  • Can you explain this with an example? For example, sales data. I have 2 columns: a `Time` column at `day` level and `Sales` data. Would you give a different `id` value to different days? – joel.wilson Aug 18 '18 at 21:56
  • @joel.wilson AFAICS, `tsfresh` requires one more column in addition to `Time` and `Sales`. If you only have one series, it should just have the same value for all points. https://tsfresh.readthedocs.io/en/latest/text/quick_start.html has an example. – ivan_pozdeev Aug 18 '18 at 22:06
  • 2
    This example is the most confusing example. If you give id column a single value then output will have just 1 row. That means 1 time series gets converted to 1 row – joel.wilson Aug 18 '18 at 22:11
0

This column indicates which entity each time series belongs to. Features will be extracted individually for each entity. The resulting feature matrix will contain one row per entity. In the example proposed in the documentation, you have values for 6 sensors of different robots at different times. In this example, each robot is a different entity, so each one has a different id.

Or, if you have data on different vendors and the number of items they sell in different categories at different timestamps, the vendor id can be used as your "column_id".
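As a rough sketch of the vendor case, the frame below uses made-up vendors, categories and counts; all column names are hypothetical. The `pivot_table` call is only a pandas stand-in for feature extraction, showing that the output ends up with one row per vendor.

```python
import pandas as pd

# Hypothetical long-format sales data: one row per (vendor, category, time).
sales = pd.DataFrame({
    "vendor_id": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "time": [1, 2, 1, 2, 1, 2, 1, 2],
    "category": ["books", "books", "toys", "toys", "books", "books", "toys", "toys"],
    "items_sold": [5, 7, 2, 3, 10, 12, 1, 0],
})

# Stand-in for feature extraction: mean items sold per vendor and category.
# Each vendor (entity) becomes one row; each category contributes columns.
summary = sales.pivot_table(
    index="vendor_id", columns="category", values="items_sold", aggfunc="mean"
)
print(summary)

# With tsfresh installed, the corresponding call would be along these lines:
# from tsfresh import extract_features
# features = extract_features(
#     sales,
#     column_id="vendor_id",
#     column_sort="time",
#     column_kind="category",
#     column_value="items_sold",
# )
```

Here the vendor is the entity, the category distinguishes the different series recorded for that vendor, and the time column only orders the points within each series.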

Moniba