
How do I create a vector that contains multiple sensor values for each timestamp? The desired output for this subset of the data would be to group the first three rows, since they have the exact same timestamp, like: ([21, 0, 0, 5236], [6, 6, 0, 58], [18, 1, 0, 1770]), and then do the same for the next timestamp, etc.

Also, this has to be done without a for-loop, since there are almost one million rows.

Dina Stack
  • `print(df.to_dict())` and paste the output in your question, don't use images as they aren't reproducible. – Umar.H Mar 10 '21 at 13:58
  • Please don't post image of a dataframe, provide dataframe in text format so that people can copy and try their logic. – Epsi95 Mar 10 '21 at 13:59
  • When I use print(df.to_dict()) I get something like: {'Report_Time': {5813: Timestamp('2021-02-04 11:03:34'), 5823: Timestamp('2021-02-04 11:03:34'), 5824: How do I make it look like a df in my post? – Dina Stack Mar 10 '21 at 14:22
  • the goal is to make your post replicable, if you post that output we can copy that into our IDEs and create your dataframe to test any solutions. you can just add it as one line – Umar.H Mar 10 '21 at 14:49
  • I've done that now, thank you. @Manakin – Dina Stack Mar 10 '21 at 15:47
  • awesome you're very close to a [mcve] now you just need to add a sample of what you want your data to look like. – Umar.H Mar 10 '21 at 15:52
  • @Manakin would you mind if I answer the question :) – Shubham Sharma Mar 10 '21 at 16:12
  • @ShubhamSharma please go ahead! you never need permission for good answers ;) – Umar.H Mar 10 '21 at 16:41

3 Answers

3

IIUC, you can group the dataframe on Report_Time, then, for each group corresponding to a unique timestamp, build a mapping from the timestamp to the required array taken from the columns A, B, Type and Meter_Value inside a dict comprehension:

cols = ['A', 'B', 'Type', 'Meter_Value']
# One (rows_in_group, 4) array per unique Report_Time.
info = {k: g[cols].to_numpy() for k, g in df.groupby('Report_Time')}

To access the arrays corresponding to a unique Timestamp you can use dictionary lookup:

>>> info[pd.Timestamp('2021-02-04 11:03:34')]

array([[21, 0, 0, '5236'],
       [6, 6, 0, '58'],
       [18, 1, 0, '1770'],
       [21, 0, 0, '5237']], dtype=object)

>>> info[pd.Timestamp('2021-02-04 11:03:35')]

array([[6, 6, 0, '57'],
       [19, 2, 0, '1732'],
       [21, 0, 0, '5238'],
       [18, 1, 0, '1769']], dtype=object)
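If a pandas object is preferred over a plain dict (so that `.loc` and slicing keep working), the same grouping can be kept as a Series of row-lists. This is not part of the answer above, just a sketch of an equivalent, with sample values copied from the question:

```python
import pandas as pd

# Small frame mirroring the question's data (values taken from the example above).
df = pd.DataFrame({
    'Report_Time': pd.to_datetime(['2021-02-04 11:03:34'] * 3
                                  + ['2021-02-04 11:03:35']),
    'A': [21, 6, 18, 6],
    'B': [0, 6, 1, 6],
    'Type': [0, 0, 0, 0],
    'Meter_Value': ['5236', '58', '1770', '57'],
})

cols = ['A', 'B', 'Type', 'Meter_Value']
# One list of rows per unique timestamp, kept as a Series indexed by Report_Time.
arrays = df.groupby('Report_Time')[cols].apply(lambda g: g.to_numpy().tolist())
```

Lookup then works the same way: `arrays[pd.Timestamp('2021-02-04 11:03:34')]` returns the three grouped rows.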
Shubham Sharma
  • Thank You! This works as desired. Basically I have a single sensor-value on each row, but the amount of rows is not consistent for unique timestamps. So I had an idea to remove seconds and look at all logged sensor-values/rows for each minute and maybe do dimensionality reduction for each minute. Do you know if this is possible even though the amount of rows will be different for each minute? – Dina Stack Mar 11 '21 at 09:42
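The per-minute idea from the comment works with the same dict-comprehension pattern: truncate the timestamps to the minute before grouping, and ragged group sizes are fine because each group becomes its own array. A minimal sketch with toy data (not the asker's real frame):

```python
import pandas as pd

# Toy data; the number of rows per minute differs on purpose.
df = pd.DataFrame({
    'Report_Time': pd.to_datetime(['2021-02-04 11:03:34',
                                   '2021-02-04 11:03:35',
                                   '2021-02-04 11:04:01']),
    'A': [21, 6, 18],
    'B': [0, 6, 1],
})

# Truncate timestamps to the minute, then group exactly as before.
per_minute = {k: g[['A', 'B']].to_numpy()
              for k, g in df.groupby(df['Report_Time'].dt.floor('min'))}
```

Each minute then maps to an array whose row count equals however many readings were logged in that minute; any dimensionality reduction that accepts variable-length input can be applied per key.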
0

Check whether the timestamp changed; if it did, start collecting the rows in variables, then vectorize them with as.vector(t(Dataframevariable)) (an R idiom, not pandas). this post might help you
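The change-detection idea can be done without a Python loop over rows in NumPy: find the indices where the timestamp differs from the previous row and split the value matrix there. A sketch with toy data, assuming the frame is sorted by Report_Time (required for this to work):

```python
import numpy as np
import pandas as pd

# Toy data, sorted by Report_Time.
df = pd.DataFrame({
    'Report_Time': pd.to_datetime(['2021-02-04 11:03:34',
                                   '2021-02-04 11:03:34',
                                   '2021-02-04 11:03:35']),
    'A': [21, 6, 19],
    'B': [0, 6, 2],
})

vals = df[['A', 'B']].to_numpy()
t = df['Report_Time'].to_numpy()
# Row indices where the timestamp differs from the previous row.
boundaries = np.flatnonzero(t[1:] != t[:-1]) + 1
# One array per run of identical timestamps.
blocks = np.split(vals, boundaries)
```

`np.split` returns views into `vals`, so this stays cheap even at a million rows.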

0
import pandas as pd

dct = {'Report_Time': {5813: pd.Timestamp('2021-02-04 11:03:34'),
                       5823: pd.Timestamp('2021-02-04 11:03:34'),
                       5824: pd.Timestamp('2021-02-04 11:03:34'),
                       5825: pd.Timestamp('2021-02-04 11:03:34'),
                       5829: pd.Timestamp('2021-02-04 11:03:35'),
                       5830: pd.Timestamp('2021-02-04 11:03:35'),
                       5831: pd.Timestamp('2021-02-04 11:03:35'),
                       5839: pd.Timestamp('2021-02-04 11:03:35')},
       'Subsystem': {5813: 0, 5823: 0, 5824: 0, 5825: 0, 5829: 0, 5830: 0, 5831: 0, 5839: 0},
       'A': {5813: 21, 5823: 6, 5824: 18, 5825: 21, 5829: 6, 5830: 19, 5831: 21, 5839: 18},
       'B': {5813: 0, 5823: 6, 5824: 1, 5825: 0, 5829: 6, 5830: 2, 5831: 0, 5839: 1},
       'Type': {5813: 0, 5823: 0, 5824: 0, 5825: 0, 5829: 0, 5830: 0, 5831: 0, 5839: 0},
       'Meter_Value': {5813: '5236', 5823: '58', 5824: '1770', 5825: '5237', 5829: '57', 5830: '1732', 5831: '5238', 5839: '1769'}}

df = pd.DataFrame(dct)
print(df.columns)


grouped = df.groupby('Report_Time').agg(lambda x: x.tolist())

# .iteritems() was removed in pandas 2.0; .items() is the replacement.
results = [(x.index, key, list(x)) for key, x in grouped.items()]
print(results)

output:

[(DatetimeIndex(['2021-02-04 11:03:34', '2021-02-04 11:03:35'], dtype='datetime64[ns]', name='Report_Time', freq=None), 'Subsystem', [[0, 0, 0, 0], [0, 0, 0, 0]]), (DatetimeIndex(['2021-02-04 11:03:34', '2021-02-04 11:03:35'], dtype='datetime64[ns]', name='Report_Time', freq=None), 'A', [[21, 6, 18, 21], [6, 19, 21, 18]]), (DatetimeIndex(['2021-02-04 11:03:34', '2021-02-04 11:03:35'], dtype='datetime64[ns]', name='Report_Time', freq=None), 'B', [[0, 6, 1, 0], [6, 2, 0, 1]]), (DatetimeIndex(['2021-02-04 11:03:34', '2021-02-04 11:03:35'], dtype='datetime64[ns]', name='Report_Time', freq=None), 'Type', [[0, 0, 0, 0], [0, 0, 0, 0]]), (DatetimeIndex(['2021-02-04 11:03:34', '2021-02-04 11:03:35'], dtype='datetime64[ns]', name='Report_Time', freq=None), 'Meter_Value', [['5236', '58', '1770', '5237'], ['57', '1732', '5238', '1769']])]


tuples = []
for my_tuples in results:
    (dates, key, data) = my_tuples
    for i in range(len(dates)):
        print(dates[i], key, data[i])
        tuples.append((dates[i], key, data[i]))
for a_tuple in tuples:
    print(a_tuple)

output:

Index(['Report_Time', 'Subsystem', 'A', 'B', 'Type', 'Meter_Value'], dtype='object')
(Timestamp('2021-02-04 11:03:34'), 'Subsystem', [0, 0, 0, 0])
(Timestamp('2021-02-04 11:03:35'), 'Subsystem', [0, 0, 0, 0])
(Timestamp('2021-02-04 11:03:34'), 'A', [21, 6, 18, 21])
(Timestamp('2021-02-04 11:03:35'), 'A', [6, 19, 21, 18])
(Timestamp('2021-02-04 11:03:34'), 'B', [0, 6, 1, 0])
(Timestamp('2021-02-04 11:03:35'), 'B', [6, 2, 0, 1])
(Timestamp('2021-02-04 11:03:34'), 'Type', [0, 0, 0, 0])
(Timestamp('2021-02-04 11:03:35'), 'Type', [0, 0, 0, 0])
(Timestamp('2021-02-04 11:03:34'), 'Meter_Value', ['5236', '58', '1770', '5237'])
(Timestamp('2021-02-04 11:03:35'), 'Meter_Value', ['57', '1732', '5238', '1769'])
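If a list of tuples in this shape is the end point, it can also be loaded back into a tidy frame for further work. A sketch with hypothetical tuples in the same (timestamp, column, values) layout as above:

```python
import pandas as pd

# Hypothetical tuples in the (timestamp, column, values) shape shown above.
tuples = [
    (pd.Timestamp('2021-02-04 11:03:34'), 'A', [21, 6, 18, 21]),
    (pd.Timestamp('2021-02-04 11:03:35'), 'A', [6, 19, 21, 18]),
]
tidy = pd.DataFrame(tuples, columns=['Report_Time', 'column', 'values'])
```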

Golden Lion