
How do I create a vector that contains multiple sensor values for each timestamp? The desired output for this subset of the data would be to group the first three rows, since they have the exact same timestamp, like: ([21, 0, 0, 5236], [6, 6, 0, 58], [18, 1, 0, 1770]), and then do the same for the next timestamp, etc.

Also, this has to be done without a for-loop, since there are almost one million rows.

Dina Stack
  • `print(df.to_dict())` and paste the output in your question, don't use images as they aren't reproducible. – Umar.H Mar 10 '21 at 13:58
  • Please don't post image of a dataframe, provide dataframe in text format so that people can copy and try their logic. – Epsi95 Mar 10 '21 at 13:59
  • When I use print(df.to_dict()) I get something like: {'Report_Time': {5813: Timestamp('2021-02-04 11:03:34'), 5823: Timestamp('2021-02-04 11:03:34'), 5824: How do I make it look like a df in my post? – Dina Stack Mar 10 '21 at 14:22
  • the goal is to make your post replicable, if you post that output we can copy that into our IDEs and create your dataframe to test any solutions. you can just add it as one line – Umar.H Mar 10 '21 at 14:49
  • I've done that now, thank you. @Manakin – Dina Stack Mar 10 '21 at 15:47
  • awesome you're very close to a [mcve] now you just need to add a sample of what you want your data to look like. – Umar.H Mar 10 '21 at 15:52
  • @Manakin would you mind if I answer the question :) – Shubham Sharma Mar 10 '21 at 16:12
  • @ShubhamSharma please go ahead! you never need permission for good answers ;) – Umar.H Mar 10 '21 at 16:41

3 Answers

3

IIUC, you can group the dataframe on Report_Time, then, for each group corresponding to a unique timestamp, build a mapping from the timestamp to the required array taken from the columns A, B, Type and Meter_Value inside a dict comprehension:

cols = ['A', 'B', 'Type', 'Meter_Value']
# One (rows_in_group, 4) array per unique Report_Time.
info = {k: g[cols].to_numpy() for k, g in df.groupby('Report_Time')}

To access the arrays corresponding to a unique Timestamp you can use dictionary lookup:

>>> info[pd.Timestamp('2021-02-04 11:03:34')]

array([[21, 0, 0, '5236'],
       [6, 6, 0, '58'],
       [18, 1, 0, '1770'],
       [21, 0, 0, '5237']], dtype=object)

>>> info[pd.Timestamp('2021-02-04 11:03:35')]

array([[6, 6, 0, '57'],
       [19, 2, 0, '1732'],
       [21, 0, 0, '5238'],
       [18, 1, 0, '1769']], dtype=object)
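If a pandas object is preferred over a plain dict (so that `.loc` and slicing keep working), the same grouping can be kept as a Series of row-lists. This is not part of the answer above, just a sketch of an equivalent, with sample values copied from the question:

```python
import pandas as pd

# Small frame mirroring the question's data (values taken from the example above).
df = pd.DataFrame({
    'Report_Time': pd.to_datetime(['2021-02-04 11:03:34'] * 3
                                  + ['2021-02-04 11:03:35']),
    'A': [21, 6, 18, 6],
    'B': [0, 6, 1, 6],
    'Type': [0, 0, 0, 0],
    'Meter_Value': ['5236', '58', '1770', '57'],
})

cols = ['A', 'B', 'Type', 'Meter_Value']
# One list of rows per unique timestamp, kept as a Series indexed by Report_Time.
arrays = df.groupby('Report_Time')[cols].apply(lambda g: g.to_numpy().tolist())
```

Lookup then works the same way: `arrays[pd.Timestamp('2021-02-04 11:03:34')]` returns the three grouped rows.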
Shubham Sharma
  • Thank You! This works as desired. Basically I have a single sensor-value on each row, but the amount of rows is not consistent for unique timestamps. So I had an idea to remove seconds and look at all logged sensor-values/rows for each minute and maybe do dimensionality reduction for each minute. Do you know if this is possible even though the amount of rows will be different for each minute? – Dina Stack Mar 11 '21 at 09:42
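The per-minute idea from the comment works with the same dict-comprehension pattern: truncate the timestamps to the minute before grouping, and ragged group sizes are fine because each group becomes its own array. A minimal sketch with toy data (not the asker's real frame):

```python
import pandas as pd

# Toy data; the number of rows per minute differs on purpose.
df = pd.DataFrame({
    'Report_Time': pd.to_datetime(['2021-02-04 11:03:34',
                                   '2021-02-04 11:03:35',
                                   '2021-02-04 11:04:01']),
    'A': [21, 6, 18],
    'B': [0, 6, 1],
})

# Truncate timestamps to the minute, then group exactly as before.
per_minute = {k: g[['A', 'B']].to_numpy()
              for k, g in df.groupby(df['Report_Time'].dt.floor('min'))}
```

Each minute then maps to an array whose row count equals however many readings were logged in that minute; any dimensionality reduction that accepts variable-length input can be applied per key.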
0

Check whether the timestamp changed; if it did, start collecting the rows in variables, then vectorize them with as.vector(t(Dataframevariable)) (an R idiom, not pandas). this post might help you
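The change-detection idea can be done without a Python loop over rows in NumPy: find the indices where the timestamp differs from the previous row and split the value matrix there. A sketch with toy data, assuming the frame is sorted by Report_Time (required for this to work):

```python
import numpy as np
import pandas as pd

# Toy data, sorted by Report_Time.
df = pd.DataFrame({
    'Report_Time': pd.to_datetime(['2021-02-04 11:03:34',
                                   '2021-02-04 11:03:34',
                                   '2021-02-04 11:03:35']),
    'A': [21, 6, 19],
    'B': [0, 6, 2],
})

vals = df[['A', 'B']].to_numpy()
t = df['Report_Time'].to_numpy()
# Row indices where the timestamp differs from the previous row.
boundaries = np.flatnonzero(t[1:] != t[:-1]) + 1
# One array per run of identical timestamps.
blocks = np.split(vals, boundaries)
```

`np.split` returns views into `vals`, so this stays cheap even at a million rows.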

0
import pandas as pd

dct = {'Report_Time': {5813: pd.Timestamp('2021-02-04 11:03:34'),
                       5823: pd.Timestamp('2021-02-04 11:03:34'),
                       5824: pd.Timestamp('2021-02-04 11:03:34'),
                       5825: pd.Timestamp('2021-02-04 11:03:34'),
                       5829: pd.Timestamp('2021-02-04 11:03:35'),
                       5830: pd.Timestamp('2021-02-04 11:03:35'),
                       5831: pd.Timestamp('2021-02-04 11:03:35'),
                       5839: pd.Timestamp('2021-02-04 11:03:35')},
       'Subsystem': {5813: 0, 5823: 0, 5824: 0, 5825: 0, 5829: 0, 5830: 0, 5831: 0, 5839: 0},
       'A': {5813: 21, 5823: 6, 5824: 18, 5825: 21, 5829: 6, 5830: 19, 5831: 21, 5839: 18},
       'B': {5813: 0, 5823: 6, 5824: 1, 5825: 0, 5829: 6, 5830: 2, 5831: 0, 5839: 1},
       'Type': {5813: 0, 5823: 0, 5824: 0, 5825: 0, 5829: 0, 5830: 0, 5831: 0, 5839: 0},
       'Meter_Value': {5813: '5236', 5823: '58', 5824: '1770', 5825: '5237', 5829: '57', 5830: '1732', 5831: '5238', 5839: '1769'}}

df = pd.DataFrame(dct)
print(df.columns)


grouped = df.groupby('Report_Time').agg(lambda x: x.tolist())

# .iteritems() was removed in pandas 2.0; .items() is the replacement.
results = [(x.index, key, list(x)) for key, x in grouped.items()]
print(results)

output:

[(DatetimeIndex(['2021-02-04 11:03:34', '2021-02-04 11:03:35'], dtype='datetime64[ns]', name='Report_Time', freq=None), 'Subsystem', [[0, 0, 0, 0], [0, 0, 0, 0]]), (DatetimeIndex(['2021-02-04 11:03:34', '2021-02-04 11:03:35'], dtype='datetime64[ns]', name='Report_Time', freq=None), 'A', [[21, 6, 18, 21], [6, 19, 21, 18]]), (DatetimeIndex(['2021-02-04 11:03:34', '2021-02-04 11:03:35'], dtype='datetime64[ns]', name='Report_Time', freq=None), 'B', [[0, 6, 1, 0], [6, 2, 0, 1]]), (DatetimeIndex(['2021-02-04 11:03:34', '2021-02-04 11:03:35'], dtype='datetime64[ns]', name='Report_Time', freq=None), 'Type', [[0, 0, 0, 0], [0, 0, 0, 0]]), (DatetimeIndex(['2021-02-04 11:03:34', '2021-02-04 11:03:35'], dtype='datetime64[ns]', name='Report_Time', freq=None), 'Meter_Value', [['5236', '58', '1770', '5237'], ['57', '1732', '5238', '1769']])]


tuples = []
for my_tuples in results:
    (dates, key, data) = my_tuples
    for i in range(len(dates)):
        print(dates[i], key, data[i])
        tuples.append((dates[i], key, data[i]))
for a_tuple in tuples:
    print(a_tuple)

output:

Index(['Report_Time', 'Subsystem', 'A', 'B', 'Type', 'Meter_Value'], dtype='object')
(Timestamp('2021-02-04 11:03:34'), 'Subsystem', [0, 0, 0, 0])
(Timestamp('2021-02-04 11:03:35'), 'Subsystem', [0, 0, 0, 0])
(Timestamp('2021-02-04 11:03:34'), 'A', [21, 6, 18, 21])
(Timestamp('2021-02-04 11:03:35'), 'A', [6, 19, 21, 18])
(Timestamp('2021-02-04 11:03:34'), 'B', [0, 6, 1, 0])
(Timestamp('2021-02-04 11:03:35'), 'B', [6, 2, 0, 1])
(Timestamp('2021-02-04 11:03:34'), 'Type', [0, 0, 0, 0])
(Timestamp('2021-02-04 11:03:35'), 'Type', [0, 0, 0, 0])
(Timestamp('2021-02-04 11:03:34'), 'Meter_Value', ['5236', '58', '1770', '5237'])
(Timestamp('2021-02-04 11:03:35'), 'Meter_Value', ['57', '1732', '5238', '1769'])
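If a list of tuples in this shape is the end point, it can also be loaded back into a tidy frame for further work. A sketch with hypothetical tuples in the same (timestamp, column, values) layout as above:

```python
import pandas as pd

# Hypothetical tuples in the (timestamp, column, values) shape shown above.
tuples = [
    (pd.Timestamp('2021-02-04 11:03:34'), 'A', [21, 6, 18, 21]),
    (pd.Timestamp('2021-02-04 11:03:35'), 'A', [6, 19, 21, 18]),
]
tidy = pd.DataFrame(tuples, columns=['Report_Time', 'column', 'values'])
```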

Golden Lion