1

I have a list of dictionaries, with spectral data inside the responses field. I also have array of wavelengths for labelling the columns for the spectral data. The list/input looks like this:

data = [  {
    'date': '2018-01-01',
    'measurement': 100,
    'responses': [(1, 1, np.array([1, 2, 3])),
                (2, 1, np.array([4, 5, 6])),
               ]
   },
   {
    'date': '2018-01-02',
    'measurement': 200,
    'responses': [(3, 1,np.array([5, 6, 7])),
                (4, 1, np.array([8, 9, 10])),
               ]
   },
]

And the column names for wavelengths to match:

wavelengths = [400,401,402]

I would like to convert this list to two pandas dataframes;

  1. One where the array in the response tuples are averaged, and
  2. One where they are separate, where the first to numbers in the tuple in responses are included.

The desired output for both is below:

__Average Dataframe__
index | date           | measurement | 400   | 401   | 402   |
0     | '2018-01-01'   | 100         | 2.5   | 3.5   | 4.5   |
1     | '2018-01-02'   | 200         | 6.5   | 7.5   | 8.5   |


__Seperate Dataframe__
index | date           | measurement | prong | scan| 400 | 401 | 402 |
0     | '2018-01-01'   | 100         | 1     | 1   | 1   | 2   | 3   |
1     | '2018-01-01'   | 100         | 2     | 1   | 4   | 5   | 6   |
2     | '2018-01-02'   | 200         | 3     | 1   | 5   | 6   | 7   |
3     | '2018-01-02'   | 200         | 4     | 1   | 8   | 9   | 10  |

What is the most efficient way to do this in pandas?

Dave
  • 454
  • 1
  • 7
  • 17
  • Please provide some reproducible input and expected output of it. – Chris Mar 15 '19 at 04:22
  • Hi Chris, how would you like this formatted? – Dave Mar 15 '19 at 04:30
  • For starters, you can provide a valid `dict`. It is just a format not a proper `dict`. And since having 1100 floats or 3 floats would require similar functionality, you can provide shorter `responses` with pseudo values, perhaps much shorter one, for much precise example. Going through [Minimal, Complete and Verifiable example](http://stackoverflow.com/help/mcve) can help you making a better question :) – Chris Mar 15 '19 at 04:37
  • @Chris Thanks Chris, I've updated the question, hopefully, this is more readable! – Dave Mar 15 '19 at 04:58

1 Answers1

0

This might not be the ideal solution, but using pandas:

import pandas as pd

wavelengths= [400, 401,402]
cols = ['prong', 'scan'] + wavelengths 

df1 = pd.concat([pd.DataFrame(dict([(k, pd.Series(v)) for k, v in d.items()])) for d in data])
df1 = df1.reset_index(drop=True).ffill()
df2 = pd.DataFrame([(x,y,*z) for x,y,z in df1['responses']], columns= cols)

df = pd.concat([df1.drop('responses', 1), df2], 1)
# Separate dataset
         date  measurement  prong  scan  400  401  402
0  2018-01-01        100.0      1     1    1    2    3
1  2018-01-01        100.0      2     1    4    5    6
2  2018-01-02        200.0      3     1    5    6    7
3  2018-01-02        200.0      4     1    8    9   10

df.groupby(['date', 'measurement'])[wavelengths].mean().reset_index()
# Average dataset
         date  measurement  400  401  402
0  2018-01-01        100.0  2.5  3.5  4.5
1  2018-01-02        200.0  6.5  7.5  8.5
Chris
  • 29,127
  • 3
  • 28
  • 51