1

I have a pandas DataFrame that has measured chemical concentrations. Each observation is timestamped, and they're all reported separately. I want to organise this better by having an row for each timestamp with all of the chemical measurements in that row. If there isn't a measurement for a chemical during that time, I'd like to stick a placeholder -999 in that spot.

I do have something working, but it's just a giant for loop and it's both incredibly inefficient and makes me feel dirty. I've looked through the documentation and nothing there has been very helpful so far. (Please do keep in mind that I'm very new at this before you judge me too much!)

def f(string):
    if og_data.loc[(og_data.local_time == time)&...
    (og_data.parameter == string), 'value'].empty:
        return -999
    else:
        return(og_data.loc[(og_data.local_time == time)&...
        (og_data.parameter == string), 'value'].array[0])

for time in local_time:
    h2so4.append(f('h2so4'))
    naoh.append(f('naoh'))
    o2.append(f('o2'))
    pbno3.append(f('pbno3'))

new_data['time'] = local_time
new_data['h2so4'] = h2so4
new_data['naoh'] = naoh
new_data['o2'] = o2
new_data['pbno3'] = pbno3

What I'd like is for this:

time chemical value
12:00  pbno3  1.5
12:00  h2so4  7
12:00  naoh  30
12:15  o2  25
12:15  pbno3  2
12:30  naoh  27

to become this:

time  h2so4 naoh o2 pbno3
12:00  7     30  -999  1.5
12:15  -999  -999  25  2
12:30  -999  27  -999  -999

Thank you!

Cam Phelan
  • 11
  • 1

0 Answers0