I have a pandas DataFrame that has measured chemical concentrations. Each observation is timestamped, and they're all reported separately. I want to organise this better by having an row for each timestamp with all of the chemical measurements in that row. If there isn't a measurement for a chemical during that time, I'd like to stick a placeholder -999 in that spot.
I do have something working, but it's just a giant for loop and it's both incredibly inefficient and makes me feel dirty. I've looked through the documentation and nothing there has been very helpful so far. (Please do keep in mind that I'm very new at this before you judge me too much!)
def f(string):
if og_data.loc[(og_data.local_time == time)&...
(og_data.parameter == string), 'value'].empty:
return -999
else:
return(og_data.loc[(og_data.local_time == time)&...
(og_data.parameter == string), 'value'].array[0])
for time in local_time:
h2so4.append(f('h2so4'))
naoh.append(f('naoh'))
o2.append(f('o2'))
pbno3.append(f('pbno3'))
new_data['time'] = local_time
new_data['h2so4'] = h2so4
new_data['naoh'] = naoh
new_data['o2'] = o2
new_data['pbno3'] = pbno3
What I'd like is for this:
time chemical value
12:00 pbno3 1.5
12:00 h2so4 7
12:00 naoh 30
12:15 o2 25
12:15 pbno3 2
12:30 naoh 27
to become this:
time h2so4 naoh o2 pbno3
12:00 7 30 -999 1.5
12:15 -999 -999 25 2
12:30 -999 27 -999 -999
Thank you!