
Hi everyone,

I've been reading Stack Overflow for a couple of years, and it has helped me so much that I never needed to register before :)

But today I'm stuck on a problem using Python with Pandas and Quantities (it could be unum or pint as well). I'll do my best to write a clear post, but since it's my first one, I apologize if anything is confusing and will try to correct any mistakes you find :)


I want to import data from a source and build a Pandas DataFrame as follows:

import pandas as pd
import quantities as pq

depth = [0.0,1.1,2.0] * pq.m
depth2 = [0,1,1.1,1.5,2] * pq.m

s1 = pd.DataFrame(
        {'depth' : [x for x in depth]},
        index = depth)

This gives:

s1 =
     depth
0.0  0.0 m
1.1  1.1 m
2.0  2.0 m

Now I want to extend the data to the depth2 values. (Obviously there is no point in interpolating depth over depth, but this is a test before things get more complicated.)

s2 = s1.reindex(depth2)

This gives:

s2 =
      depth
0.0   0.0 m
1.0   NaN
1.1   1.1 m
1.5   NaN
2.0   2.0 m

So far no problem.
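In the question, depth2 contains every original depth, so reindex(depth2) keeps all the measured rows. reindex returns exactly the labels it is given, though: if a new index happened to omit one of the original depths, that row would silently disappear. A minimal sketch on plain floats (units stripped, values mirroring the example) of that behaviour, and of reindexing on the union of old and new labels instead:

```python
import pandas as pd

# Plain float magnitudes (units stripped) standing in for the depth column, in metres
s1 = pd.DataFrame({"depth": [0.0, 1.1, 2.0]}, index=[0.0, 1.1, 2.0])

# reindex() returns exactly the labels it is given: existing labels keep their rows,
# new labels get NaN, and any old label missing from the list is dropped
s2 = s1.reindex([0.0, 1.0, 1.1, 1.5, 2.0])
dropped = s1.reindex([0.0, 1.0, 2.0])  # the 1.1 row is gone

# Reindexing on the union of old and new labels keeps every measured point
# (Index.union sorts the combined labels here)
kept = s1.reindex(s1.index.union(pd.Index([0.0, 1.0, 2.0])))

print(s2)
print(kept)
```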


But when I try to interpolate the missing values with:

s2['depth'].interpolate(method='values')

I get the following error:

C:\Python27\lib\site-packages\numpy\lib\function_base.pyc in interp(x, xp, fp, left, right)
   1067         return compiled_interp([x], xp, fp, left, right).item()
   1068     else:
-> 1069         return compiled_interp(x, xp, fp, left, right)
  1070 
  1071 
TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'

I understand that NumPy's interpolation does not work on arrays of dtype object (dtype('O') in the error), which is how the column of quantities is stored.
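The traceback shows that interpolate(method='values') ends up in numpy.interp, which fills each gap linearly against the index values and needs float64 data rather than object data. A minimal sketch of the same computation on plain floats, assuming the units have already been stripped:

```python
import numpy as np
import pandas as pd

# Index (depth in m) and values with the units already stripped; NaN marks the gaps
index = np.array([0.0, 1.0, 1.1, 1.5, 2.0])
values = np.array([0.0, np.nan, 1.1, np.nan, 2.0])

# numpy.interp(x, xp, fp): evaluate at x the piecewise-linear curve through (xp, fp);
# here xp/fp are the known (index, value) pairs and x are the index values of the gaps
known = ~np.isnan(values)
filled = values.copy()
filled[~known] = np.interp(index[~known], index[known], values[known])

# On a float64 Series, interpolate(method="values") gives the same result
via_pandas = pd.Series(values, index=index).interpolate(method="values")

print(filled)
print(via_pandas)
```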


But if I drop the units first and then interpolate the missing values, it works:

s3 = s2['depth'].astype(float).interpolate(method='values')

This gives:

s3 = 
0.0   0
1.0   1
1.1   1.1
1.5   1.5
2.0   2
Name: depth, dtype: object

How can I get the unit back into the depth column?

I can't find any trick to put the unit back...

Any help would be greatly appreciated. Thanks!

Julien

2 Answers


Here's a way to do what you want.

Split apart the quantities, creating a pair of columns for each one (the magnitude and its unit):

In [80]: df = pd.concat([col.apply(lambda x: pd.Series([x.item(), x.dimensionality.string],
                                                  index=[c, "%s_unit" % c]))
                         for c, col in s1.items()], axis=1)

In [81]: df
Out[81]: 
     depth depth_unit
0.0    0.0          m
1.1    1.1          m
2.0    2.0          m

In [82]: df = df.reindex([0,1.0,1.1,1.5,2.0])

In [83]: df
Out[83]: 
     depth depth_unit
0.0    0.0          m
1.0    NaN        NaN
1.1    1.1          m
1.5    NaN        NaN
2.0    2.0          m

Interpolate

In [84]: df['depth'] = df['depth'].interpolate(method='values')

Propagate the units:

In [85]: df['depth_unit'] = df['depth_unit'].ffill()

In [86]: df
Out[86]: 
     depth depth_unit
0.0    0.0          m
1.0    1.0          m
1.1    1.1          m
1.5    1.5          m
2.0    2.0          m
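One caveat with the ffill step: it only copies values downwards, so if a new depth lies above the first measured one, its unit cell stays NaN (and interpolate() also leaves that leading depth as NaN by default, since there is no earlier point to interpolate from). A minimal sketch with plain strings for the units, assuming hypothetical measurements that start at 0.5 m, where chaining bfill() after ffill() fills the leading unit cell:

```python
import numpy as np
import pandas as pd

# Value/unit column pair as in the answer, but the first measured depth is 0.5 m,
# so the new depth 0.0 lies before any known row
df = pd.DataFrame(
    {"depth": [0.5, 1.1, 2.0], "depth_unit": ["m", "m", "m"]},
    index=[0.5, 1.1, 2.0],
)
df = df.reindex([0.0, 0.5, 1.0, 1.1, 1.5, 2.0])

# Interior gaps are interpolated; the leading 0.0 row stays NaN
df["depth"] = df["depth"].interpolate(method="values")
# ffill() has nothing to copy into the leading row; bfill() fills it from the next row
df["depth_unit"] = df["depth_unit"].ffill().bfill()

print(df)
```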
Jeff
  • Thanks, Jeff, for your answer. I will see how I can implement this, because I will have several columns with different parameters and units. I am still looking for a way to get a pandas DataFrame that keeps its units after interpolation. Maybe I have to build an intermediate DataFrame without units, then build a final DataFrame with the interpolated values and units. – Julien Oct 09 '13 at 13:50
  • Yep... this has been an issue with the quantities library for quite some time. It's not trivial to carry this type of metadata around without major revisions. But if you come up with a nice solution, please post it on GitHub. – Jeff Oct 09 '13 at 13:58
  • Thanks, Jeff, I added my solution below. Not sure if it's Pythonic enough for GitHub :) I'm very new to Python, but I love it. – Julien Oct 15 '13 at 13:55
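For several columns with different parameters and units, the split-column idea from the answer can be generalised by keeping the units in a separate mapping while the frame holds only floats. A minimal sketch, assuming the magnitudes are already stripped; split_interpolate, the units dict and the temp column are hypothetical names used for illustration only:

```python
import pandas as pd


def split_interpolate(df, units, new_index):
    """Extend df to new_index and interpolate its numeric columns.

    df        -- DataFrame of float magnitudes with the units already stripped
    units     -- hypothetical {column name: unit string} mapping kept beside df
    new_index -- index values (e.g. depths) to add
    """
    # Union keeps every original row and sorts the combined labels
    full_index = df.index.union(pd.Index(new_index))
    out = df.reindex(full_index).interpolate(method="values")
    # Re-attach each unit as a companion "<column>_unit" column
    for col in df.columns:
        out[col + "_unit"] = units[col]
    return out


# Usage: two parameters with different units, sampled at the same depths
measured = pd.DataFrame(
    {"depth": [0.0, 1.1, 2.0], "temp": [10.0, 12.2, 14.0]},
    index=[0.0, 1.1, 2.0],
)
result = split_interpolate(measured, {"depth": "m", "temp": "degC"}, [1.0, 1.5])
print(result)
```

Keeping the units outside the numeric columns means every interpolation runs on float64 data; the units only need to be turned back into Quantity objects (or unit columns) at the end.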

OK, I found a solution. It might not be the best one, but it works just fine for my problem:

import pandas as pd
import quantities as pq

def extendAndInterpolate(df, newIndex):
    """Extend a pandas DataFrame to newIndex and interpolate the missing values,
    keeping the unit of each numeric (quantities) column."""
    # Outer join on the index: the result holds both the old and new index values;
    # sort_index() keeps the rows in increasing order
    output = pd.concat([df, pd.DataFrame(index=newIndex)], axis=1).sort_index()

    for col in output.columns:
        first = output[col].iloc[0]

        # (1) Try to retrieve the unit of the current column
        try:
            # if it succeeds, store the unit (e.g. 1 m)
            unit = 1 * first.units
        except AttributeError:
            # if it fails, the column contains strings: use a neutral factor of 1
            unit = 1

        # (2) Check the type of value
        if isinstance(first, str):
            # for a string, fill the missing cells with the previous string
            value = output[col].ffill()
        else:
            # for a value, to be able to interpolate, you need to:
            #   - (a) drop the unit with astype(float)
            #   - (b) interpolate the values
            #   - (c) put the unit back
            value = [x * unit for x in output[col].astype(float).interpolate(method='values')]

        # (3) Store the extended column with the interpolated values
        output[col] = pd.Series(value, index=output.index)

    # Return the output DataFrame
    return output

Then:

depth = [0.0,1.1,2.0] * pq.m
depth2 = [0,1,1.1,1.5,2] * pq.m

s1 = pd.DataFrame(
        {'depth' : [x for x in depth]},
        index = depth)

s2 = extendAndInterpolate(s1, depth2)

The result:

s1
     depth
0.0  0.0 m
1.1  1.1 m
2.0  2.0 m

s2     
     depth
0.0  0.0 m
1.0  1.0 m
1.1  1.1 m
1.5  1.5 m
2.0  2.0 m

Thanks for your help.

Julien