2

I am working on a python project where I have a .csv file like this:

freq,ae,cl,ota
825,1,2,3
835,4,5,6
850,10,11,12
880,22,23,24
910,46,47,48
960,94,95,96
1575,190,191,192
1710,382,383,384
1750,766,767,768

I need to get some data out of the file quick on the run.
To give an example:

I am sampling at a freq of 880MHz, I want to do some calculations on the samples, and make use of the data in the 880 row of the .csv file.

I did this by using the freq colon as indexing, and then just use the sampling freq to get the data, but the tricky part is, if I sample with 900MHz I get an error. I would like it to take the nearest data below and above, in this case 880 and 910, from these to rows I would use the data to make an linearized estimate of what the data at 900MHz would look like.

My main problem is how to do a quick search for the data, and if a perfect fit does not exists how to get the two nearest rows?

Andy Hayden
  • 359,921
  • 101
  • 625
  • 535
Laplace
  • 252
  • 2
  • 10
  • [related/a workaround](http://stackoverflow.com/questions/10464738/interoplation-on-dataframe-in-pandas)... but probably not what you want to do. – Andy Hayden May 17 '13 at 17:36
  • No not really, I do not want to add more data, the file work as reference to the sampling and calculations – Laplace May 17 '13 at 17:41

3 Answers3

3

Take the row/Series before and the row after

In [11]: before, after = df1.loc[:900].iloc[-1], df1.loc[900:].iloc[0]

In [12]: before
Out[12]:
ae     22
cl     23
ota    24
Name: 880, dtype: int64

In [13]: after
Out[13]:
ae     46
cl     47
ota    48
Name: 910, dtype: int64

Put an empty row in the middle and interpolate (edit: the default interpolation would just take the average of the two, so we need to set method='values'):

In [14]: sandwich = pd.DataFrame([before, pd.Series(name=900), after])

In [15]: sandwich
Out[15]:
     ae  cl  ota
880  22  23   24
900 NaN NaN  NaN
910  46  47   48

In [16]: sandwich.apply(apply(lambda col: col.interpolate(method='values'))
Out[16]:
     ae  cl  ota
880  22  23   24
900  38  39   40
910  46  47   48

In [17]: sandwich.apply(apply(lambda col: col.interpolate(method='values')).loc[900]
Out[17]:
ae     38
cl     39
ota    40
Name: 900, dtype: float64

Note:

df1 = pd.read_csv(csv_location).set_index('freq')

And you could wrap this in some kind of function:

def interpolate_for_me(df, n):
    if n in df.index:
        return df.loc[n]
    before, after = df1.loc[:n].iloc[-1], df1.loc[n:].iloc[0]
    sandwich = pd.DataFrame([before, pd.Series(name=n), after])
    return sandwich.apply(lambda col: col.interpolate(method='values')).loc[n]
Community
  • 1
  • 1
Andy Hayden
  • 359,921
  • 101
  • 625
  • 535
  • This was exactly what I was looking for thank you.. A side comment, I don't know how the interpolate function work, but the way you are using it, it assumes that the 900MHz is in the middle of the two other frequencies, so the result is not correct. – Laplace May 17 '13 at 18:11
  • @Laplace thanks for letting me know! I had assumed the default was linear (since it is labelled 'linear') but apparently 'values' is what we wanted (for linear interpolation). – Andy Hayden May 17 '13 at 18:32
0

The bisect module will perform bisection within a sorted sequence.

Ignacio Vazquez-Abrams
  • 776,304
  • 153
  • 1,341
  • 1,358
0
import csv
import bisect

def interpolate_data(data, value):
    # check if value is in range of the data.
    if data[0][0] <= value <= data[-1][0]: 
        pos = bisect.bisect([x[0] for x in data], value)
        if data[pos][0] == value:
            return data[pos][0]
        else:
            prev = data[pos-1]
            curr = data[pos]
            factor = 1+(value-prev[0])/(curr[0]-prev[0])
            return [value]+[x*factor for x in prev[1:]]

with open("data.csv", "rb") as csvfile:
    f = csv.reader(csvfile)
    f.next() # remove the header
    data = [[float(x) for x in row] for row in f] # convert all to float

# test value 1200:
interpolate_data(data, 1200)
# = [1200, 130.6829268292683, 132.0731707317073, 133.46341463414632]

Works for me and is fairly easy to understand.

michaelkrisper
  • 707
  • 6
  • 8