I have a pandas Series, s1, and I want to create a new Series, s2, by applying a function that takes two inputs and returns one value. The function would be applied to a sliding 2-value window over s1, so the resulting Series, s2, should have one fewer value than s1. There are many ways to accomplish this, but I'm looking for a very efficient one. This is on Linux; I'm currently running Python 2.7 and 3.4 with pandas 0.15.2, though I can update pandas if necessary.

Here's a simplification of my problem. My series consists of musical pitches represented as strings.

import music21
import pandas
s1 = pandas.Series(['C4', 'E-4', 'G4', 'A-4'])

I'd like to use this function:

def interval_func(event1, event2):
    ev1 = music21.note.Note(event1)
    ev2 = music21.note.Note(event2)
    intrvl = music21.interval.Interval(ev1, ev2)
    return intrvl.name

applied to s1 and a shifted copy of s1, to get the following series:

s2 = pandas.Series(['m3', 'M3', 'm2'])
Alex
  • Incorrect: `apply` can take a user function or lambda, and can take 0 to N parameters. You need to define your problem better, with raw data, code, and desired output. – EdChum Apr 14 '16 at 20:17

1 Answer

In response to your edit, we could try to use a similar .rolling method, but pandas does not currently support non-numeric types in rolling windows.

So, we can use a list comprehension:

[music21.interval.Interval(music21.note.Note(s1[i]),
                           music21.note.Note(s1[i + 1])).name
 for i in range(len(s1) - 1)]
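
The same pairing can also be written without index arithmetic by zipping the sequence against a copy of itself offset by one. The sketch below is only an illustration: `pairwise_apply` is a hypothetical helper, and `semitone_gap` is a stand-in for `interval_func` (neither is part of pandas or music21) so that the example runs without music21 installed.

```python
from itertools import islice

def pairwise_apply(values, func):
    """Apply a two-argument func to each adjacent pair in values.

    Hypothetical helper for illustration; returns a list with one
    fewer element than the input (or an empty list for short inputs).
    """
    items = list(values)
    # zip stops at the shorter input, so the last item gets no partner.
    return [func(a, b) for a, b in zip(items, islice(items, 1, None))]

def semitone_gap(midi1, midi2):
    # Stand-in for interval_func: distance in semitones between
    # two MIDI note numbers.
    return midi2 - midi1

# C4, E-4, G4, A-4 written as MIDI note numbers.
gaps = pairwise_apply([60, 63, 67, 68], semitone_gap)
print(gaps)  # [3, 4, 1]
```

Because iterating over a Series yields its values, something like `pandas.Series(pairwise_apply(s1, interval_func))` should produce s2 when music21 is available. It still makes one function call per pair, so it is tidier rather than fundamentally faster.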

or, an apply:

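import music21
import pandas as pd

s1 = pd.Series(['C4', 'E-4', 'G4', 'A-4'])
# Column 0 holds each note, column 1 the note that follows it.
df = pd.DataFrame({0: s1, 1: s1.shift(-1)})

def myfunc(x):
    # The last row has no following note, so it is left as None.
    if not (pd.isnull(x[0]) or pd.isnull(x[1])):
        return music21.interval.Interval(music21.note.Note(x[0]),
                                         music21.note.Note(x[1])).name

# Drop the unpaired row so the result has one fewer value than s1.
df.apply(myfunc, axis=1).dropna()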

Note: I would be surprised if the apply is any faster than the comprehension.
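
Both versions call the function once per pair, so the total cost grows with the length of the series. One possible way to reduce the per-call cost, which is not part of the approach above, is to cache results so that repeated pitch pairs are computed only once. The following is a minimal sketch using the standard library's `functools.lru_cache`; `slow_interval` is a hypothetical stand-in for the music21 call, and `calls` exists only to show how often the body actually runs.

```python
from functools import lru_cache

calls = 0  # counts how often the underlying computation actually runs

@lru_cache(maxsize=None)
def slow_interval(event1, event2):
    # Hypothetical stand-in for interval_func. Any pure function of two
    # hashable arguments (such as pitch strings) can be cached this way.
    global calls
    calls += 1
    return event1 + '>' + event2

pitches = ['C4', 'E-4', 'C4', 'E-4', 'C4']
# Pair each pitch with the one that follows it.
result = [slow_interval(a, b) for a, b in zip(pitches, pitches[1:])]
print(len(result), calls)  # 4 2
```

This only helps when the same pairs recur and the function has no side effects; for a series of mostly distinct pairs the cache adds overhead without saving work.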

jeremycg
  • This solution would definitely work, but the runtime will still be directly linked to the length of the series. I was hoping to make use of .apply() or something similar to arrive at a solution more native to the pandas library. – Alex Apr 14 '16 at 21:30
  • You were right, the runtime is basically the same for the two implementations. – Alex Apr 19 '16 at 12:58