There are few more ways to apply a function on every row of a DataFrame.
(1) You could modify EOQ
a bit by letting it accept a row (a Series object) as argument and access the relevant elements using the column names inside the function. Moreover, you can pass arguments to apply
using its keyword, e.g. ch
or ck
:
def EOQ1(row, ck, ch):
Q = math.sqrt((2*row['D']*ck)/(ch*row['p']))
return Q
df['Q1'] = df.apply(EOQ1, ck=ck, ch=ch, axis=1)
(2) It turns out that apply
is often slower than a list comprehension (in the benchmark below, it's 20x slower). To use a list comprehension, you could modify EOQ
still further so that you access elements by its index. Then call the function in a loop over df
rows that are converted to lists:
def EOQ2(row, ck, ch):
Q = math.sqrt((2*row[0]*ck)/(ch*row[1]))
return Q
df['Q2a'] = [EOQ2(x, ck, ch) for x in df[['D','p']].to_numpy().tolist()]
(3) As it happens, if the goal is to call a function iteratively, map
is usually faster than a list comprehension. So you could convert df
into a list, map
the function to it; then unpack the result in a list:
df['Q2b'] = [*map(EOQ2, df[['D','p']].to_numpy().tolist(), [ck]*len(df), [ch]*len(df))]
(4) As @EdChum notes, it's always better to use vectorized methods if it's possible to do so, instead of applying a function row by row. Pandas offers vectorized methods that rival that of numpy's. In the case of EOQ
for example, instead of math.sqrt
, you could use pandas' pow
method (in the benchmark below, using pandas vectorized methods is ~20% faster than using numpy):
df['Q_pd'] = df['D'].mul(2*ck).div(ch*df['p']).pow(0.5)
Output:
D p Q Q_np Q1 Q2a Q2b Q_pd
0 10 20 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000
1 20 30 5.773503 5.773503 5.773503 5.773503 5.773503 5.773503
2 30 10 12.247449 12.247449 12.247449 12.247449 12.247449 12.247449
Timings:
df = pd.DataFrame({"D": [10,20,30], "p": [20, 30, 10]})
df = pd.concat([df]*10000)
>>> %timeit df['Q'] = df.apply(lambda row: EOQ(row['D'], row['p'], ck, ch), axis=1)
623 ms ± 22.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit df['Q1'] = df.apply(EOQ1, ck=ck, ch=ch, axis=1)
615 ms ± 39.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit df['Q2a'] = [EOQ2(x, ck, ch) for x in df[['D','p']].to_numpy().tolist()]
31.3 ms ± 479 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit df['Q2b'] = [*map(EOQ2, df[['D','p']].to_numpy().tolist(), [ck]*len(df), [ch]*len(df))]
26.9 ms ± 306 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit df['Q_np'] = np.sqrt((2*df['D']*ck)/(ch*df['p']))
1.19 ms ± 53.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit df['Q_pd'] = df['D'].mul(2*ck).div(ch*df['p']).pow(0.5)
966 µs ± 27 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)