
From a runtime efficiency perspective in Python, are these equally efficient?

x = foo()
x = bar(x)

VS

x = bar(foo())

I have a more complex problem that essentially boils down to this question. The second version is obviously shorter, but is its runtime better as well? If the two are not equally efficient, why not?

ShadowRanger
Adrix

2 Answers


Here's a comparison:

First case:

%%timeit
def foo():
    return "foo"

def bar(text):
    return text + "bar"

def test():
    x = foo()
    y = bar(x)
    return y

test()
#Output:
'foobar'
529 ns ± 114 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Second case:

%%timeit

def foo():
    return "foo"

def bar(text):
    return text + "bar"

def test():   
    x = bar(foo())
    return x

test()
#Output:
'foobar'
447 ns ± 34.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

But that is just one run of %%timeit for each case. The following are the times (in ns) from 20 runs of each case:

import pandas as pd  # kind='density' also needs matplotlib and SciPy installed

df = pd.DataFrame({'First Case(time in ns)': [623,828,634,668,715,659,703,687,614,623,697,634,686,822,671,894,752,742,721,742], 
               'Second Case(time in ns)': [901,786,686,670,677,683,685,638,628,670,695,657,698,707,726,796,868,703,609,852]})

# Kernel density estimate of each timing column
df.plot(kind='density', figsize=(8,8))

[Image: density plot of the timings for both cases]

With each repeated run, the difference between the two cases shrank. The plot shows that the performance difference isn't significant. From a readability perspective, the second case looks better.
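As a rough sanity check on that reading of the plot, the same 20 samples can be summarised with the standard library. This is only a sketch: the variable names are mine, and comparing the gap between the means with the run-to-run spread is suggestive, not a formal significance test.

```python
import statistics

# Timings in ns, copied from the DataFrame above
first_case = [623, 828, 634, 668, 715, 659, 703, 687, 614, 623,
              697, 634, 686, 822, 671, 894, 752, 742, 721, 742]
second_case = [901, 786, 686, 670, 677, 683, 685, 638, 628, 670,
               695, 657, 698, 707, 726, 796, 868, 703, 609, 852]

mean_first = statistics.mean(first_case)
mean_second = statistics.mean(second_case)
gap = abs(mean_first - mean_second)

# Sample standard deviation as a measure of run-to-run noise
spread_first = statistics.stdev(first_case)
spread_second = statistics.stdev(second_case)

print(f"first case:  mean={mean_first:.2f} ns, stdev={spread_first:.1f} ns")
print(f"second case: mean={mean_second:.2f} ns, stdev={spread_second:.1f} ns")
print(f"gap between means: {gap:.2f} ns")
```

For these samples the means are 705.75 ns and 716.75 ns, so the gap is 11 ns and is smaller than the spread within either column.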

In the first case, two statements are executed: the first stores the return value of foo() in x, and the second loads x back to pass it to bar(). The extra store and load add a small overhead. In the second case, there is a single statement: foo() is still called first, but its return value is passed straight to bar() without going through a local variable.

amanb
  • Would a function body of: `return bar(foo())` be possible? – s_baldur Apr 24 '19 at 15:12
  • Yes, of course it's possible, but that does not improve the speed as such. – amanb Apr 24 '19 at 15:14
  • While your answer is correct that inline is faster, it's *very* misleading. You timed *defining* `foo`, `bar` and `test`, then calling `test` once on every loop in the first case, and only defining `test` and calling it once in the second. Defining each function imposes overhead, but in a real world use case, you typically wouldn't plan on any function aside from `main` being called *exactly* once per definition. You want to time the cost to call only, not the cost to define the functions. If you do that, you'll find [the difference pretty trivial](https://stackoverflow.com/a/55834313/364696). – ShadowRanger Apr 24 '19 at 16:15
  • @ShadowRanger, thank you for pointing this out. I've edited my answer with more performance results. The second case includes function definitions too, but I agree that in real-world scenarios, `main` will be called exactly once for each definition and the comparison should be for the function call only. I observed that with every test, the plots were getting closer and the performance difference was diminishing. – amanb Apr 24 '19 at 19:32
  • For debugging the second case is worse. – Tjorriemorrie Apr 29 '19 at 00:17

It matters a tiny bit, but not meaningfully. amanb's test timed the definition of the functions in only one of the tests, and so had to do more work in the first test, skewing the results. Tested properly, the results differ only by the slimmest of margins. Using the same IPython %%timeit magic (IPython version 7.3.0, CPython version 3.7.2 for Linux x86-64), but removing the definition of the functions from the per-loop tests:

>>> def foo():
...     return "foo"
... def bar(text):
...     return text + "bar"
... def inline():
...     x = bar(foo())
...     return x
... def outofline():
...     x = foo()
...     x = bar(x)
...     return x
...

>>> %%timeit -r5 test = inline
... test()
...
...
332 ns ± 1.01 ns per loop (mean ± std. dev. of 5 runs, 1000000 loops each)


>>> %%timeit -r5 test = outofline
... test()
...
...
341 ns ± 5.62 ns per loop (mean ± std. dev. of 5 runs, 1000000 loops each)

The inline code was faster, but the difference was under 10 ns/3%. Inlining further (to make the body just return bar(foo())) saves a tiny bit more, but again, it's pretty meaningless.
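Outside IPython, a similar measurement can be sketched with the standard library's timeit module. This is my adaptation, not the exact setup used for the numbers above: the helper name `best_ns_per_call` and the loop counts are assumptions chosen for illustration. The functions are defined once, and only the call is timed.

```python
import timeit

def foo():
    return "foo"

def bar(text):
    return text + "bar"

def inline():
    x = bar(foo())
    return x

def outofline():
    x = foo()
    x = bar(x)
    return x

def best_ns_per_call(func_name, number=100_000, repeat=5):
    """Best-of-`repeat` time per call in ns (hypothetical helper).

    The setup statement runs untimed inside timeit's timing function,
    so `test` is a local alias there, much like the first line of %%timeit.
    """
    runs = timeit.repeat(
        stmt="test()",
        setup=f"test = {func_name}",
        globals=globals(),
        number=number,
        repeat=repeat,
    )
    return min(runs) / number * 1e9

for name in ("inline", "outofline"):
    print(f"{name}: {best_ns_per_call(name):.0f} ns per call")
```

The absolute numbers depend on the machine and the Python version, so only the relative gap between the two lines is worth reading.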

This is what you'd expect, too. Storing and loading function-local names is about the cheapest thing the CPython interpreter can do. The only difference between the functions is that outofline requires an extra STORE_FAST and LOAD_FAST (one following the other), and those instructions are implemented internally as nothing but assignment to and reading from a compile-time determined slot in a C array, plus a single integer increment to adjust reference counts. You pay for the CPython interpreter overhead required by each bytecode instruction, but the cost of the actual work is trivial.
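The extra store can be seen with the standard library's dis module. This is a minimal sketch with one caveat: exact opcode names and listings vary between CPython versions, and newer releases may fuse or rename some instructions. For that reason the hypothetical helper `count_store_fast` counts every opcode whose name contains STORE_FAST instead of relying on one exact listing.

```python
import dis

def foo():
    return "foo"

def bar(text):
    return text + "bar"

def inline():
    x = bar(foo())
    return x

def outofline():
    x = foo()
    x = bar(x)
    return x

def count_store_fast(func):
    """Count instructions that store to a local slot (hypothetical helper)."""
    return sum("STORE_FAST" in ins.opname for ins in dis.get_instructions(func))

# Full disassembly, for reading by eye; output differs by CPython version
dis.dis(inline)
dis.dis(outofline)

print("inline stores:   ", count_store_fast(inline))
print("outofline stores:", count_store_fast(outofline))
```

outofline should report one more local store than inline, which corresponds to the intermediate `x = foo()` assignment.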

Point is: Don't worry about the speed; write whichever version of the code is more readable/maintainable. In this case, all the names are garbage. But if the output from foo can be given a useful name and is then passed to bar, whose output gets a different useful name, and the relationship between foo and bar would be non-obvious without those names, don't inline. If the relationship is obvious, and foo's output doesn't benefit from being named, inline it. Avoiding stores and loads from local variables is the most micro of microoptimizations; it won't be the cause of meaningful performance loss in almost any scenario, so don't base code design decisions on it.

ShadowRanger
  • For those curious: I intentionally used the first line of `%%timeit` to alias each test function to a consistent local name, rather than just testing `%timeit -r5 inline()` & `%timeit -r5 outofline()`, because the first line of `%%timeit` defines (without timing) *local* variables for the test, then runs the subsequent block in that context. If you use the original names, you end up timing the cost of looking up `inline` and `outofline` in the *global* namespace, which isn't what you care about and, thanks to hash collisions, can unpredictably slow one option through no real fault of its own. – ShadowRanger Apr 24 '19 at 16:19