1

Let's say I have the following simple situation:

import pandas as pd

def multiply(row):
    global results
    results.append(row[0] * row[1])

def main():
    results = []
    df = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}, {'a': 5, 'b': 6}])
    df.apply(multiply, axis=1)
    print(results)

if __name__ == '__main__':
    main()

This results in the following traceback:

Traceback (most recent call last):

  File "<ipython-input-2-58ca95c5b364>", line 1, in <module>
    main()

  File "<ipython-input-1-9bb1bda9e141>", line 11, in main
    df.apply(multiply, axis=1)

  File "C:\Users\bbritten\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\frame.py", line 4262, in apply
    ignore_failures=ignore_failures)

  File "C:\Users\bbritten\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\frame.py", line 4358, in _apply_standard
    results[i] = func(v)

  File "<ipython-input-1-9bb1bda9e141>", line 5, in multiply
    results.append(row[0] * row[1])

NameError: ("name 'results' is not defined", 'occurred at index 0')

I know that I can move results = [] to the if statement to get this example to work, but is there a way to keep the structure I have now and make it work?

cs95
tblznbits

2 Answers

4

You must define results at module level, outside of any function, for example:

import pandas as pd

results = []

def multiply(row):
    results.append(row[0] * row[1])

# the rest of your code, minus the results = [] line in main()

UPDATE

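Also note that a Python list is mutable, and results.append(...) only modifies the existing list without rebinding the name, so you don't need to declare it with global at the beginning of the function. The declaration is needed only when the function assigns to the name. Example: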

def multiply(row):
    # global results -> This is not necessary!
    results.append(row[0] * row[1])
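The difference between mutating a module-level object and rebinding its name can be seen in a minimal sketch. The names `record`, `bump`, and `counter` are hypothetical and exist only for this illustration:

```python
results = []   # module-level list
counter = 0    # module-level integer

def record(value):
    # Mutating the existing list: the name is only read, so no `global` is needed.
    results.append(value)

def bump():
    # Rebinding the name: without `global`, Python would treat `counter`
    # as a local variable and raise UnboundLocalError on `+=`.
    global counter
    counter += 1

record(2)
record(12)
bump()
print(results, counter)  # prints: [2, 12] 1
```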
Carlos Afonso
  • So the variables must be truly global in nature, and cannot be defined in a local scope, such as what I was doing. Is that what you're saying? I'm likely not going to use the right terminology here, but is there a way to access the environment of a calling function? So, for example, can I specify that `multiply` should use the environment of `main` for variables? – tblznbits Jul 26 '17 at 19:24
  • This still won't fix your problem. – cs95 Jul 26 '17 at 19:26
  • Yes. If you want a global variable you must declare it outside of the scope of any function; the `global` keyword just says that the variable you're manipulating is actually the global one rather than a new one created locally inside this function. If you want to make a variable "environment specific" you should use **classes** then. – Carlos Afonso Jul 26 '17 at 19:29
  • Understood. Thanks for the help! – tblznbits Jul 26 '17 at 19:31
  • @cᴏʟᴅsᴘᴇᴇᴅ Technically this answer _does_ fix the problem. It fixes the TraceBack the OP posted in their question. However as you noted in your (now deleted) answer, if the OP wants to use `multiply` with `df.apply()`, they need to return an actual value. – Christian Dean Jul 26 '17 at 19:35
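Following up on the last comment, here is a rough sketch of what returning a value from `multiply` could look like, so that `df.apply()` collects the products itself and no shared list is needed. It assumes the question's column names `a` and `b`; treat it as one possible shape rather than the only fix:

```python
import pandas as pd

def multiply(row):
    # Return the product; apply() gathers the return values into a Series.
    return row['a'] * row['b']

def main():
    df = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}, {'a': 5, 'b': 6}])
    products = df.apply(multiply, axis=1)
    # tolist() converts the Series into a plain Python list.
    return products.tolist()

if __name__ == '__main__':
    print(main())
```

For a simple product like this, the vectorized form `(df['a'] * df['b']).tolist()` should give the same list without calling a Python function per row.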
0

You must move results outside of the function if multiply is to access it as a global. I don't think there is any other way to do that without moving the variable out.

Alternatively, pass results as a parameter to the multiply function.
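A minimal sketch of the parameter approach, keeping the question's structure with `results` local to `main`. It relies on the `args` parameter of `DataFrame.apply`, which forwards extra positional arguments to the applied function; the `int(...)` conversion is an addition here so the list holds plain Python integers:

```python
import pandas as pd

def multiply(row, results):
    # `results` is the caller's list; append mutates it in place.
    results.append(int(row['a'] * row['b']))

def main():
    results = []
    df = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}, {'a': 5, 'b': 6}])
    # Each row is passed first, followed by the items in `args`.
    df.apply(multiply, axis=1, args=(results,))
    return results

if __name__ == '__main__':
    print(main())
```

One caveat: this uses `apply` purely for its side effects. Older pandas releases could call the function twice on the first row, which would duplicate the first entry, so returning a value from `multiply` is usually the safer design.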

Deepak Singh