-1

I know it sounds ridiculous, but I need to pass a for loop into a function. I have a DataFrame with 75+ columns, and most of them are categorical variables. One of the variables is called SalePrice, and I want to find the correlation between the categorical variables and SalePrice.

This is my code, but I think it is ridiculous to go through all 75 columns manually. Is there an easier way?

import pandas as pd
from scipy import stats

df = pd.read_csv(file, delimiter=',')
qualityTest = df[["OverallQual","SalePrice"]]
qualities = [1,2,3,4,5,6,7,8,9,10]
stats.f_oneway(qualityTest['SalePrice'][qualityTest['OverallQual'] == 1],
              qualityTest['SalePrice'][qualityTest['OverallQual'] == 2],
              qualityTest['SalePrice'][qualityTest['OverallQual'] == 3],
              qualityTest['SalePrice'][qualityTest['OverallQual'] == 4],
              qualityTest['SalePrice'][qualityTest['OverallQual'] == 5],
              qualityTest['SalePrice'][qualityTest['OverallQual'] == 6],
              qualityTest['SalePrice'][qualityTest['OverallQual'] == 7],
              qualityTest['SalePrice'][qualityTest['OverallQual'] == 8],
              qualityTest['SalePrice'][qualityTest['OverallQual'] == 9],
              qualityTest['SalePrice'][qualityTest['OverallQual'] == 10])

I've tried doing this, but it doesn't work:

stats.f_oneway(
    for i in qualities:
        qualityTest['SalePrice'][qualityTest['OverallQual'] == i]
)
Aaron

2 Answers

5

You can use a list comprehension - essentially, create a list using a for loop, and pass that in:

stats.f_oneway([qualityTest['SalePrice'][qualityTest['OverallQual'] == i] for i in qualities])

If you want the groups passed as separate arguments instead of as one list (which is what stats.f_oneway expects, since it takes each sample as its own positional argument), add a * directly in front of the outermost set of square brackets. This unpacks the list you just built into individual function arguments.
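As a minimal sketch of the unpacking form, here is the same call on a small made-up DataFrame standing in for the real file. The values are invented purely for illustration, and `qualities` is derived from the data (an assumption, rather than hard-coding 1 to 10) so that no empty group is passed in:

```python
import pandas as pd
from scipy import stats

# Toy data standing in for the real CSV; the numbers are made up.
qualityTest = pd.DataFrame({
    "OverallQual": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "SalePrice": [100, 110, 105, 150, 160, 155, 200, 210, 190],
})

# Use only the quality levels that actually occur in the data.
qualities = sorted(qualityTest["OverallQual"].unique())

# Build one Series of prices per quality level...
groups = [qualityTest["SalePrice"][qualityTest["OverallQual"] == i] for i in qualities]

# ...and unpack the list so each Series becomes its own positional argument.
result = stats.f_oneway(*groups)
print(result.statistic, result.pvalue)
```

The `*` is what turns the single list into three separate samples; without it, `f_oneway` receives one argument.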

Green Cloak Guy
  • Hi, I've tried both ways and each returned an error. With the list: `ValueError: setting an array element with a sequence.` After removing the outermost brackets: `TypeError: float() argument must be a string or a number, not 'generator'` – Aaron Jul 01 '19 at 03:44
  • 3
    Removing the outermost square brackets will pass a generator. If you want the arguments to be separate, you must use the `*` unpacking operator. Something like: `stats.f_oneway(*(qualityTest['SalePrice'][qualityTest['OverallQual'] == i] for i in qualities))` – iz_ Jul 01 '19 at 03:44
  • @Tomothy32 Thanks for that, I've updated my answer. Incidentally, why *doesn't* a generator work here, when it works for things like `print()` and the other stdlib functions that take an arbitrary number of arguments? – Green Cloak Guy Jul 01 '19 at 13:09
  • 1
    Could you give an example? Trying `print(i for i in range(5))` yields `<generator object <genexpr> at *memory address*>`. Maybe you meant unpacking a generator, like `print(*(i for i in range(5)))`. – iz_ Jul 02 '19 at 15:54
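To make the distinction in the comments above concrete, here is a small sketch. `count_args` is a hypothetical helper written only for this illustration; it reports how many positional arguments a call actually received:

```python
def count_args(*args):
    """Hypothetical helper: return how many positional arguments were passed."""
    return len(args)

# A bare generator expression is passed as ONE argument (the generator object).
bare_count = count_args(i for i in range(5))

# Unpacking with * consumes the generator and passes each item separately.
unpacked_count = count_args(*(i for i in range(5)))

print(bare_count, unpacked_count)  # prints: 1 5
```

This is why the bare generator fails in `stats.f_oneway`: the function sees a single generator object instead of several samples.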
3

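You can use groupby here to build one group of SalePrice values per quality level, then unpack the groups into f_oneway:

stats.f_oneway(*[prices for _, prices in qualityTest.groupby('OverallQual')['SalePrice']])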
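Since the question is really about repeating the test for many columns, one way to extend the groupby idea might look like the sketch below. `anova_by_column` is a hypothetical helper (not part of pandas or SciPy), the toy DataFrame and its `Street` column are made up for illustration, and the sketch assumes every column other than the target is categorical:

```python
import pandas as pd
from scipy import stats

def anova_by_column(df, target="SalePrice", columns=None):
    """Hypothetical helper: one-way ANOVA of `target` against each column."""
    if columns is None:
        # Assumption: every column except the target is categorical.
        columns = [c for c in df.columns if c != target]
    results = {}
    for col in columns:
        # One Series of target values per level of the categorical column.
        groups = [values for _, values in df.groupby(col)[target]]
        if len(groups) < 2:
            continue  # ANOVA needs at least two groups to compare
        f_stat, p_value = stats.f_oneway(*groups)
        results[col] = (f_stat, p_value)
    return pd.DataFrame.from_dict(results, orient="index", columns=["F", "p"])

# Toy data standing in for the real file; the values are made up.
df = pd.DataFrame({
    "OverallQual": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Street": ["Pave", "Grvl", "Pave", "Grvl", "Pave", "Grvl", "Pave", "Grvl", "Pave"],
    "SalePrice": [100, 110, 105, 150, 160, 155, 200, 210, 190],
})

table = anova_by_column(df).sort_values("p")
print(table)
```

Sorting by the p-value puts the columns whose groups differ most clearly in SalePrice at the top. With the real data you would probably pass an explicit `columns` list, so that numeric columns are not treated as categories.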
BENY