
I have a data frame which consists of many cities and their corresponding temperatures:

               CurrentThermostatTemp
City                                
Cradley Heath                   20.0
Cradley Heath                   20.0
Cradley Heath                   18.0
Cradley Heath                   15.0
Cradley Heath                   19.0
...                              ...
Walsall                         16.0
Walsall                         22.0
Walsall                         20.0
Walsall                         20.0
Walsall                         20.0

[6249 rows x 1 columns]

The unique values are:

Index(['Cradley Heath', 'ROWLEY REGIS', 'Smethwick', 'Oldbury',
       'West Bromwich', 'Bradford', 'Bournemouth', 'Poole', 'Wareham',
       'Wimborne',
       ...
       'St. Helens', 'Altrincham', 'Runcorn', 'Widnes', 'St Helens',
       'Wakefield', 'Castleford', 'Pontefract', 'Walsall', 'Wednesbury'],
      dtype='object', name='City', length=137)

My aim is to run a one-way ANOVA test, i.e.

from scipy.stats import f_oneway

across all the unique cities in the data frame, i.e. something like

scipy.stats.f_oneway("all unique values")

And receive the output:

One-way ANOVA test for all variables gives {} with p-value {}

This is what I have tried many times, but it does not work:

all = Tempvs.index.unique()
Tempvs.sort_index(inplace=True)
for n in range(len(all)):
    truncated = Tempvs.truncate(all[n], all[n])
    print(f_oneway(truncated))

1 Answer


IIUC, you want an ANOVA test where each sample contains the Temp values of one unique City. If this is the case, you can do:

import numpy as np
import pandas as pd
import scipy.stats as sps

# I create a sample dataset
index = ['Cradley Heath', 'ROWLEY REGIS',
         'Smethwick', 'Oldbury',
         'West Bromwich', 'Bradford', 
         'Bournemouth', 'Poole', 'Wareham',
         'Wimborne','St. Helens', 'Altrincham', 
         'Runcorn', 'Widnes', 'St Helens',
         'Wakefield', 'Castleford', 'Pontefract', 
         'Walsall', 'Wednesbury']
np.random.seed(1)
df = pd.DataFrame({
    'City': np.random.choice(index, 500),
    'Temp': np.random.uniform(15, 25, 500)
})

# build a list holding one array
# of Temp values per unique City
values = []
for city in df.City.unique():
    _df = df[df.City==city]
    values.append(_df.Temp.values)

# compute the ANOVA
# with starred *list
# as arguments
sps.f_oneway(*values)

which, in this case, gives

F_onewayResult(statistic=0.4513685152123563, pvalue=0.9788508507035195)
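In the question, City is the index of Tempvs rather than a column, so a groupby variant may map more directly onto that frame. The sketch below assumes the layout shown in the question (index named City, one column CurrentThermostatTemp); the small Tempvs built here is made-up stand-in data, not the asker's real data. It collects one array per city and prints the result in the format the question asks for.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway

# Hypothetical stand-in for the question's Tempvs frame:
# City is the index, CurrentThermostatTemp is the only column.
Tempvs = pd.DataFrame(
    {"CurrentThermostatTemp": [20.0, 20.0, 18.0, 15.0, 19.0,
                               16.0, 22.0, 20.0, 20.0, 20.0,
                               17.0, 21.0, 19.0]},
    index=pd.Index(["Cradley Heath"] * 5 + ["Walsall"] * 5 + ["Oldbury"] * 3,
                   name="City"),
)

# Group on the index level and keep one NumPy array of temperatures per city.
samples = [
    group.to_numpy()
    for _, group in Tempvs.groupby(level="City")["CurrentThermostatTemp"]
]

# Unpack the list so each city is passed to f_oneway as a separate sample.
result = f_oneway(*samples)
print("One-way ANOVA test for all variables gives {} with p-value {}".format(
    result.statistic, result.pvalue))
```

Note that groupby treats every distinct string as its own group, so with the city names listed in the question, 'St. Helens' and 'St Helens' would end up as two separate samples unless the names are normalized first.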

PS: do not use all as a variable name, because it shadows Python's built-in all() function; see https://docs.python.org/3/library/functions.html#all
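A minimal illustration of what goes wrong: once a module-level name all is bound to some other object, later calls to all(...) hit that object instead of the built-in. The list of cities below is purely hypothetical; in the question the shadowing object would be the Index returned by Tempvs.index.unique().

```python
# Binding a module-level name "all" hides the built-in all() function.
all = ["Cradley Heath", "Walsall"]  # hypothetical list of cities

try:
    all([True, False])  # this now tries to call the list, not the built-in
except TypeError as exc:
    print("TypeError:", exc)  # TypeError: 'list' object is not callable

# Removing the shadowing name makes lookup fall back to the built-in again.
del all
print(all([True, False]))  # False
```

A name such as cities or unique_cities avoids the problem altogether.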

Max Pierini