I have this DataFrame:
print(TempvsDType)
CurrentThermostatTemp
DwellingType
Bungalow 0.0
Bungalow 22.0
Bungalow 22.0
Bungalow 25.0
Bungalow 18.0
Bungalow 21.0
Bungalow 22.0
Bungalow 10.0
Bungalow 18.0
Bungalow 20.0
Bungalow 20.0
Bungalow 22.0
Bungalow 20.0
Bungalow 10.0
Bungalow 30.0
Bungalow 22.0
Bungalow 20.0
Bungalow 20.0
Bungalow 19.0
Bungalow 20.0
Bungalow 22.0
Bungalow 20.0
Bungalow 21.0
Bungalow 22.0
Bungalow 15.0
Bungalow 22.0
Bungalow 0.0
Bungalow 24.0
Bungalow 30.0
Bungalow 20.0
... ...
Park Home 20.0
Park Home 23.0
Park Home 20.0
Park Home 20.0
Park Home 20.0
Park Home 18.0
Park Home 20.0
Park Home 15.0
Park Home 12.0
Park Home 20.0
Park Home 20.0
Park Home 23.0
Park Home 21.0
Park Home 20.0
Park Home 20.0
Park Home 20.0
Park Home 23.0
Park Home 18.0
Park Home 20.0
Park Home 18.0
Park Home 16.0
Park Home 17.0
Park Home 20.0
Park Home 20.0
Park Home 18.0
Park Home 18.0
Park Home 20.0
Park Home 20.0
Park Home 15.0
Park Home 21.0
[6247 rows x 1 columns]
I have separated each dwelling type (each "variable") into its own DataFrame with the .truncate() method:
Flat = TempvsDType.truncate(before="Flat",after="Flat")
House = TempvsDType.truncate(before="House",after="House")
Bungalow = TempvsDType.truncate(before="Bungalow",after="Bungalow")
Maisonette = TempvsDType.truncate(before="Maisonette",after="Maisonette")
ParkHome = TempvsDType.truncate(before="Park Home",after="Park Home")
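The same split can probably be done without one line per dwelling type. The following is only a sketch: it assumes the index is named DwellingType and the column is CurrentThermostatTemp, as in the printout above, and it uses a small made-up DataFrame in place of the real 6247-row data. The name `groups` is hypothetical.

```python
import pandas as pd

# Small made-up stand-in for TempvsDType: one column, indexed by DwellingType.
TempvsDType = pd.DataFrame(
    {"CurrentThermostatTemp": [20.0, 22.0, 19.0, 21.0, 18.0, 20.0]},
    index=pd.Index(
        ["Bungalow", "Bungalow", "Flat", "Flat", "Park Home", "Park Home"],
        name="DwellingType",
    ),
)

# Group on the index level and keep each group's temperatures in a dict,
# keyed by the dwelling-type label.
groups = {
    name: frame["CurrentThermostatTemp"]
    for name, frame in TempvsDType.groupby(level="DwellingType")
}

print(list(groups))  # ['Bungalow', 'Flat', 'Park Home']
```

A dict keyed by label keeps the group names next to the data, so they do not have to be tracked in a separate list.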
My goal is to perform a Student's t-test for every possible pair of these variables, without testing a variable against itself or repeating a pair in reverse order. However, I had to do this manually, which was long and time-consuming, especially in other scripts where there are more than 5 variables and the number of combinations increases substantially. This was my manual method:
from scipy.stats import ttest_ind
#All possible combinations:
Flat_House = ttest_ind(Flat,House)
Flat_Bungalow = ttest_ind(Flat,Bungalow)
Flat_Maisonette = ttest_ind(Flat,Maisonette)
Flat_ParkHome = ttest_ind(Flat,ParkHome)
House_Bungalow = ttest_ind(House,Bungalow)
House_Maisonette = ttest_ind(House,Maisonette)
House_ParkHome = ttest_ind(House,ParkHome)
Bungalow_Maisonette = ttest_ind(Bungalow,Maisonette)
Bungalow_ParkHome = ttest_ind(Bungalow,ParkHome)
Maisonette_ParkHome = ttest_ind(Maisonette, ParkHome)
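#Print the t-test result for each combination
#(u is defined elsewhere; from its use below it is assumed to hold the labels in the order Flat, House, Bungalow, Maisonette, Park Home)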
print("t-test between {} and {} is {} and p-value:{}".format(u[0],u[1],Flat_House[0],Flat_House[1]))
print("t-test between {} and {} is {} and p-value:{}".format(u[0],u[2],Flat_Bungalow[0],Flat_Bungalow[1]))
print("t-test between {} and {} is {} and p-value:{}".format(u[0],u[3],Flat_Maisonette[0],Flat_Maisonette[1]))
print("t-test between {} and {} is {} and p-value:{}".format(u[0],u[4],Flat_ParkHome[0],Flat_ParkHome[1]))
print("t-test between {} and {} is {} and p-value:{}".format(u[1],u[2],House_Bungalow[0],House_Bungalow[1]))
print("t-test between {} and {} is {} and p-value:{}".format(u[1],u[3],House_Maisonette[0],House_Maisonette[1]))
print("t-test between {} and {} is {} and p-value:{}".format(u[1],u[4],House_ParkHome[0],House_ParkHome[1]))
print("t-test between {} and {} is {} and p-value:{}".format(u[2],u[3],Bungalow_Maisonette[0],Bungalow_Maisonette[1]))
print("t-test between {} and {} is {} and p-value:{}".format(u[2],u[4],Bungalow_ParkHome[0],Bungalow_ParkHome[1]))
print("t-test between {} and {} is {} and p-value:{}".format(u[3],u[4],Maisonette_ParkHome[0],Maisonette_ParkHome[1]))
Therefore, I would like to know how I can write a function that does this automatically, i.e. runs a Student's t-test for all possible combinations (excluding duplicates and repeated pairs) and prints the results the way I have printed them manually. I have tried this many times but have not succeeded. I would be very pleased if someone could help me. Thank you.
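For reference, the kind of function I have in mind might look like the sketch below. It is not a confirmed solution: `pairwise_ttests` and `example` are hypothetical names, the numbers are made up, and each group is assumed to be a 1-D sequence of temperatures (a single column) rather than a whole DataFrame. It relies on `itertools.combinations` to produce each unordered pair exactly once.

```python
from itertools import combinations

from scipy.stats import ttest_ind


def pairwise_ttests(groups):
    """Run ttest_ind on every unordered pair of groups.

    `groups` maps a label to a 1-D sequence of numbers. Returns a list of
    (label_a, label_b, statistic, pvalue) tuples, one per pair.
    """
    results = []
    # combinations(..., 2) yields each pair once: never (A, A),
    # and never (B, A) after (A, B).
    for (name_a, data_a), (name_b, data_b) in combinations(groups.items(), 2):
        statistic, pvalue = ttest_ind(data_a, data_b)
        results.append((name_a, name_b, statistic, pvalue))
    return results


# Made-up numbers purely for illustration.
example = {
    "Flat": [20.0, 21.0, 19.0, 22.0],
    "House": [18.0, 20.0, 21.0, 19.0],
    "Bungalow": [22.0, 25.0, 18.0, 21.0],
}

for name_a, name_b, statistic, pvalue in pairwise_ttests(example):
    print("t-test between {} and {} is {} and p-value:{}".format(
        name_a, name_b, statistic, pvalue))
```

With n groups this produces n*(n-1)/2 results, so 5 dwelling types give the same 10 pairs as the manual version. Returning the tuples and printing them separately keeps the results available for later use.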