UCLA has this great site for statistical tests
but the code is all in R. I am trying to convert the code to Python equivalents but it is not a straightforward process for some like the chi-square goodness of fit. Here is the R version:
hsb2 <- within(read.csv("https://stats.idre.ucla.edu/stat/data/hsb2.csv"), {
race <- as.factor(race)
schtyp <- as.factor(schtyp)
prog <- as.factor(prog)
})
chisq.test(table(hsb2$race), p = c(10, 10, 10, 70)/100)
My Python attempt is this:
import numpy as np
import pandas as pd
from scipy import stats
df = pd.read_csv("https://stats.idre.ucla.edu/stat/data/hsb2.csv")
# convert to category
df["race"] = df["race"].astype("category")
t_race = pd.crosstab(df.race, columns = 'race')
p_tests = np.array((10, 10, 10, 70))
p_tests = ptests/100
# tried this
stats.chisquare(t_race, p_tests)
# and this
stats.chisquare(t_race.T, p_tests)
but neither stats.chisquare output comes close to the R version. Can anybody steer me in the right direction? TIA