How do you apply hypothesis testing to the features of an ML model? Let's say, for example, that I am doing a regression task and, once I have trained my model, I want to cut some features to improve performance. How do I apply hypothesis testing to decide whether a given feature is useful or not? I am a bit confused about what my null hypothesis would be, which level of significance to use, and how to run the experiment that yields the p-value of a feature (I have heard that a level of significance of 0.15 is a good threshold, but I am not sure).
For example, I am doing a regression task to predict the cost of my factory from the production of three machines (A, B, C). I fit a linear regression to the data and find that the p-value of machine A is greater than my level of significance; hence it is not statistically significant, and I decide to drop that feature from my model.
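My tentative understanding of the test applied to each coefficient, which may well be wrong, is the usual t-test for a single regression coefficient. The symbols below (\beta_A for machine A's coefficient, SE for its standard error, \alpha for the level of significance) are my own notation, not the video's:

```latex
H_0:\ \beta_A = 0
\quad\text{vs.}\quad
H_1:\ \beta_A \neq 0,
\qquad
t_A = \frac{\hat{\beta}_A}{\operatorname{SE}(\hat{\beta}_A)},
\qquad
\text{reject } H_0 \text{ if } p_A < \alpha .
```

If that is right, a p-value above \alpha would only mean that I cannot reject H_0 for machine A, i.e. the data give no evidence that its coefficient differs from zero.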
I have taken this example from a video on YouTube; the link is below.
The relevant bit runs from minute 4:00 to 7:00: https://www.youtube.com/watch?v=HgfHefwK7VQ
I have tried reading about it, but I haven't been able to understand how he decided on that level of significance or how he applied hypothesis testing in this case.
The data looks something like this:
import pandas as pd

# Illustrative sample of the data (five rows per column)
d = {'Cost': [44439, 43936, 44464, 41533, 46343],
     'A': [515, 929, 800, 979, 1165],
     'B': [541, 710, 675, 1147, 939],
     'C': [928, 711, 824, 758, 635]}
df = pd.DataFrame(data=d)
After the model has been fit, the weights are as follows:
Bias weight: 35102, Machine A: 2.066, Machine B: 4.17, Machine C: 4.79
Now, the issue is that the p-value for Machine A is 0.23, which was considered too high, and therefore this feature was excluded from the predictive model.
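To make the question concrete, here is a minimal sketch of how I believe the per-feature p-values are obtained: ordinary least squares followed by a two-sided t-test on each coefficient, with the null hypothesis that the coefficient is zero. The helper name `ols_p_values` is my own, and this is an assumption about what the video does, not something it states. I keep only five values per column so the rows line up, and with so few rows the numbers will not match the weights and p-value quoted above.

```python
import numpy as np
from scipy import stats


def ols_p_values(X, y):
    """Fit OLS with an intercept; return (coefficients, two-sided p-values).

    Each p-value tests H0: coefficient == 0 with a t-test.
    """
    X = np.column_stack([np.ones(len(X)), X])      # prepend the bias column
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares weights
    resid = y - X @ beta
    dof = n - k                                    # residual degrees of freedom
    sigma2 = resid @ resid / dof                   # estimated noise variance
    cov = sigma2 * np.linalg.inv(X.T @ X)          # covariance of the weights
    se = np.sqrt(np.diag(cov))                     # standard errors
    t_stat = beta / se
    p = 2 * stats.t.sf(np.abs(t_stat), dof)        # two-sided p-values
    return beta, p


# Sample rows from the question (columns: A, B, C).
X = np.array([[515, 541, 928],
              [929, 710, 711],
              [800, 675, 824],
              [979, 1147, 758],
              [1165, 939, 635]], dtype=float)
cost = np.array([44439, 43936, 44464, 41533, 46343], dtype=float)

alpha = 0.15  # the threshold I have heard suggested; an assumption, not a rule
beta, p = ols_p_values(X, cost)
for name, b, pv in zip(["Bias", "A", "B", "C"], beta, p):
    verdict = "keep" if pv < alpha else "candidate to drop"
    print(f"{name}: weight={b:.3f}, p-value={pv:.3f} -> {verdict}")
```

Is this the procedure being applied, and is comparing each p-value against a fixed threshold like this a sound way to decide which features to remove?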