I have a basic question. I am running binomial GLMs with numeric predictors. Some of these predictors have very few unique values: some have 2, some 3 and some 4. All of them are measured on a clear, interpretable continuous scale; I just sampled many times at very few points on that scale (I know, not ideal for regression, but it cannot be changed). Take, for example, the following table, and imagine it repeated like this 10,000 more times, with only the response values varying:

response | pred1 | pred2 | pred3 |
---|---|---|---|
0 | 20 | 100 | 100 |
1 | 50 | 900 | 200 |
1 | 20 | 4000 | 800 |
0 | 50 | 100 | 900 |
1 | 20 | 900 | 100 |
0 | 50 | 4000 | 100 |
1 | 20 | 100 | 800 |
0 | 50 | 900 | 900 |

My question is: (when) does it make sense to convert these predictors into factors? If a numeric variable contains only 2 unique values, does it even make a difference whether it is a factor or numeric? Can I trust estimates based on just 3 or 4 unique values? Would it be better to make it a factor and thereby "acknowledge" that we cannot infer a linear relationship (on the link scale) from the few values we have sampled?
I assume, since they can all be placed on a continuous scale, it makes sense to keep them numeric, but I just wanted to make sure I'm doing the right thing.
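To make the 2-value case concrete: when the model has an intercept, a predictor x that only takes the values a and b can be written as x = a + (b - a) * 1{x = b}, so the numeric column and a treatment-coded factor span the same column space and give identical fits. With more levels, the factor model nests the linear one, and the two can be compared with a likelihood-ratio test. The sketch below assumes NumPy is available; it simulates data shaped like the table (the "true" response model and its coefficients are hypothetical, made up for illustration), fits logit-link binomial GLMs with a small hand-written IRLS routine standing in for a library GLM function, and compares numeric versus factor coding for pred1 (2 levels) and pred3 (4 levels).

```python
import numpy as np

# Design rows copied from the table (pred1, pred2, pred3); pred2 is unused here.
ROWS = np.array([
    [20, 100, 100],
    [50, 900, 200],
    [20, 4000, 800],
    [50, 100, 900],
    [20, 900, 100],
    [50, 4000, 100],
    [20, 100, 800],
    [50, 900, 900],
], dtype=float)


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Fit a binomial GLM with logit link by IRLS (Newton-Raphson).

    Returns (coefficients, maximized log-likelihood).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = mu * (1.0 - mu)
        # Newton step: solve (X' W X) delta = X' (y - mu).
        delta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - mu))
        beta = beta + delta
        if np.max(np.abs(delta)) < tol:
            break
    eta = X @ beta
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))
    return beta, loglik


def numeric_term(x):
    """The predictor as a single continuous column (linear on the logit scale)."""
    return x[:, None]


def factor_term(x):
    """Treatment-coded dummies: one column per level except the lowest (reference)."""
    levels = np.unique(x)
    return np.column_stack([(x == lev).astype(float) for lev in levels[1:]])


def design(*terms):
    """Prepend an intercept column to the given term blocks."""
    return np.column_stack([np.ones(terms[0].shape[0]), *terms])


rng = np.random.default_rng(2024)
data = np.tile(ROWS, (10_001, 1))  # the table plus 10,000 repeats
pred1, pred3 = data[:, 0], data[:, 2]

# Hypothetical "true" model, linear on the logit scale, used only to simulate y.
eta_true = -0.5 + 0.02 * pred1 + 0.0005 * pred3
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta_true))).astype(float)

_, ll_numeric = fit_logistic(design(numeric_term(pred1), numeric_term(pred3)), y)
_, ll_pred1_factor = fit_logistic(design(factor_term(pred1), numeric_term(pred3)), y)
_, ll_pred3_factor = fit_logistic(design(numeric_term(pred1), factor_term(pred3)), y)

# 2 levels: factor and numeric coding span the same columns, so the fits coincide.
diff_pred1 = ll_pred1_factor - ll_numeric
print(f"pred1 (2 levels): log-lik difference = {diff_pred1:.2e}")

# 4 levels: the factor model nests the linear one; LR test with 3 - 1 = 2 df.
lr = max(2.0 * (ll_pred3_factor - ll_numeric), 0.0)
p_value = np.exp(-lr / 2.0)  # chi-square upper tail for exactly 2 df
print(f"pred3 (4 levels): LR = {lr:.3f}, df = 2, p = {p_value:.3f}")
```

For pred1 the log-likelihood difference should be zero up to floating-point error, whatever the data; for pred3 the LR statistic and p-value depend on the simulated responses. The `exp(-lr / 2)` shortcut is only valid for 2 degrees of freedom, so a 3-level predictor such as pred2 (1 df) would need a general chi-square tail function instead.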