In R, the formula object is symbolic and it seems rather hard to parse. However, I need to parse such a formula into an explicit set of labels for use outside of R.
(1)
Let f be a model formula in which no response is specified, e.g. ~V1 + V2 + V3. One thing I tried was:
t <- terms(f)
attr(t, "term.labels")
However, this does not give fully explicit labels if some of the variables in f are categorical. For example, let V1 be a categorical variable with 2 categories ("no" and "yes", i.e. a boolean), and let V2 be a double.
Then a model specified by ~V1:V2 should have 3 parameters: "(Intercept)", "V1no:V2" and "V1yes:V2" (V1 is expanded into both of its levels because V2 does not appear in the model as a main effect). Meanwhile, a model specified by ~V1:V2 - 1 should have the parameters "V1no:V2" and "V1yes:V2". However, without a way of telling terms() which variables are categorical (and how many categories each has), it has no way of interpreting these. Instead, it just has V1:V2 in its "term.labels", which is not explicit enough when V1 is categorical.
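To make the limitation concrete, here is a minimal sketch using the V1 and V2 names from above. The names labels_symbolic and has_intercept are only illustrative. It shows that terms() sees nothing but the symbols, so the label it reports is the unexpanded interaction.

```r
# Interaction formula with no response; V1 and V2 are just symbols here,
# terms() has no information about their types.
f <- ~ V1:V2

tt <- terms(f)

# The only label available is the symbolic one; no per-category expansion
labels_symbolic <- attr(tt, "term.labels")

# Whether the formula keeps an intercept is recorded separately
has_intercept <- attr(tt, "intercept") == 1

print(labels_symbolic)  # "V1:V2"
print(has_intercept)    # TRUE
```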
(2)
On the other hand, using model.matrix is an easy way to get exactly what I want. The problem is that it requires a data argument, which is bad for me because I only want an explicit interpretation of the symbolic formula for use outside of R. Getting it this way wastes a lot of time (comparatively), because R has to read the data from an outside source when all it really needs to know for the formula is which variables are categorical (and how many categories each has) and which variables are doubles.
Is there any way to use model.matrix while specifying only the types of the variables, rather than the actual data? If not, what would be a viable alternative?
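For illustration, the kind of thing I have in mind might look like the sketch below. It builds a one-row placeholder data frame that carries only the type information and passes that to model.matrix. The helper expand_formula and its arguments factor_levels and numeric_vars are hypothetical names of my own, not part of R, and I have not confirmed this is the recommended approach.

```r
# Hypothetical helper (not part of base R): expand a formula into explicit
# column labels, given only the variable types.
#   factor_levels: named list, one character vector of levels per categorical variable
#   numeric_vars:  character vector naming the double variables
expand_formula <- function(f, factor_levels, numeric_vars = character(0)) {
  # One placeholder row: each factor carries its full level set, numerics are 0
  factors  <- lapply(factor_levels, function(lv) factor(lv[1], levels = lv))
  numerics <- setNames(as.list(rep(0, length(numeric_vars))), numeric_vars)
  placeholder <- as.data.frame(c(factors, numerics))

  # Only the column names are of interest; the placeholder values are discarded
  colnames(model.matrix(f, data = placeholder))
}

# Assumed types for the example: V1 has levels "no"/"yes", V2 is a double
types <- list(V1 = c("no", "yes"))

with_intercept <- expand_formula(~ V1:V2, types, "V2")
no_intercept   <- expand_formula(~ V1:V2 - 1, types, "V2")

print(with_intercept)
print(no_intercept)   # expected: "V1no:V2" and "V1yes:V2"
```

The placeholder values never matter here, since the column names depend only on the factor levels and the contrasts in effect, so no real data has to be read.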