A convoluted question and I'm not sure I'm expressing it as concisely as I could, but...
I want to fit multivariate generalised linear models, and because of the size and complexity of my models I have to use rxGlm() from the RevoScaleR package rather than the built-in glm() function.
It's important that each factor in the model has a reference level of my choosing, which I can set using relevel() of course. However, the nuisance here is that relevel() reorders the factor levels, which makes the GLM output confusing to work with. I'd like to be able to retrieve the original factor level ordering after I've fitted the model, for presentation purposes.
A simple example:
library(RevoScaleR) # from Microsoft R Client
x <- data.frame(country = c("Australia", "Belgium", "Chile", "Belgium", "Belgium"),
                degree = c("Y", "Y", "N", "Y", "N"),
                salary = c(10000, 15000, 5000, 20000, 4000),
                stringsAsFactors = TRUE) # relevel() below needs factors; R >= 4.0 defaults to FALSE
model <- rxGlm(salary ~ country + degree, data = x, dropFirst = TRUE)
model$coefficients
This gives
      (Intercept) country=Australia   country=Belgium     country=Chile          degree=N          degree=Y
            -3500                NA              7500              8500                NA             13500
Both factors are ordered alphabetically here, so the reference level is country = Australia, degree = N. Suppose I'd like to have my reference levels as country = Belgium, degree = Y. I can do this and then rerun the model:
x$country <- relevel(x$country, ref = "Belgium")
x$degree <- relevel(x$degree, ref = "Y")
model <- rxGlm(salary ~ country + degree, data = x, dropFirst = TRUE)
model$coefficients
This now gives the same model, but presented differently:
      (Intercept)   country=Belgium country=Australia     country=Chile          degree=Y          degree=N
            17500                NA             -7500              1000                NA            -13500
These are the coefficients I want, but now the ordering is wrong. Is there a simple way to rearrange this coefficient vector using the factor ordering I had before the relevel() commands?
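For what it's worth, here is a rough base-R sketch of the kind of reordering I have in mind. It does not call rxGlm() at all: the coefficient vector is typed in by hand from the output above, and it assumes the names always follow the "factor=level" pattern shown there. The helper name reorder_coefs and the orig_levels list are hypothetical, just for illustration; in practice orig_levels would have to be captured with levels() before calling relevel().

```r
# Coefficients as printed after relevel(), entered by hand so this
# sketch runs without RevoScaleR.
coefs <- c("(Intercept)" = 17500,
           "country=Belgium" = NA, "country=Australia" = -7500, "country=Chile" = 1000,
           "degree=Y" = NA, "degree=N" = -13500)

# Level orderings as they were before relevel() (alphabetical here).
# Assumption: saved beforehand, e.g. via levels(x$country).
orig_levels <- list(country = c("Australia", "Belgium", "Chile"),
                    degree  = c("N", "Y"))

# Hypothetical helper: rebuild the "factor=level" names in the original
# order and use them to index the named coefficient vector.
reorder_coefs <- function(coefs, orig_levels) {
  wanted <- unlist(lapply(names(orig_levels), function(f) {
    paste0(f, "=", orig_levels[[f]])
  }))
  # Only keep names that actually occur in the coefficient vector.
  wanted <- wanted[wanted %in% names(coefs)]
  # Anything that is not a factor term (e.g. the intercept) stays in front.
  other <- setdiff(names(coefs), wanted)
  coefs[c(other, wanted)]
}

reordered <- reorder_coefs(coefs, orig_levels)
print(reordered)
```

This only changes the presentation order; the values (including the NA for each reference level) are untouched. I don't know whether it generalises to interaction terms or to the other components of the rxGlm object.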
Thank you.