I'm trying to learn to write R code that I can reuse without running into problems later, specifically problems caused by names I assign inside a function conflicting with names in the data passed into it. I haven't found any best practices for handling this written down anywhere. I'm looking for suggestions on how to improve what I'm doing (or validation that what I'm doing is already a best practice, though that seems unlikely).
I'm using my own get_name() to find a name that is not used in the data, then assign() to store results under that name so I can refer to it in the updated formula. Then I have to do the same thing again and use get() for the weights argument. All of this is to avoid the possibility that the incoming data or formula already contains the variable names I would have used.
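To illustrate the kind of conflict I'm worried about, here is a minimal sketch. The data frame d, its column w, and the function naive_wls() are made up for this example and are not part of my real code. It relies on lm() looking up the weights argument in data before the formula's environment:

```r
# Hypothetical data: the column `w` happens to share a name with a local variable.
set.seed(1)
d <- data.frame(x = 1:10,
                y = 2 * (1:10) + rnorm(10),
                w = rep(c(1, 100), 5))

naive_wls <- function(frml, data) {
  w <- rep(1, nrow(data))            # the weights I intend to use
  environment(frml) <- environment() # so lm() can see local variables
  lm(frml, data, weights = w)        # but `w` is looked up in `data` first
}

fit <- naive_wls(y ~ x, d)

# Did the fit use the data column rather than my local vector?
used_data_column <- isTRUE(all.equal(as.numeric(weights(fit)), d$w))
used_data_column
```

If used_data_column is TRUE, the column in the data silently shadowed the local vector, which is exactly the situation the name juggling above is meant to prevent.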
The code:
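fgls_harvey = function(frml, data) {
  # Step 1: OLS fit to obtain residuals
  reg = lm(frml, data)
  # Step 2: regress the log squared residuals on the same regressors,
  # storing them under a name that is not already used in the data
  en = get_name('_lresid2_', 'e', data)
  assign(en, log(residuals(reg)^2))
  # as.name() because en is not a syntactic name (it starts with '_')
  f = update.formula(frml, reformulate('. + 0', as.name(en)))
  environment(f) = environment()
  reg2 = lm(f, data)
  # Step 3: weighted fit, with the weights again stored under an unused name
  exp_n = get_name('exppv', 'e', data)
  assign(exp_n, exp(fitted(reg2)) / sum(fitted(reg2)))
  environment(frml) = environment()
  reg_fgls = lm(frml, data, weights=get(exp_n))
  reg_fgls
}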
get_name = function(base, suffix, df) {
  if (inherits(df, 'data.frame')) { # either a d.f-like object
    names = colnames(df)
  } else { # or an lm-like object
    names = colnames(df$model)
  }
  if (base %in% names) {
    get_name(sprintf('%s%s', base, suffix), suffix, df)
  } else {
    base
  }
}
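
For reference, this is how get_name() behaves on a toy input. The data frame toy and its column names are invented for the illustration, and get_name() is repeated (same logic as above) only so the snippet runs on its own:

```r
# Same logic as get_name() above, copied so this snippet is self-contained
get_name = function(base, suffix, df) {
  if (inherits(df, 'data.frame')) {
    names = colnames(df)
  } else {
    names = colnames(df$model)
  }
  if (base %in% names) {
    # name is taken: append the suffix and try again
    get_name(sprintf('%s%s', base, suffix), suffix, df)
  } else {
    base
  }
}

# Hypothetical data frame that already uses 'exppv' and 'exppve'
toy = data.frame(x = 1:3, exppv = 4:6, exppve = 7:9)

free_name = get_name('exppv', 'e', toy)  # 'exppvee': two suffixes were needed
unchanged = get_name('resid', 'e', toy)  # 'resid': not in the data, returned as-is

# An lm-like object is checked against the columns of its model frame
fit = lm(exppv ~ x, toy)
from_lm = get_name('x', 'e', fit)        # 'xe'
```

The suffix is simply appended until the name no longer appears among the column names, so the result is only guaranteed to be unused in the data, not in the calling environment.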