
Suppose I want to write a function in R that depends only on a couple of sufficient statistics of some data. For example, suppose the function, call it foo.func, depends only on the sample mean of a sample of data. For convenience, I think users might like to pass foo.func either the sample itself (in which case foo.func computes the sample mean) or the sample mean, which is all that foo.func needs. For efficiency, the latter is preferred when several functions like foo.func are called and each can take the sample mean: the mean then need only be computed once (in my real problem, the sample statistics in question might be computationally intensive).

In summary, I would like to write foo.func to be accessible to the beginner (pass in the data, let the function compute the sufficient statistics) as well as the expert (precompute the sufficient statistics for efficiency and pass them in). What are the recommended practices for this? Do I have a logical flag passed in? Multiple arguments? Some ways to do it might be:

#optional arguments
foo.func <- function(xdata, suff.stats=NULL) {
  if (is.null(suff.stats)) {
    suff.stats <- compute.suff.stats(xdata)
  }
  #now operate on suff.stats
}

or

#flag input
foo.func <- function(data.or.stat, gave.data=TRUE) {
  if (gave.data) {
    data.or.stat <- compute.suff.stats(data.or.stat)
  }
  #now operate on data.or.stat
}

I am leaning towards the former, I think.

shabbychef

2 Answers


The R way of implementing polymorphism is through a CLOS-style model (CLOS being the Common Lisp Object System), in which methods are associated with generic functions (verbs) rather than classes (nouns). For instance,

# surprising that there is not an equivalent function in R
# to propagate inheritance...
addclass <- function(x,classname) structure(x,class=append(class(x),classname))

# this should be your main function that does stuff
# here, the identity function is assigned for example
dostuff <- identity

# define generic function and methods
foo <- function(x,...) UseMethod("foo")
foo.raw <- function(x,...) dostuff(mean(x))
foo.stats <- function(x,...) dostuff(x)

# define two types of inputs
x <- 1:10
x <- addclass(x,"raw")

y <- 5
y <- addclass(y,"stats")

# apply
foo(x)
# [1] 5.5
foo(y)
# [1] 5
# attr(,"class")
# [1] "numeric" "stats"  

The example uses R's S3 OOP model, which I think is quite sufficient here; S4 is more modern and safer but adds a lot of boilerplate.
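One possible variation, offered only as a sketch: rather than tagging the raw data with a class, let a default method handle raw data and give a class only to the precomputed statistics. The names `suff_stats` and `foo.suff_stats` are hypothetical, and `mean` stands in for whatever statistic is expensive in the real problem.

```r
# Hypothetical constructor: compute the statistics once and tag the result.
suff_stats <- function(x) {
  structure(list(mean = mean(x), n = length(x)), class = "suff_stats")
}

# Generic: dispatches on the class of its first argument.
foo <- function(x, ...) UseMethod("foo")

# Beginner path: raw data falls through to the default method,
# which computes the statistics and re-dispatches on the result.
foo.default <- function(x, ...) foo(suff_stats(x), ...)

# Expert path: precomputed statistics are used directly.
foo.suff_stats <- function(x, ...) x$mean

x <- c(2, 4, 6, 8)
s <- suff_stats(x)     # computed once, reusable across several functions

from_data  <- foo(x)   # goes through foo.default
from_stats <- foo(s)   # goes straight to foo.suff_stats
```

With this layout a beginner never sees the class at all, while an expert calls `suff_stats()` once and passes the result to every function that accepts it.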

symbolrush
hatmatrix
  • I suppose you could also define `foo` such that you test to see if it is likely that it is a statistic -- e.g., `if(length(x) > 1) x <- dostats(x)` and so on. I believe this is done quite often in `R`'s functions where the nature of the input argument is evaluated and the appropriate action taken, without the user specifying additional arguments. For instance, see `boxplot.default`, where the initial statements are trying to ascertain the nature of its first argument. – hatmatrix Oct 28 '11 at 19:06
  • Glad you liked it -- I should add that only the first argument determines the method dispatched in the `S3` OO model, whereas you can have multimethods in R's `S4` classes. – hatmatrix Oct 28 '11 at 23:16
  • Yep, this is quite enlightening. There's a lot of interesting aggregations of functions here. – Iterator Oct 29 '11 at 03:55
  • This is pretty helpful for the rest of my package, but I'm still getting my toes wet in `R`. – shabbychef Oct 31 '11 at 17:06

You can also embed functions into the arguments, as:

foo.func <- function(x, suff.stats = foo.func.suff.stat(x)){
  # your code here
}

As an example:

foo.func <- function(x, avg = mean(x)){
  return(avg)
}

foo.func(1:20)
foo.func(avg = 42)
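The second call works without `x` because R evaluates default arguments lazily: the default expression runs only when `avg` is first used, and not at all if the caller supplies `avg`. A small sketch makes this visible; `slow.mean` and `n_calls` are hypothetical names standing in for an expensive statistic and a call counter.

```r
# Hypothetical stand-in for a costly statistic; it counts its own calls.
n_calls <- 0
slow.mean <- function(x) {
  n_calls <<- n_calls + 1
  mean(x)
}

foo.func <- function(x, avg = slow.mean(x)) {
  avg * 2   # the default is evaluated here, on first use of avg
}

res_data <- foo.func(1:20)      # default evaluated: slow.mean runs once
calls_after_data <- n_calls

res_stat <- foo.func(avg = 42)  # avg supplied: slow.mean never runs, x is never touched
calls_after_stat <- n_calls
```

After both calls the counter has only been incremented once, so the precomputed path skips the expensive step entirely.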

Alternatively, you can use a default of NULL for each such argument and test with is.null(argument), or simply check missing(argument) for each argument you might calculate.
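Both alternatives might look roughly like this. It is a sketch only: `foo_null` and `foo_missing` are hypothetical names, and the explicit error when neither argument is given is an added assumption rather than part of the suggestion above.

```r
# Variant 1: NULL defaults, tested with is.null().
foo_null <- function(x = NULL, avg = NULL) {
  if (is.null(avg)) {
    if (is.null(x)) stop("supply either 'x' or 'avg'")
    avg <- mean(x)
  }
  avg
}

# Variant 2: no defaults, tested with missing().
foo_missing <- function(x, avg) {
  if (missing(avg)) {
    if (missing(x)) stop("supply either 'x' or 'avg'")
    avg <- mean(x)
  }
  avg
}

a1 <- foo_null(1:20)         # statistic computed from the data
a2 <- foo_null(avg = 42)     # statistic passed in directly
b1 <- foo_missing(1:20)
b2 <- foo_missing(avg = 42)
```

One practical difference: a NULL default can be forwarded unchanged from a wrapper function, whereas `missing()` only reports on the arguments of the function in which it is called.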


Update 1: I erred in suggesting use of a default value of NA: it is far more appropriate to use NULL. Using NA and is.na() will behave oddly for vector inputs, whereas NULL is just a single object - one cannot create a vector of NULL values, so is.null(argument) behaves as expected. Apologies for the forgetfulness.
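To illustrate the difference, here is a minimal sketch in which `avg` is a hypothetical vector-valued statistic that happens to contain a missing value:

```r
# A legitimate vector-valued statistic that contains a missing value.
avg <- c(1.5, NA, 3)

na_check   <- is.na(avg)    # one logical per element
null_check <- is.null(avg)  # always a single TRUE or FALSE

# Combining NULLs does not produce a "vector of NULLs", just NULL again.
combined <- c(NULL, NULL)
```

Because `is.na(avg)` has length three here, using it directly as an `if()` condition is not a reliable "was this argument supplied?" test, whereas `is.null(avg)` always yields exactly one value.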

Iterator
  • @crippledlambda FYI: I had a mistake, but it's fixed. – Iterator Nov 02 '11 at 20:43
  • Thanks, I didn't look at it carefully but was commending the general idea. Yes, I've found that many consider the `NA`/`NULL` distinction as an R gotcha... – hatmatrix Nov 03 '11 at 00:14