I am trying to trim a vector in R by percentage. The function should discard the lowest 5% and the highest 5% of the values of x, and then return the mean, variance, and median of what remains.

my_aggregation <- function(x,is.truncated=FALSE){
  if (is.truncated==FALSE) return(data.frame(mean = mean(x), med = median(x), var = var(x)))  
  if (is.truncated==TRUE) Trim(x,0.05) return(data.frame(mean = mean(x), med = median(x), var = var(x)))  
  
}

I have also tried assigning the trimmed result to a new variable, like this:

  if (is.truncated==TRUE) x2 <- Trim(x,0.05) return(data.frame(mean = mean(x2), med = median(x2), var = var(x2)))  

With no luck. I can't find the problem, partly because I am new to R and partly because when I run this

   my_aggregation <- function(x,is.truncated=FALSE){  
  if (is.truncated==FALSE) return(data.frame(mean = mean(x), med = median(x), var = var(x)))  
  if (is.truncated==TRUE)return(data.frame(mean = mean(x), med = median(x), var = var(x)))  
  if (is.truncated=="BOTH") return({x*2})
}

everything works perfectly fine. Maybe it has something to do with the Trim() function, but I don't know what exactly. Anyway, this is the error I am getting:

Error: unexpected symbol in:
"  if (is.truncated==FALSE) return(data.frame(mean = mean(x), med = median(x), var = var(x)))  
  if (is.truncated==TRUE) Trim(x,0.05) return"
>   if (is.truncated=="BOTH") return({x*2})
Error: object 'is.truncated' not found
> }
Error: unexpected '}' in "}"

If you have other suggestions of how I might achieve the same results, I am all ears.

Phil
Shlomi
  • You seem to have mixed up the syntax. Where is the function `Trim` defined, or are you trying to define that inside your function? – Allan Cameron Nov 17 '20 at 16:08

2 Answers

It's not clear what Trim is in your question, nor the significance of is.truncated. My best guess is that you wish to get your summary statistics but control whether the user can trim off the bottom 5%, the top 5%, or both. If so, you could do:

my_aggregation <- function(x, truncate = c("none", "lower", "upper", "both")) 
{
  # Validate against the choices in the signature; defaults to "none"
  truncate <- match.arg(truncate)
  keep <- rep(TRUE, length(x))
  # Both cutoffs are computed from the untrimmed x
  if(truncate == "lower" || truncate == "both") 
    keep[x < quantile(x, 0.05)] <- FALSE
  if(truncate == "upper" || truncate == "both")
    keep[x > quantile(x, 0.95)] <- FALSE
  x <- x[keep]
  return(data.frame(min = min(x), max = max(x), mean = mean(x), 
                    med = median(x), var = var(x)))
}

That allows you to do:

my_aggregation(1:100, "none")
#>   min max mean  med      var
#> 1   1 100 50.5 50.5 841.6667

my_aggregation(1:100, "lower")
#>   min max mean med var
#> 1   6 100   53  53 760

my_aggregation(1:100, "upper")
#>   min max mean med var
#> 1   1  95   48  48 760

my_aggregation(1:100, "both")
#>   min max mean  med   var
#> 1   6  95 50.5 50.5 682.5
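To see why the trimmed minimum and maximum come out as 6 and 95, it can help to look at the cutoffs directly. The following is a small sketch, assuming R's default quantile method; the variable names `cutoffs` and `kept` are only for illustration. Because the comparisons are strict, values exactly equal to a cutoff are kept, so with tied data the share removed may differ from 5%.

```r
x <- 1:100

# 5% and 95% cutoffs with quantile()'s default interpolation
cutoffs <- quantile(x, c(0.05, 0.95))
cutoffs

# Values strictly below the lower cutoff or strictly above the upper
# cutoff are dropped; everything in between is kept.
kept <- x[x >= cutoffs[1] & x <= cutoffs[2]]
range(kept)
length(kept)
```

For `1:100` the cutoffs fall between integers (5.95 and 95.05), so exactly five values are dropped at each end.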
Allan Cameron
I suppose what you wanted to do is this:

my_aggregation <- function(x, is.truncated = FALSE){

  if (is.truncated)   # comparing with '== TRUE' would be redundant
    x <- DescTools::Trim(x, trim = 0.05)   # requires the DescTools package

  return(c(mean = mean(x), median = median(x), var = var(x)))
}

set.seed(34)
x <- runif(100)

my_aggregation(x)
#       mean     median        var 
# 0.51284962 0.50315252 0.08925482 

my_aggregation(x, is.truncated = TRUE)
#       mean     median        var 
# 0.51390241 0.50315252 0.07350477 
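If installing DescTools is not an option, the same idea can be sketched in base R. The helper below, `trim_both`, is a hypothetical name and not part of base R or DescTools; it assumes that "trimming 5%" means dropping the `floor(n * trim)` smallest and largest values, which is the convention `mean(x, trim = ...)` uses. It is not guaranteed to return exactly what `DescTools::Trim` returns (for example, the result here is sorted).

```r
# Hypothetical helper: drop the floor(n * trim) smallest and largest values.
trim_both <- function(x, trim = 0.05) {
  stopifnot(is.numeric(x), trim >= 0, trim < 0.5)
  x <- sort(x)          # sort() also drops NA values by default
  n <- length(x)
  k <- floor(n * trim)  # number of values removed at each end
  if (k == 0) return(x)
  x[(k + 1):(n - k)]
}

x <- 1:100
x_trimmed <- trim_both(x)
range(x_trimmed)

# Summary statistics of the trimmed vector
c(mean = mean(x_trimmed), median = median(x_trimmed), var = var(x_trimmed))

# The trimmed mean alone is also available directly in base R
mean(x, trim = 0.05)
```

If only the trimmed mean is needed, `mean(x, trim = 0.05)` avoids the helper entirely; the helper is useful when the variance and median should be computed on the same trimmed values.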

Note that the result of the trimming statement is never stored if you write

 if (is.truncated==TRUE) Trim(x,0.05) return(data.frame(mean = mean(x), med = median(x), var = var(x)))  

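Written on one line, the two statements have nothing separating them, which is what triggers the "unexpected symbol" error. Putting each statement on its own line shows how R would read the code: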

 if (is.truncated==TRUE) 
   Trim(x,0.05)

 return(data.frame(mean = mean(x), med = median(x), var = var(x)))  

So only Trim(x,0.05) depends on is.truncated, and its result is discarded; the return() runs either way, on the untrimmed x. Use {} to group multiple steps, e.g.

 if (is.truncated==TRUE) {
   x <- Trim(x,0.05)   # you have to redefine x here, if it should be used afterwards
   return(data.frame(mean = mean(x), med = median(x), var = var(x)))
 }  
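The difference between the one-line form and the braced form can be checked without running anything, by asking R to parse each version as text. This is only a sketch: `parse()` checks syntax and does not evaluate the code, so `Trim` does not need to be installed, and the variable names `bad`, `good` and `msg` are illustrative.

```r
# One-line form: two expressions with nothing separating them.
bad <- "if (is.truncated == TRUE) Trim(x, 0.05) return(x)"
msg <- tryCatch(
  {
    parse(text = bad)
    "parsed without error"
  },
  error = function(e) conditionMessage(e)
)
msg  # the message should mention "unexpected symbol"

# Braced form: both steps belong to the if, and the text parses cleanly.
good <- "if (is.truncated == TRUE) {
  x <- Trim(x, 0.05)
  return(x)
}"
length(parse(text = good))  # a single complete expression
```

A semicolon between the two statements would also make the line parse, but then `return()` would sit outside the `if`, just as in the two-line version above.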

One last point: why return a data.frame when the result is just three numeric values? In general, use the simplest data structure that fits the problem, which here is a named numeric vector.
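As a small illustration of that point, a named numeric vector can be indexed by name and still be turned into a one-row data frame later if a caller needs one. The data and the name `stats` are made up for the example.

```r
x <- c(2, 4, 4, 5, 10)

# Named numeric vector holding the three summary statistics
stats <- c(mean = mean(x), median = median(x), var = var(x))

stats["mean"]       # extract one value, keeping its name
stats[["median"]]   # extract one value as a bare number

# Convert to a one-row data frame only where that shape is needed
stats_df <- as.data.frame(t(stats))
names(stats_df)
```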

Andri Signorell