
I have a data frame ("daten") in which most columns are numeric. Their values typically range from 0 to 5, but they can also take the value 99. I want to calculate the mean of each column, excluding only the values of 99.

For example:

> mean(c(0, 1, 2, 3, 4, 5, 99))
[1] 16.28571

is not what I need; instead, I want it to be calculated as if the vector were

> mean(c(0, 1, 2, 3, 4, 5))
[1] 2.5

which gives the mean I am looking for.

There is a similar question (Calculate mean, median by excluding any given number), but its solution does not work for me. However, once I can exclude a given value from a single column, I can simply combine that with apply, so what I am really looking for is a way to calculate the mean of a vector while ignoring certain values.
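A minimal sketch of the approach described above, assuming every column of `daten` is numeric: a small helper drops the excluded codes (and any `NA`) from one vector, and `sapply()` applies it column by column. `sapply()` is swapped in for `apply()` here because `apply()` first converts a data frame to a matrix. The helper name `mean_excluding` and the toy `daten` below are made up for illustration.

```r
# Hypothetical helper: mean of x ignoring the given codes (99 by default)
# and any NA values. `!x %in% exclude` parses as !(x %in% exclude).
mean_excluding <- function(x, exclude = 99) {
  mean(x[!x %in% exclude], na.rm = TRUE)
}

# Made-up stand-in for the real survey data frame `daten`
daten <- data.frame(
  q1 = c(0, 1, 2, 3, 4, 5, 99),
  q2 = c(5, 5, 99, 4, NA, 3, 3)
)

# Apply the helper to every column; returns a named numeric vector
col_means <- sapply(daten, mean_excluding)
col_means  # q1: 2.5, q2: 4
```

Passing several codes, e.g. `mean_excluding(x, exclude = c(98, 99))`, ignores all of them at once.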

Lukas
  • A wee side note: if the value `99` is a code representing missing data, it is probably worthwhile to declare this explicitly, as it will make computations like this easier. One way to do this is when reading in your data; see the `na.strings` argument of `read.table` etc. – user20650 Oct 27 '17 at 12:00
  • I actually have missing values as well, represented as NA. However, since this is a survey, I want to distinguish missing values (an item left unanswered) from refusals (an extra option offered in each question), so that I can tell exactly how many respondents refused to answer. – Lukas Oct 27 '17 at 12:07
  • Maybe your example is not representative, as we are getting the expected output. – akrun Oct 27 '17 at 12:08
  • @akrun, you're right - I didn't properly understand the syntax of the answer posted in the link. – Lukas Oct 27 '17 at 12:10
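Since the 99 code is meant to stay distinct from `NA`, refusals and unanswered items can be counted separately while the means ignore both. A minimal sketch, assuming 99 is the only refusal code and again using a made-up `daten`:

```r
# Made-up survey data: 99 = refusal, NA = item not answered
daten <- data.frame(
  q1 = c(0, 1, 99, 3, NA, 5),
  q2 = c(99, 99, 2, NA, NA, 4)
)

# Refusals per column; na.rm = TRUE because NA == 99 evaluates to NA
refusals <- colSums(daten == 99, na.rm = TRUE)

# Unanswered items per column
unanswered <- colSums(is.na(daten))

# Means that ignore both refusals (99) and unanswered items (NA)
means <- sapply(daten, function(col) mean(col[col != 99], na.rm = TRUE))
```

Keeping 99 in the data (rather than reading it in as `NA` via `na.strings`) is what makes the `refusals` count possible.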

2 Answers


We can replace the value 99 with NA and then get the mean with `na.rm = TRUE`.

mean(replace(v1, v1==99, NA), na.rm = TRUE)
#[1] 2.5

data

v1 <- c(0, 1, 2, 3, 4, 5, 99)
akrun
  • This (i.e., replacing 99 with NA) is not what I wanted to do; however, I found my own (very stupid) mistake. Thanks! – Lukas Oct 27 '17 at 12:04
  • @Lukas In your description: `I want to calculate the mean of the columns, excluding only the values 99.` – akrun Oct 27 '17 at 12:05
  • I'm sorry - my wording was misleading in this case. – Lukas Oct 27 '17 at 12:09
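Note that `replace()` returns a modified copy and leaves its input untouched, so the 99 codes in the original vector survive this approach; only the temporary copy passed to `mean()` has them turned into `NA`. A short check, using a made-up vector that also contains a genuine `NA`:

```r
v2 <- c(0, 1, NA, 3, 99)

# replace() builds a new vector; v2 itself is not changed
m <- mean(replace(v2, v2 == 99, NA), na.rm = TRUE)

m                            # mean of 0, 1 and 3, i.e. 1.333333
sum(v2 == 99, na.rm = TRUE)  # 1: the 99 code is still in v2
```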

You can also try this:

vec1 <- c(0, 1, 2, 3, 4, 5, 99)
# keep only the elements that are not 99, then average them
mean(vec1[which(vec1 != 99)])
#[1] 2.5
tushaR
  • Thanks; my mistake actually lay in not properly replacing 'x'. Very stupid of me, so thanks for still answering! – Lukas Oct 27 '17 at 12:06
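The `which()` call matters once the data also contain `NA`: `which()` treats an `NA` comparison as `FALSE`, so it drops missing values too, whereas plain logical indexing keeps them as `NA` and then needs `na.rm = TRUE`. A small comparison on a made-up vector:

```r
vec2 <- c(0, 1, NA, 3, 99)

# Logical indexing: NA != 99 is NA, so an NA element survives
kept_logical <- vec2[vec2 != 99]
mean(kept_logical)                 # NA
mean(kept_logical, na.rm = TRUE)   # 1.333333

# which() returns only the positions where the comparison is TRUE
kept_which <- vec2[which(vec2 != 99)]
mean(kept_which)                   # 1.333333, no na.rm needed
```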