1

I have this table:

Profession Educational_level Number
Doctor     Low               0
Doctor     Medium            5
Doctor     High              8
Nurse      Low               1
Nurse      Medium            8
Nurse      High              3
[...]

I want to find out the educational level of the median person and end up with this table:

Doctor     High
Nurse      Medium
[...]

Questions:

  1. How do I do this in R?
  2. And in Excel?

Thanks.

Paul Hiemstra
  • 59,984
  • 12
  • 142
  • 149
dani
  • 4,880
  • 8
  • 55
  • 95

3 Answers3

2

I recommend @PaulHiemstra's answer if relying on additional packages isn't a problem for your workflow. Otherwise, this is probably the easiest way to do it in base R:

df <- read.csv(text="Profession Educational_level Number
Doctor     Low               0
Doctor     Medium            5
Doctor     High              8
Nurse      Low               1
Nurse      Medium            8
Nurse      High              3", header=TRUE)

results <- by(df, INDICES=dat$Profession, 
   FUN=function(subset) with(subset, Educational_level[which.max(Number)]))
data.frame(names(results), unclass(results))
Matthew Plourde
  • 43,932
  • 7
  • 96
  • 113
1

I don't think you are looking for the median educational level per profession, but the mode, i.e. the category with the highest frequency. To get this you can use ddply from the plyr package:

require(plyr)
ddply(df, .(Profession), summarise, 
          mode_educ = Educational_level[which.max(Number)]

where df is the data.frame which contains your data.

Paul Hiemstra
  • 59,984
  • 12
  • 142
  • 149
  • Beat me to it again! Just a couple notes: this does assume that the titles in the original table are in fact "names" in the R sense. Otherwise, you'd have to refer to column numbers, like `doctor_mode <- mytable[which.max(mytable[mytable[,1]=='Doctor',3]),2]` (I probably fouled up the inner references there) – Carl Witthoft Sep 26 '12 at 15:18
  • If there are no names you can quite easily add them, which is a good idea in general when working with data.frames. – Paul Hiemstra Sep 26 '12 at 15:25
  • The OP might really mean median ... although in the example they gave, median=mode. What if the Doctor data were {Low=7,Medium=5,High=8} ... ? – Ben Bolker Sep 26 '12 at 16:28
  • if the levels of the factor are in the correct order, then `levels(f)[round(median(as.numeric(f)))]` seems to be the right incantation ... ? – Ben Bolker Sep 26 '12 at 16:31
1

In Excel you could use an "array formula" like this

=INDEX(B2:B10,MATCH(1,(A2:A10="Doctor")*(C2:C10=MEDIAN(IF(A2:A10="Doctor",C2:C10))),0))

confirmed with CTRL+SHIFT+ENTER

That works for your examples but for a genuine median, if there were 4 doctors, for instance, then the median value for doctors is halfway between the 2nd and 3rd, therefore possibly not any value in the list, so how will you know which educational level to assign?

barry houdini
  • 45,615
  • 8
  • 63
  • 81