The following is a data frame:

     Sex year
     M   2.2
     M   2.3
     F   2.7
     F   3.1
     M   4.1
     F   3.6

I have to compute a new variable, category, with:

year>3.2=category_a

2.5 < year<3.2=category_b

year<2.5=category_c

MY ATTEMPT:

 age <- read.table("data.txt",header=TRUE)
 category <- c(1,1,1,1,1,1)
 for(i in 1:6){
     if(subset(age,year[i]<3.2)){
     category[i]="category_a"
   } else if (subset(age,2.5<year[i]<3.2)){
     category[i]="category_b"
   } else (subset(age,year[i]<2.5)){
     category[i]="category_c"
   } 
  } # end for loop 

But this is not working

joran
ABC
  • Have you looked at `cut` yet? – A5C1D2H2I1M1N2O1R2T1 Jun 28 '13 at 14:26
  • Hint: look at what `subset(age, year[1] < 3.2)` returns. – Drew Steen Jun 28 '13 at 14:30
  • Besides `cut` there is also the option of `findInterval` used as an index into a character vector. There must now be scores of these cut/findInterval examples in SO so I think the right answer would have been to suggest more efforts at searching. – IRTFM Jun 28 '13 at 14:37
  • I want to compute it with an if...else statement so that I can understand the loop better. Is there a way? I got an answer using `cut`, and it works. – ABC Jun 28 '13 at 14:43
  • There are several parts of the question for which I need a clearer understanding of chained if...else statements and of for loops. Could you please point me to a reference on these control structures? – ABC Jun 28 '13 at 14:54
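
For the explicit loop asked about in the comments, a minimal sketch follows. It assumes the data frame from the question is rebuilt inline as `age` (instead of being read from `data.txt`), and that values exactly equal to 2.5 or 3.2, which the question leaves unassigned, should stay NA. Each `if` condition must be a single TRUE or FALSE, so the loop compares `age$year[i]` directly rather than calling `subset`, and a chained comparison such as `2.5 < x < 3.2` is written as two comparisons joined with `&&`.

```r
# Rebuild the example data frame (assumption: same values as data.txt)
age <- data.frame(
  Sex  = c("M", "M", "F", "F", "M", "F"),
  year = c(2.2, 2.3, 2.7, 3.1, 4.1, 3.6)
)

# Pre-allocate a character vector; NA marks values that match no rule
category <- rep(NA_character_, nrow(age))

for (i in seq_len(nrow(age))) {
  y <- age$year[i]
  if (y > 3.2) {
    category[i] <- "category_a"
  } else if (y > 2.5 && y < 3.2) {
    category[i] <- "category_b"
  } else if (y < 2.5) {
    category[i] <- "category_c"
  }
}

# Attach the result as a new column
age$category <- category
```

Using `seq_len(nrow(age))` instead of a hard-coded `1:6` keeps the loop valid if the number of rows changes.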

2 Answers

Based on @Ananda's suggestion:

cut(age$year, c(-Inf,2.5,3.2,Inf), labels=c("category_c","category_b","category_a"))
Thomas
  • How does "category_b" range from 2.5 to 3.2? I am asking because the 2nd argument is a vector of length 4 but the 3rd argument is a vector of length 3. – ABC Jun 28 '13 at 14:40
  • The second argument reflects the cut points in the variable. Take a look at the help files (`? cut`). – Thomas Jun 28 '13 at 14:43
  • There's also another question on this: http://stackoverflow.com/questions/13061738/cut-and-labels-breaks-length-conflict/13061832#13061832 – Thomas Jun 28 '13 at 14:44
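
To illustrate the reply about cut points: four break points delimit three intervals, so three labels are needed. The sketch below uses a made-up vector `x` (not the question's data) and stores the default level names that `cut` generates when `labels` is omitted, which show the intervals directly. By default (`right = TRUE`) each interval is closed on the right.

```r
x <- c(2.2, 2.7, 3.6)  # hypothetical example values
breaks <- c(-Inf, 2.5, 3.2, Inf)

# Without labels, cut() names each level after the interval it covers
default_levels <- levels(cut(x, breaks))

# With labels, supply one label per interval: length(breaks) - 1 of them
labelled <- cut(x, breaks,
                labels = c("category_c", "category_b", "category_a"))
```

Printing `default_levels` shows the three intervals that the three labels replace.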

As requested by OP:

Solution with ifelse (not tested)

Assuming your data frame is named data:

data$age <- with(data, ifelse(year < 2.5, "category_c",
                        ifelse(year > 2.5 & year < 3.2, "category_b",
                         ifelse(year > 3.2, "category_a", NA))))

Suggestion: Please do not use nested ifelse if you have many categories. Instead, use cut as answered by @Thomas.

Note: NA is assigned to rows where year is exactly 2.5 or 3.2, since the intervals in your question are open.
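
The boundary behaviour in the note can be checked with a small sketch. The vector `year` below is hypothetical and deliberately includes both boundary values. The comparison with `cut` relies on its default `right = TRUE`, which closes each interval on the right, so 2.5 falls into the lowest category and 3.2 into the middle one instead of becoming NA.

```r
year <- c(2.4, 2.5, 3.0, 3.2, 3.3)  # hypothetical values, including both boundaries

# Nested ifelse with strict inequalities: boundary values match no test
by_ifelse <- ifelse(year < 2.5, "category_c",
             ifelse(year > 2.5 & year < 3.2, "category_b",
             ifelse(year > 3.2, "category_a", NA)))

# cut() with the default right = TRUE: boundary values belong to the
# interval on their left
by_cut <- as.character(
  cut(year, c(-Inf, 2.5, 3.2, Inf),
      labels = c("category_c", "category_b", "category_a"))
)
```

Which treatment is right depends on how the boundary values should be classified, which the question does not specify.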

Thomas
Metrics