
I am only in an introductory R class, so this is probably quite basic.

I am using the Outlook on Life dataset and am interested in Income. Respondents had to choose one of the following 19 choices:

Less than $5,000     
$5,000 to $7,499     
$7,500 to $9,999     
$10,000 to $12,499   
$12,500 to $14,999   
$15,000 to $19,999   
$20,000to $24,999    
$25,000 to $29,999   
$30,000 to $34,999  
$35,000 to $39,999   
$40,000 to $49,999   
$50,000 to $59,999   
$60,000 to $74,999   
$75,000 to $84,999   
$85,000 to $99,999   
$100,000 to $124,999
$125,000 to $149,999 
$150,000 to $174,999
$175,000 or more 

I want to collapse and simplify this to the following just to make plots more intelligible:

  1. Under poverty line ($0 - 24,999),
  2. Working class ($25,000 - 34,999),
  3. Lower middle class ($35,000 - 60,000),
  4. Middle class ($60,000 - 100,000),
  5. Upper middle class ($100,000 - 150,000),
  6. Top 5 percent ($150,000 +).

How would I go about recoding this?

Thank you!

dan1st
Katherine
  • 2
    try the cut function – Chris Sep 26 '15 at 01:51
  • 4
    Your intervals are problematic. If someone made 22,000 they would pick group 7 (20k - 24,999). You would want them in Under Poverty Line. But someone making 24k would also choose group 7, and they are in Working class. How would you know the difference? – Pierre L Sep 26 '15 at 02:27
  • Yes, it's problematic. I could massage my desired groupings so they fit better with the pre-established intervals. So I could make Under Poverty Line go up to 24,999. And then working class 34,999. – Katherine Sep 26 '15 at 02:55
  • @Katherine: Edit your code/question so it poses a problem that has a sensible answer. Comments are NOT the proper way to amend a question. – IRTFM Sep 26 '15 at 05:49

1 Answer


The easiest way to recode a factor is to assign a named list to levels(): each name becomes a new level, and the values under it are the existing levels to merge into that new level.

I have assumed that your data is already a factor (you said "Respondents had to choose one of the following 19 choices"). In that case the cut function is not a good fit, since cut bins numeric values and your column holds category labels rather than numbers.
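For contrast, here is a minimal sketch of the situation cut is meant for. The numeric `income` vector is hypothetical (the survey data in the question is not in this form), and the break points are only one possible reading of the groups you asked for:

```r
# Hypothetical numeric incomes; NOT the shape of the survey data in the question.
income <- c(4200, 26000, 61000, 180000)

# cut() bins numbers into intervals; right = FALSE makes each bin [lower, upper).
binned <- cut(income,
              breaks = c(0, 25000, 35000, 60000, 100000, 150000, Inf),
              labels = c("Under poverty line", "Working class",
                         "Lower middle class", "Middle class",
                         "Upper middle class", "Top 5 percent"),
              right = FALSE)
binned
```

With a factor of range labels there are no numbers to bin, so remapping the levels directly is the simpler route.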

Here is a simple example of the list assignment in action:

z <- gl(3, 2, 12) # [1] 1 1 2 2 3 3 1 1 2 2 3 3, Levels: 1 2 3
levels(z) <- list(A = c(1,3), B = 2)
z #  [1] A A B B A A A A B B A A, Levels: A B

As you can see from the example above, levels 1 and 3 have been recoded to group A and level 2 to group B. Your recoding can be done in the same way:

# Simulated stand-in for the survey's income column (100 random responses)
groups <- as.factor(sample(c("Less than $5,000",
"$5,000 to $7,499",
"$7,500 to $9,999",
"$10,000 to $12,499",
"$12,500 to $14,999",
"$15,000 to $19,999",
"$20,000to $24,999",
"$25,000 to $29,999",
"$30,000 to $34,999",
"$35,000 to $39,999",
"$40,000 to $49,999",
"$50,000 to $59,999",
"$60,000 to $74,999",
"$75,000 to $84,999",
"$85,000 to $99,999",
"$100,000 to $124,999",
"$125,000 to $149,999",
"$150,000 to $174,999",
"$175,000 or more"), size = 100, replace = TRUE))

levels(groups) <- list(
  "Under poverty line"=c("Less than $5,000",
                         "$5,000 to $7,499",
                         "$7,500 to $9,999",
                         "$10,000 to $12,499",
                         "$12,500 to $14,999",
                         "$15,000 to $19,999",
                         "$20,000to $24,999"),
  "Working class"=c("$25,000 to $29,999",
                    "$30,000 to $34,999"),
  "Lower middle class"=c("$35,000 to $39,999",
                         "$40,000 to $49,999",
                         "$50,000 to $59,999"), 
  "Middle class"=c("$60,000 to $74,999",
                   "$75,000 to $84,999",
                   "$85,000 to $99,999"),
  "Upper middle class"=c("$100,000 to $124,999",
                         "$125,000 to $149,999"),
  "Top 5 percent"=c("$150,000 to $174,999",
                    "$175,000 or more")
  )
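It is worth checking the result, because any original level that is left out of the list (or misspelled in it) ends up as NA. The sketch below uses a small made-up factor with only three of the original labels, and the names `income` and `recoded` are mine, not from your data:

```r
# Small stand-in for the survey column (three of the original labels only).
income <- factor(c("Less than $5,000", "$25,000 to $29,999",
                   "$175,000 or more", "Less than $5,000"))

# Work on a copy so the original labels stay available for comparison.
recoded <- income
levels(recoded) <- list(
  "Under poverty line" = "Less than $5,000",
  "Working class"      = "$25,000 to $29,999",
  "Top 5 percent"      = "$175,000 or more"
)

# Cross-tabulate old against new labels; useNA = "ifany" exposes unmapped levels.
table(income, recoded, useNA = "ifany")

# The new levels follow the order of the list, not alphabetical order.
levels(recoded)

# TRUE here would mean some original label was missing from the list.
anyNA(recoded)
```

Because the new levels keep the order of the list, plots of the recoded factor should show the groups from lowest to highest income without any extra reordering.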
chappers