
The application here is grouping U.S. states into regions.

group1 <- c("ME", "NH", "VT", "MA", "CT", "RI")
group2 <- c("FL", "GA", "AL", "MS", "LA")

My data looks like:

SomeVar | State
---------------
300     | AL
331     | GA
103     | MA
500     | FL

And I would like to add a "region" column to the data according to the groupings above, like so:

SomeVar | State | Region
------------------------
300     | AL    | 2
331     | GA    | 2
103     | MA    | 1
500     | FL    | 2

Is there a straightforward way to assign factors based on groupings?

abeboparebop

3 Answers

group1 <- c("ME", "NH", "VT", "MA", "CT", "RI")
group2 <- c("FL", "GA", "AL", "MS", "LA")

grouptab <- rbind(data.frame(State=group1,grp=1),
                  data.frame(State=group2,grp=2))
DF <- read.table(text="SomeVar  State
300      AL
331      GA
103      MA
500      FL",header=TRUE)

# Join on the shared State column; merge() sorts the result by State
merge(DF,grouptab)

Or more generally:

# One row per state; each group's index is repeated once per member
groupList <- list(group1,group2)
grouptab <- data.frame(State=unlist(groupList),
                       grp=rep(seq_along(groupList),
                               sapply(groupList,length)))

(There may be other ways to do this; I tried mapply but couldn't figure it out quickly.)
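One such alternative, sketched here and not part of the answer's original code, is a named lookup vector: the names are the state codes and the values are the group numbers. The name lookup is chosen for illustration, and DF is rebuilt with data.frame() only so the block runs on its own.

```r
group1 <- c("ME", "NH", "VT", "MA", "CT", "RI")
group2 <- c("FL", "GA", "AL", "MS", "LA")
groupList <- list(group1, group2)

# Named vector: names are state codes, values are group numbers
# ('lookup' is an illustrative name)
lookup <- setNames(rep(seq_along(groupList), lengths(groupList)),
                   unlist(groupList))

DF <- data.frame(SomeVar = c(300, 331, 103, 500),
                 State = c("AL", "GA", "MA", "FL"))

# Index by state code; as.character() guards against State being a factor,
# and unname() drops the state codes from the result
DF$Region <- unname(lookup[as.character(DF$State)])
```

Unlike merge, this keeps the rows of DF in their original order, and a state that appears in no group would get NA.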

I think suitable arguments to merge (e.g. all, all.x, all.y) would handle the missing-group cases in various ways. By default, merge drops any row of DF whose State appears in no group.
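As a rough illustration of the missing-group case, the sketch below adds a made-up row for "TX", which is in neither group, and compares the default merge with all.x=TRUE. The DF2 data and the names inner and left are assumptions for this example only.

```r
group1 <- c("ME", "NH", "VT", "MA", "CT", "RI")
group2 <- c("FL", "GA", "AL", "MS", "LA")
grouptab <- rbind(data.frame(State = group1, grp = 1),
                  data.frame(State = group2, grp = 2))

# Hypothetical data: "TX" belongs to neither group
DF2 <- data.frame(SomeVar = c(300, 103, 42),
                  State = c("AL", "MA", "TX"))

# Default merge keeps only states found in both tables, so TX is dropped
inner <- merge(DF2, grouptab)

# all.x = TRUE keeps every row of DF2; TX gets grp = NA
left <- merge(DF2, grouptab, all.x = TRUE)
```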

Ben Bolker
group1 <- c("ME", "NH", "VT", "MA", "CT", "RI")
group2 <- c("FL", "GA", "AL", "MS", "LA")

DF <- read.table(text="SomeVar  State
300      AL
331      GA
103      MA
500      FL",header=TRUE)

DF$Region <- NA
DF$Region[DF$State %in% group1] <- 1
DF$Region[DF$State %in% group2] <- 2

#   SomeVar State Region
# 1     300    AL      2
# 2     331    GA      2
# 3     103    MA      1
# 4     500    FL      2
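With more than two groups, the same %in% assignment could be written as a loop over a list of groups, and the result converted to a factor, since the question asks for factors. This is a sketch under those assumptions; groupList is a name introduced here, not part of the code above.

```r
group1 <- c("ME", "NH", "VT", "MA", "CT", "RI")
group2 <- c("FL", "GA", "AL", "MS", "LA")
groupList <- list(group1, group2)  # illustrative name; add more groups as needed

DF <- data.frame(SomeVar = c(300, 331, 103, 500),
                 State = c("AL", "GA", "MA", "FL"))

# Start with NA so states in no group stay missing
DF$Region <- NA_integer_
for (i in seq_along(groupList)) {
  DF$Region[DF$State %in% groupList[[i]]] <- i
}

# Convert to a factor with one level per group
DF$Region <- factor(DF$Region, levels = seq_along(groupList))
```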
Roland

Assuming your data frame is called df and that every state is in either group 1 or group 2, you can do:

df$Region <- ifelse(df$State %in% group1, 1, 2)
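If that assumption might not hold, a nested ifelse can return NA for unmatched states instead of silently placing them in group 2. The sketch below uses a made-up "TX" row to show the difference; the data and the naive column are hypothetical.

```r
group1 <- c("ME", "NH", "VT", "MA", "CT", "RI")
group2 <- c("FL", "GA", "AL", "MS", "LA")

# Hypothetical data: "TX" belongs to neither group
df <- data.frame(SomeVar = c(300, 103, 42),
                 State = c("AL", "MA", "TX"))

# Two-way ifelse: anything not in group1 is labeled 2, including TX
df$naive <- ifelse(df$State %in% group1, 1, 2)

# Nested ifelse: states in neither group become NA
df$Region <- ifelse(df$State %in% group1, 1,
                    ifelse(df$State %in% group2, 2, NA))
```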
nico