1

Let's say you have a data set that looks like this:

          Vietnam    Gulf War     Iraq War
veteran1    1           0            0
veteran2    0           1            0
veteran3    0           0            1
veteran4    0           1            1  # <---- Note this row

You want to consolidate these columns without affecting other columns in the dataframe like so:

          Service  
veteran1    1                  
veteran2    2                     
veteran3    3                            
veteran4    2  # <---- Note this row

Where

  • 1 = Vietnam, 2 = Gulf War, 3 = Iraq War
  • If a veteran has served 2 or more it should pick only one (as is the case with veteran4 where it picked their left-most column)
  • there are many other columns in the dataframe, and they shouldn't be affected by any of this

Questions:

How would you do this in R?

(Note: if it's easier to do in some other free open source program, please feel free to share which program and how you would do it. This is a massive dataset: 3 million rows, the American Community Survey.)

Community
  • 1
  • 1
thanks_in_advance
  • 2,603
  • 6
  • 28
  • 44
  • 2
    Something like `data.frame(Service = max.col(df>0, "first"), row.names = row.names(df))`? – lukeA Oct 30 '16 at 18:46

2 Answers2

3

Looking at your data, it seems a simple issue of:

if Vietnam > 0, then use 1, otherwise if Gulf war > 0 then 2, otherwise if Iraq > 0 then 3, else 0

vietnam = c(1, 0, 0,0) 
gulfwar = c(0,1,0,1)
iraq = c(0,0,1,1)
df = data.frame(vietnam, gulfwar, iraq) 
df$service <- ifelse(df$vietnam > 0,1,ifelse(df$gulfwar>0,2,ifelse(df$iraq>0,3,0)))
df

Result:

       vietnam gulfwar iraq service
  1       1       0    0       1
  2       0       1    0       2
  3       0       0    1       3
  4       0       1    1       2
dmg
  • 4,231
  • 1
  • 18
  • 24
1

Maybe a little bit complicated (respect to other solutions), but here is one approach using apply:

df$service <-  apply(df, 1, function(x) which(x == 1)[1] )
df
  vietnam gulfwar iraq service
1       1       0    0       1
2       0       1    0       2
3       0       0    1       3
4       0       1    1       2
SabDeM
  • 7,050
  • 2
  • 25
  • 38