I have a five-column table ("id", "othermood_v", "rass_v", "gcs_v" and "cam_v") with around 52000 rows. The last column ("cam_v") is a class label that should take one of three values: 0, 1 or 2. At the moment, "cam_v" contains only 1, 2 and NA. I would like to replace each NA with either 0 or 1 based on the other three columns, "othermood_v", "rass_v" and "gcs_v": if any of these three columns in the same row has a value of 1, then cam_v should be labeled 1; otherwise 0. I tried to loop through the data with a condition like

 if df$othermood_v > 0 | df$rass_v > 0 | df$gcs_v > 0, then df$cam_v = 1 else 0, na.rm = TRUE

or

if (df$othermood_v+df$rass_v+df$gcs_v) >0, then cam_v=1 else 0

But I don't know how to get it to work. Any suggestions? BTW, the id is unique now. Thanks.

id  othermood_v rass_v  gcs_v   cam_v
100078  0   0   0   NA
100079  0   0   0   NA
100081  0   0   0   NA
100085  1   1   0   NA
100087  1   1   0   NA
100088  1   0   0   NA
100091  1   1   1   2
100094  0   1   0   NA
100095  1   0   0   NA
100096  0   0   0   NA
100098  1   1   1   2
100099  0   1   0   NA
100102  1   0   0   NA
100103  1   0   0   NA
100104  1   1   0   2
100106  0   0   0   NA
100108  1   0   0   NA
100109  1   0   0   NA
100112  1   0   0   NA
100113  1   1   1   1
100114  1   0   0   NA
100116  1   0   0   NA
100117  1   0   0   NA
100118  0   1   0   NA

lancet
  • Possible duplicate of [dplyr replacing na values in a column based on multiple conditions](https://stackoverflow.com/questions/50436248/dplyr-replacing-na-values-in-a-column-based-on-multiple-conditions) – A. Suliman Feb 13 '19 at 11:39

3 Answers

We create a logical index of the NA elements in 'cam_v' and then replace only those elements, using a second condition built with rowSums:

i1 <- is.na(df1$cam_v) # logical index of NA elements in 'cam_v'
# assign the values 0 or 1 based on the occurrence of 1 in 
# either one of the columns from 2 to 4
df1$cam_v[i1] <- +(rowSums(df1[i1, 2:4] > 0) > 0)

data

df1 <- structure(list(id = c(100078L, 100079L, 100081L, 100085L, 100087L, 
100088L, 100091L, 100094L, 100095L, 100096L, 100098L, 100099L, 
100102L, 100103L, 100104L, 100106L, 100108L, 100109L, 100112L, 
100113L, 100114L, 100116L, 100117L, 100118L), othermood_v = c(0L, 
0L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 0L), rass_v = c(0L, 0L, 0L, 1L, 1L, 0L, 
1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 
0L, 1L), gcs_v = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L), cam_v = c(NA, 
NA, NA, NA, NA, NA, 2L, NA, NA, NA, 2L, NA, NA, NA, 2L, NA, NA, 
NA, NA, 1L, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-24L))
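A minimal variant of the same rowSums() idea, sketched on a three-row excerpt of the question's data (the names `toy` and `predictors` are just illustrative). It selects the three predictor columns by name rather than by position `2:4`, so it keeps working if the column order changes, and passes `na.rm = TRUE` to rowSums() in case a predictor column also contains NA:

```r
# Three-row excerpt of the question's data (ids 100078, 100085, 100091)
toy <- data.frame(
  id          = c(100078L, 100085L, 100091L),
  othermood_v = c(0L, 1L, 1L),
  rass_v      = c(0L, 1L, 1L),
  gcs_v       = c(0L, 0L, 1L),
  cam_v       = c(NA, NA, 2L)
)

# Predictor columns, selected by name instead of by position
predictors <- c("othermood_v", "rass_v", "gcs_v")

# Logical index of the rows whose label is missing
i1 <- is.na(toy$cam_v)

# TRUE where at least one predictor is > 0; NA predictors are ignored
any_positive <- rowSums(toy[i1, predictors, drop = FALSE] > 0, na.rm = TRUE) > 0

# Fill only the missing labels; existing 1/2 labels are left untouched
toy$cam_v[i1] <- as.integer(any_positive)

toy$cam_v
```

One consequence of `na.rm = TRUE` is that a row whose three predictors are all NA gets 0; drop that argument if a missing predictor should leave the label as NA.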
akrun
  • Thank you for the super fast and smart solution! The only error I've got is the first-row class label, which showed as integer(9). http://prntscr.com/mkif9g I don't know why, but I can correct it manually. :-) Thanks! – lancet Feb 13 '19 at 11:44
  • @lancet. The first row comes out as `0` for me as all the other values in the row are 0 – akrun Feb 13 '19 at 11:46
  • I was thinking of something similar: `df[rowSums(df[, 2:4]) > 0 & is.na(df$cam_v), "cam_v"] <- 1`, `df[rowSums(df[, 2:4]) == 0 & is.na(df$cam_v), "cam_v"] <- 0` – boski Feb 13 '19 at 11:46
  • OK, then I must have run something wrong. But thanks for the reply – lancet Feb 13 '19 at 11:47
  • @lancet I've pasted the data I used. Can you check `str(df1)`, in case it is of a different type? – akrun Feb 13 '19 at 11:47
  • @akrun http://prntscr.com/mkiitk BTW, the table I used is called t3, so I ran str(t3) – lancet Feb 13 '19 at 11:51
  • @lancet Your `cam_v` is a `list` column. It should be `t3$cam_v <- unlist(t3$cam_v)`, assuming the elements are of length 1. After that, try the solution again – akrun Feb 13 '19 at 11:53
  • @akrun I've got this error: > t3$cam_v <- unlist(t3$cam_v) Error in `$<-.data.frame`(`*tmp*`, cam_v, value = c(1L, 1L, 1L, 1L, 1L, : – lancet Feb 13 '19 at 11:58
  • @lancet I think the lengths won't match because the first element, as you showed earlier, is `numeric(0)`. Can you use the original data from before any assignment? – akrun Feb 13 '19 at 12:00
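For reference, `unlist()` fails on such a list column because zero-length elements like `numeric(0)` vanish when the list is flattened, so the result is shorter than the data frame. A sketch that keeps exactly one value per row, assuming each non-empty element holds a single label; `flatten_label_column()` is a hypothetical helper, not a base R function:

```r
# Hypothetical helper: turn a list column into an integer vector,
# mapping zero-length elements (e.g. numeric(0)) to NA and keeping
# only the first value of any non-empty element
flatten_label_column <- function(col) {
  vapply(col, function(x) {
    if (length(x) == 0) NA_integer_ else as.integer(x[[1]])
  }, integer(1))
}

# Toy data frame with a list column similar to the one described above
toy <- data.frame(id = c(100078L, 100091L, 100113L))
toy$cam_v <- list(numeric(0), 2, 1)

toy$cam_v <- flatten_label_column(toy$cam_v)
str(toy$cam_v)  # now a plain integer vector with NA where the element was empty
```

After this conversion, is.na() and the `[<-` assignment in the answer work on `cam_v` as usual.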

A solution using dplyr

library(dplyr)
df_clean <- df %>% 
  mutate(cam_v = ifelse(!is.na(cam_v), cam_v, 
                               ifelse((othermood_v + rass_v + gcs_v) > 0, 1, 0)))
> df_clean
       id othermood_v rass_v gcs_v cam_v
1  100078           0      0     0     0
2  100079           0      0     0     0
3  100081           0      0     0     0
4  100085           1      1     0     1
5  100087           1      1     0     1
6  100088           1      0     0     1
7  100091           1      1     1     2
8  100094           0      1     0     1
9  100095           1      0     0     1
10 100096           0      0     0     0
11 100098           1      1     1     2
12 100099           0      1     0     1
13 100102           1      0     0     1
14 100103           1      0     0     1
15 100104           1      1     0     2
16 100106           0      0     0     0
17 100108           1      0     0     1
18 100109           1      0     0     1
19 100112           1      0     0     1
20 100113           1      1     1     1
21 100114           1      0     0     1
22 100116           1      0     0     1
23 100117           1      0     0     1
24 100118           0      1     0     1

Data

Generally, it is preferred here to use `dput(head(data, 20))` to provide sample data for your code. I used this to read in your data:

df <- read.table(text =
  "id  othermood_v rass_v  gcs_v   cam_v
  100078  0   0   0   NA
  100079  0   0   0   NA
  100081  0   0   0   NA
  100085  1   1   0   NA
  100087  1   1   0   NA
  100088  1   0   0   NA
  100091  1   1   1   2
  100094  0   1   0   NA
  100095  1   0   0   NA
  100096  0   0   0   NA
  100098  1   1   1   2
  100099  0   1   0   NA
  100102  1   0   0   NA
  100103  1   0   0   NA
  100104  1   1   0   2
  100106  0   0   0   NA
  100108  1   0   0   NA
  100109  1   0   0   NA
  100112  1   0   0   NA
  100113  1   1   1   1
  100114  1   0   0   NA
  100116  1   0   0   NA
  100117  1   0   0   NA
  100118  0   1   0   NA", header = TRUE)
JBGruber
  • @JBGruber I don't have enough reputation to upvote anybody yet, but it's nice to get different solutions. Thanks – lancet Feb 13 '19 at 11:53

You were close with your method; you just needed to change how you were doing the if/else. The following should work:

df$cam_v<-ifelse((df$othermood_v>0|df$rass_v>0|df$gcs_v >0), 1,0) 
MLPNPC
  • Thanks for your answer, I tried it. The replacement value should only go into the cells of the last column, "cam_v", that are NA. Yours is the answer I understand best, but it replaced all the values in the cam_v column, even where I already had a "1" or "2" label. I suppose your answer needs an ifelse on !is.na, but I don't know where to put it. FYI, I have already solved my problem; I just want to learn a little more. Thanks! – lancet Feb 13 '19 at 19:01
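On where the NA check goes: one option is to wrap the answer's ifelse() in an outer ifelse() that keeps the existing labels and only fills the NA cells. A minimal sketch on three rows taken from the question's table (ids 100078, 100088 and 100104):

```r
df <- data.frame(
  id          = c(100078L, 100088L, 100104L),
  othermood_v = c(0L, 1L, 1L),
  rass_v      = c(0L, 0L, 1L),
  gcs_v       = c(0L, 0L, 0L),
  cam_v       = c(NA, NA, 2L)
)

# Outer ifelse: only touch rows where cam_v is NA, otherwise keep the label.
# Inner ifelse: the original answer's 0/1 rule.
df$cam_v <- ifelse(is.na(df$cam_v),
                   ifelse(df$othermood_v > 0 | df$rass_v > 0 | df$gcs_v > 0, 1, 0),
                   df$cam_v)

df$cam_v
```

Because the inner ifelse() returns the doubles 1 and 0, `cam_v` ends up numeric rather than integer; wrap the call in as.integer() if the type matters.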