I'm trying to replace NA values in one column of a data frame with the value from another column in the same row. Instead of the values being replaced, the entire column seems to be deleted.

fDF is a data frame where some values are NA. When column 1 has an NA value I want to replace it with the value in column 2.

fDF[columns[1]] = if(is.na(fDF[columns[1]]) == TRUE & 
                     is.na(fDF[columns[2]]) == FALSE) fDF[columns[2]]

I'm not sure what I'm doing wrong here.

Thanks

David Arenburg
Greg
  • Two answers here and I still don't see any explanation of why not to use `if` in this situation. In short, an `if` statement can only accept a condition of length 1 (it *can* accept a longer expression, but it will ignore the rest of it). Thus, you can't ask it to go through each element of a vector unless you wrap it in some type of loop (try `if(1:3 == 2) 1`, for example, and read the warning message). Thus, `ifelse` is to be used because it can deal with a vector input of length >1 – David Arenburg Nov 16 '14 at 10:19
  • Thank you very much. I understand why I should use `ifelse` instead. – Greg Nov 16 '14 at 10:38
  • @DavidArenburg: I already explicitly tagged it ***'vectorization'*** when I added my answer, that addressed your point. The question doesn't even ask "Why is an if() statement in R not vectorized?" anyway. I don't see the point in scolding people for not answering a different question which was never asked, and which we both know is already well-answered on SO. If you want to go ahead and answer the question which was never asked, then do. – smci Nov 16 '14 at 21:42
  • @smci, So how do you read *I'm not sure what I'm doing wrong here* – David Arenburg Nov 16 '14 at 21:52
  • The thing being asked for is a working solution, not why the current approach is wrong. If OP had instead asked "Please explain why if() does not work with vector arguments", which they didn't, then an explanation of that would have been on-topic. If you still want to disagree, let's take it to [chat] – smci Nov 16 '14 at 22:00
  • @smci the body of the question implies that the OP wants to know what he is doing wrong. The fact that you added some unnecessary tags (like you always do) is irrelevant here. I agree that we can assume the OP is looking for a solution, but I think that before providing it, one should add some explanation rather than just saying "which doesn't make any sense". Either way, I think I made my point clear and I don't see any reason to discuss this further. – David Arenburg Nov 16 '14 at 22:10

2 Answers

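You want an ifelse() expression:

fDF[[columns[1]]] <- ifelse(is.na(fDF[[columns[1]]]), fDF[[columns[2]]], fDF[[columns[1]]])

rather than assigning the result of an if statement to a vector, which doesn't work because if() tests only a single condition. Note the double brackets: `[[` extracts each column as a plain vector, which is what ifelse() expects, whereas single brackets return a one-column data frame.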

[EDIT only for David Arenburg: if that wasn't already explicit enough, in R if statements are not vectorized, hence can only handle scalar expressions, hence they're not what the OP needed. I had already tagged the question 'vectorization' yesterday and the OP is free to go read about vectorization in R in any of the thousands of good writeups and tutorials out there.]
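To make the difference between the two constructs concrete, here is a minimal sketch. The data frame contents and the column names `a` and `b` are invented for illustration; only `fDF` and `columns` come from the question. The `if` call is wrapped in `try()` because its behaviour depends on the R version: older releases used only the first element and warned, while R 4.2.0 and later stop with an error.

```r
# Hypothetical data; the values and column names are assumptions.
fDF <- data.frame(a = c(1, NA, 3, NA), b = c(10, 20, NA, 40))
columns <- c("a", "b")

# if() wants a single TRUE/FALSE. With a length-4 condition it either
# looks at the first element only (older R, with a warning) or fails
# outright (R >= 4.2.0), so try() keeps the script running either way.
res <- try(
  if (is.na(fDF[[columns[1]]])) "first element is NA" else "first element is not NA",
  silent = TRUE
)

# ifelse() tests every element and picks from `yes` or `no` position by position.
filled <- ifelse(is.na(fDF[[columns[1]]]), fDF[[columns[2]]], fDF[[columns[1]]])
filled
```

With this sample data, `filled` holds the values of column `a` with its two missing entries taken from column `b`.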

smci
  • Thank you. I also just realized I was replacing the entire column rather than just the NA values. Is there a way to replace only certain values with values from another column? – Greg Nov 16 '14 at 09:58
You can adapt the following code to your data:

> ddf
   xx yy    zz
1   1 10 11.88
2   2  9    NA
3   3 11 12.20
4   4  9 12.48
5   5  7    NA
6   6  6 13.28
7   7  9 13.80
8   8  8 14.40
9   9  5    NA
10 10  4 15.84
11 11  6 16.68
12 12  6 17.60
13 13  5 18.60
14 14  4 19.68
15 15  6    NA
16 16  8 22.08
17 17  4 23.40
18 18  6 24.80
19 19  8    NA
20 20 11 27.84
21 21  8 29.48
22 22 10 31.20
23 23  9 33.00
> 
> 
> idx = is.na(ddf$zz)
> idx
 [1] FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE
[22] FALSE FALSE
> 
> ddf$zz[idx]=ddf$yy[idx]
> 
> ddf
   xx yy    zz
1   1 10 11.88
2   2  9  9.00
3   3 11 12.20
4   4  9 12.48
5   5  7  7.00
6   6  6 13.28
7   7  9 13.80
8   8  8 14.40
9   9  5  5.00
10 10  4 15.84
11 11  6 16.68
12 12  6 17.60
13 13  5 18.60
14 14  4 19.68
15 15  6  6.00
16 16  8 22.08
17 17  4 23.40
18 18  6 24.80
19 19  8  8.00
20 20 11 27.84
21 21  8 29.48
22 22 10 31.20
23 23  9 33.00
> 
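The same logical-index approach can be written against the question's setup, where the column names are held in a `columns` vector. This is only a sketch: the sample values and the names `a` and `b` are made up, and the extra `!is.na()` check on the second column mirrors the condition in the question rather than anything in the session above.

```r
# Hypothetical data; the values and column names are assumptions.
fDF <- data.frame(a = c(1, NA, 3, NA), b = c(10, 20, NA, 40))
columns <- c("a", "b")

# Rows where column 1 is missing but column 2 has a value.
idx <- is.na(fDF[[columns[1]]]) & !is.na(fDF[[columns[2]]])

# Overwrite only those rows; every other entry of column 1 is left untouched.
fDF[[columns[1]]][idx] <- fDF[[columns[2]]][idx]
fDF
```

Because only the indexed positions are assigned, the rest of the column is never rewritten, which addresses the follow-up about replacing only certain values.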
rnso