data frame search == not finding all conditions that hold

Question

I am trying to conditionally replace some fields in a data frame; however, my code finds only about 25% of the instances actually present. I've searched through the other conditional search questions but didn't find anything matching my problem. I apologize in advance if I missed one.

Specifically, I am trying to replace all numbers 1 to 9 in dta$day, with a to i.

Here are the first 100 items in that vector: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 1 2 3 4 5 6 7 8 9

When I conditionally search for values 1 to 9, using:

dta$day == c("1","2","3","4","5","6","7","8","9")

The result says that only the first and last runs of 1 to 9 in that range match my condition, as shown below (I've bolded what should be TRUE for your reference):

[1] **TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE** FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE **FALSE**
[33] **FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE** FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE **FALSE FALSE**
[65] **FALSE FALSE FALSE FALSE FALSE FALSE FALSE** FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[81] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  **TRUE  TRUE  TRUE  TRUE  TRUE  TRUE**
[97]  **TRUE  TRUE  TRUE**

The problem must be in that first step, but to show you the result, only the first and last set in that first 100 in my vector are appropriately replaced after applying this code:

dta[dta$day == c("1","2","3","4","5","6","7","8","9"), 1] <- c("a", "b", "c", "d", "e", "f", "g", "h", "i")

[1] **"a"  "b"  "c"  "d"  "e"  "f"  "g"  "h"  "i"**  "10" "11" "12" "13" "14" "15" "16" "17" "18" "19"
 [20] "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "31" **"1"  "2"  "3"  "4"  "5"  "6"  "7"** 
 [39] "8"  "9"  "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26"
 [58] "27" "28" **"1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"** "11" "12" "13" "14" "15" "16" "17"
 [77] "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "31" **"a"  "b"  "c"  "d"  "e" 
 [96] "f"  "g"  "h"  "i"**

If useful, here is the initial state of that vector:

is.numeric(dta$day)

[1] TRUE

summary(dta$day) 

Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
1.00    8.00   16.00   15.73   23.00   31.00

I am reproducing the data frame here:

day <- c(1:31,1:28,1:31,1:30)
month <- c(rep_len(1,31),rep_len(2,28),rep_len(3,31),rep_len(4,30))
temp <- rnorm(length(month),10,10)
dta=as.data.frame(cbind(day,month,temp))

And actually, although I am able to reproduce the problem with this toy example, I get a warning that I do not get with my actual data (not reproduced here because it is very large): "longer object length is not a multiple of shorter object length".

I would love some help, and if I haven't provided something or haven't done so in the format needed, please kindly let me know!

It's a bit hard to follow your example. Perhaps you could demonstrate with only 20 entries rather than 100? Clearly show the input and the desired output. I'm not sure I completely understand what you want in the end. — MrFlick, May 07 '18 at 18:04

score 4 · Answer 1 · answered May 07 '18 at 18:11

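It looks like you're checking element-wise equality against a vector, rather than membership in its components. == compares position by position and recycles the shorter vector along the longer one, so only the runs that happen to line up with the recycled values come back TRUE. Try %in% instead, like this: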

dta[dta$day %in% c("1","2","3","4","5","6","7","8","9"), ]
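
To see the difference in isolation, here is a minimal sketch on a short, made-up vector (the names x and targets are illustrative and not from the question). It compares what == and %in% return for the same inputs:

```r
# Hypothetical toy data: two short "months" of day numbers
x <- c(1:5, 1:4)
targets <- 1:3

# == recycles targets as 1 2 3 1 2 3 1 2 3 and compares position by position
eq_result <- x == targets

# %in% asks, for each element of x, whether it occurs anywhere in targets
in_result <- x %in% targets

print(eq_result)
print(in_result)
```

Here == is TRUE only for the first three positions, while %in% also picks up the second run of 1 to 3. When the longer length is not a multiple of the shorter one, as with 120 rows against 9 values in the toy data frame, R also emits the "longer object length is not a multiple of shorter object length" warning. If the row count of the real data happens to be a multiple of 9, the recycling would be silent, which may be why no warning appeared there.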

C-x C-c

Thank you! That was exactly the problem! – Hannah May 08 '18 at 20:22

rg255 · Answer 2 · 2018-05-07T18:38:25.063

0

Use %in% rather than == and then index your data frame/vector as below to replace 1:9 with a:i as wanted:

y <- c(1:9)
dta$day[dta$day %in% y] <- letters[1:length(y)]
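
Note that the assignment above recycles letters[1:9] over the matched positions, which gives the right answer here because every run of 1 to 9 appears in order. A sketch of an order-independent variant, using each matched value as an index into letters (the shuffled vector below is made up for illustration):

```r
# Hypothetical data with the days out of order
day <- c(3, 12, 1, 9, 25, 2)

idx <- day %in% 1:9                # positions holding a value from 1 to 9
day_chr <- as.character(day)       # a vector holds one type, so convert explicitly
day_chr[idx] <- letters[day[idx]]  # 1 -> "a", 2 -> "b", ..., 9 -> "i"

print(day_chr)
```

Either way, the column ends up as character, since a single vector cannot mix numbers and letters.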

Read more about the different behaviours of these operators here:

Difference between the == and %in% operators in R

And

Difference between `%in%` and `==`

edited May 07 '18 at 18:38

answered May 07 '18 at 18:31

Thank you for the references and the much more elegant line of code! (I gave you an up vote, but unfortunately, my reputation is too low for it to be visible.) – Hannah May 08 '18 at 20:28
