R: Find identical strings in two lists

Question

I have two lists of strings and want to find which strings are in both lists.

I tried converting the lists to vectors so that I could use intersect or setequal, but that converted all the strings to numbers. Apologies if there's an obvious answer, but I can't seem to convert the lists without that happening.

What's the best way forward?

EDIT: I have these data frames:

dput(s)
structure(list(V1 = structure(c(3L, 2L, 1L, 4L), .Label = c("24d2afb212410711de0e237e5435e104", 
"2a3d9ca791a579a14883de538a012e24", "a90b03209a8095ec406809d89d5035c3", 
"f271eb38cc409c6bfe9dcf2bfcab8471"), class = "factor")), .Names = "V1", row.names = c(NA, 
-4L), class = "data.frame")

dput(r)
structure(list(V1 = structure(c(2L, 1L, 4L, 3L), .Label = c("24d2afb212410711de0e237e5435e104", 
"2a3d9ca791a579a14883de538a012e24", "7320e2e921df862968954d4b60e2a80a", 
"a9f47ec7c488d2bcddf2c1adc2bf6305"), class = "factor")), .Names = "V1", row.names = c(NA, 
-4L), class = "data.frame")

I want to find the strings that are in both, i.e.

2a3d9ca791a579a14883de538a012e24 and 24d2afb212410711de0e237e5435e104.

as.character() doesn't preserve those strings. Is there another way to convert them from factors, or another operation that would work better?

Are these `lists` of equal length? Check the `str` of the `list` for the `class` of the elements of `list`? Please provide a small reproducible example and expected output. — akrun, May 04 '16 at 20:06
Sounds like they are factors. Try `as.character()` with them and try your code again. — cory, May 04 '16 at 20:10
Added more info. I tried as.character() and it completely removed most of the characters. — Michael, May 04 '16 at 21:30

Sotos · Answer 1 · 2016-05-05T06:08:54.393


You need to specify the column within each data frame, not just the data frame itself.

With intersect,

intersect(r$V1, s$V1)
#[1] "2a3d9ca791a579a14883de538a012e24" "24d2afb212410711de0e237e5435e104"
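
As a minimal sketch of the above, the following rebuilds the two example data frames from the question and intersects their V1 columns. The explicit as.character() calls are an assumption added here to make sure character strings (rather than factor codes) are compared; the names s_chr, r_chr and common are only illustrative.

```r
# Rebuild the example data frames from the question's dput() output.
# The columns are factors, as in the original data.
s <- data.frame(V1 = factor(c("a90b03209a8095ec406809d89d5035c3",
                              "2a3d9ca791a579a14883de538a012e24",
                              "24d2afb212410711de0e237e5435e104",
                              "f271eb38cc409c6bfe9dcf2bfcab8471")))
r <- data.frame(V1 = factor(c("2a3d9ca791a579a14883de538a012e24",
                              "24d2afb212410711de0e237e5435e104",
                              "a9f47ec7c488d2bcddf2c1adc2bf6305",
                              "7320e2e921df862968954d4b60e2a80a")))

# Convert the columns (not the whole data frames) to character vectors.
s_chr <- as.character(s$V1)
r_chr <- as.character(r$V1)

# Strings present in both columns, in the order they appear in r.
common <- intersect(r_chr, s_chr)
print(common)
```

Note that as.character() is applied to s$V1 and r$V1, i.e. to the columns, and not to s and r themselves.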

With grep,

unlist(sapply(r$V1, function(i)grep(i, s$V1, value = TRUE)))
#[1] "2a3d9ca791a579a14883de538a012e24" "24d2afb212410711de0e237e5435e104"
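
Since grep treats each value as a regular expression and also matches substrings, an exact whole-string comparison may be preferable on large data. The sketch below is one possible alternative using %in%, which is not part of the approach above; the object names are again only illustrative.

```r
# Example data from the question, rebuilt as factor columns.
s <- data.frame(V1 = factor(c("a90b03209a8095ec406809d89d5035c3",
                              "2a3d9ca791a579a14883de538a012e24",
                              "24d2afb212410711de0e237e5435e104",
                              "f271eb38cc409c6bfe9dcf2bfcab8471")))
r <- data.frame(V1 = factor(c("2a3d9ca791a579a14883de538a012e24",
                              "24d2afb212410711de0e237e5435e104",
                              "a9f47ec7c488d2bcddf2c1adc2bf6305",
                              "7320e2e921df862968954d4b60e2a80a")))

# Work on character vectors taken from the columns.
s_chr <- as.character(s$V1)
r_chr <- as.character(r$V1)

# Keep the values of r that also occur in s (exact matches only),
# then drop any repeats.
in_both <- unique(r_chr[r_chr %in% s_chr])
print(in_both)
```

Unlike the grep version, this compares whole strings, so a value in r that is merely contained in a longer string in s would not be reported.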

edited May 05 '16 at 06:08

answered May 04 '16 at 21:34


Thanks, Sotos, but when I try to use intersect, I get an empty data frame. When I use grep, this is the message I get (and there are no results): `Warning message: In grep(i, s, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used` – Michael May 04 '16 at 21:53
hmm, can you share `str()` of your lists? `sapply` should have taken care of that warning – Sotos May 04 '16 at 21:55
str(s): `'data.frame': 40909 obs. of 1 variable: $ V1: Factor w/ 33212 levels "0000734ec06af49a2477e8c044fe51fc",..: 16139 15105 13860 2836 20143 29594 21391 15141 11485 32833 ...` str(r): `'data.frame': 1091319 obs. of 1 variable: $ V1: Factor w/ 1090874 levels "00002df85d03395bdeb018c35ef8ede8",..: 1080008 717274 1041734 720889 167066 968941 328961 807515 89060 817033 ...` – Michael May 04 '16 at 22:27
@Michael, I edited my answer. Now it should work. The problem was not the factors but that you need to specify the column as well, not just the data frame (i.e. `r$V1` and `s$V1`) – Sotos May 05 '16 at 06:10
