I now find that the crude dataset is part of one of those packages, which does allow testing. Removing the carets and dollar signs from the patterns lets a much larger number of terms match the targets:
> sum( sapply(dict, grepl, x=tdm$dimnames$Terms))
[1] 4
> dict2<-c('also', 'told reuters', 'an emergency', 'in world oil')
> sum( sapply(dict2, grepl, x=tdm$dimnames$Terms))
[1] 51
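To see why the anchors matter, here is a minimal sketch in base R. The `terms` vector is a made-up stand-in for `tdm$dimnames$Terms`, and the anchored patterns are only my guess at what the original `dict` looked like:

```r
# Hypothetical stand-in for tdm$dimnames$Terms (not the real crude terms)
terms <- c("also", "he also denied", "told reuters",
           "he told reuters that", "an emergency meeting")

# Assumed shape of the original patterns: ^ and $ force a whole-term match
anchored   <- c("^also$", "^told reuters$")
# Same patterns without anchors: match anywhere inside a term
unanchored <- c("also", "told reuters")

# Each grepl call returns one logical per term; sum counts all TRUEs
n_anchored   <- sum(sapply(anchored,   grepl, x = terms))
n_unanchored <- sum(sapply(unanchored, grepl, x = terms))

n_anchored    # 2: only the terms that are exactly "also" or "told reuters"
n_unanchored  # 4: every term that contains one of the phrases
```

With n-gram terms, an anchored pattern can only hit the one term that equals the phrase exactly, which is why the counts differ so much.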
You can see which terms match if you use grep instead (the results from grepl would be 4 times as long as tdm$dimnames$Terms):
> sapply(dict2, grep, x=tdm$dimnames$Terms)
$also
[1] 707 708 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753
[19] 754 1485 1486 2434 2881 2882 2988 2989 3399 3400 3782 3983 5265 5995 6088 6382 6383 6893
[37] 7427 7428 7524 7525 7605
$`told reuters`
[1] 3013 7209 7210
$`an emergency`
[1] 779 780 781 2437 2642 4205
$`in world oil`
[1] 3276
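If you would rather see the matching terms than their positions, `grep` has a `value = TRUE` argument. A small sketch, again on a made-up `terms` vector rather than the real TDM:

```r
# Hypothetical stand-in for tdm$dimnames$Terms
terms    <- c("ali also", "also", "told reuters",
              "an emergency", "in world oil prices")
patterns <- c("also", "told reuters")

# simplify = FALSE keeps a named list even if all results have equal length
idx     <- sapply(patterns, grep, x = terms, simplify = FALSE)
matched <- sapply(patterns, grep, x = terms, value = TRUE, simplify = FALSE)

idx      # positions of the matching terms, one list element per pattern
matched  # the matching terms themselves

# grepl, by contrast, gives one logical per term per pattern
dim(sapply(patterns, grepl, x = terms))  # length(terms) rows, one column per pattern
```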
The print method for TDMs is not particularly informative, but you can "explode" the value with dput to see what is inside:
> dput(tdm[ sapply(dict2, grepl, x=tdm$dimnames$Terms), ] )
structure(list(i = c(1L, 2L, 3L, 8L, 9L, 33L, 3L, 16L, 17L, 20L,
21L, 32L, 3L, 6L, 7L, 22L, 39L, 40L, 3L, 14L, 15L, 36L, 37L,
38L, 3L, 12L, 13L, 27L, 28L, 41L, 3L, 10L, 11L, 25L, 26L, 30L,
3L, 4L, 5L, 23L, 24L, 31L, 3L, 4L, 5L, 23L, 24L, 31L, 3L, 18L,
19L, 29L, 34L, 35L), j = c(6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L,
7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 10L,
10L, 10L, 10L, 10L, 10L, 14L, 14L, 14L, 14L, 14L, 14L, 16L, 16L,
16L, 16L, 16L, 16L, 17L, 17L, 17L, 17L, 17L, 17L, 18L, 18L, 18L,
18L, 18L, 18L), v = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
nrow = 41L, ncol = 20L, dimnames = structure(list(Terms = c("ali also",
"ali also delivered", "also", "also called", "also called for",
"also contributed", "also contributed to", "also delivered",
"also delivered \"a", "also denied", "also denied that",
"also nigerian", "also nigerian oil", "also no", "also no projection",
"also reviews", "also reviews the", "also was", "also was lowered",
"but also", "but also reviews", "european weekend also",
"group, also", "group, also called", "he also", "he also denied",
"is also", "is also nigerian", "louisiana sweet also", "meeting.\" he also",
"private group, also", "sector, but also", "sheikh ali also",
"sweet also", "sweet also was", "there was also", "was also",
"was also no", "weekend also", "weekend also contributed",
"who is also"), Docs = c("127", "144", "191", "194", "211",
"236", "237", "242", "246", "248", "273", "349", "352", "353",
"368", "489", "502", "543", "704", "708")), .Names = c("Terms",
"Docs"))), .Names = c("i", "j", "v", "nrow", "ncol", "dimnames"
), class = c("TermDocumentMatrix", "simple_triplet_matrix"), weighting = c("term frequency",
"tf"))