Starting with your code:
set.seed(2)
x <- paste(sample( c('e','l','i','r'), 50, replace=TRUE, prob=c(0.6,0.1, 0.1, 0.2) ),collapse = '')
x
# [1] "ereelleieeeereeileeereieeeeeleeeiieriereleeelrleei"
We can generate many such strings at once (5000 here) with replicate():
set.seed(2)
xmany <- replicate(5000, paste(sample( c('e','l','i','r'), 50, replace=TRUE, prob=c(0.6,0.1, 0.1, 0.2) ),collapse = ''))
head(xmany)
# [1] "ereelleieeeereeileeereieeeeeleeeiieriereleeelrleei"
#4#       ^                       ^           ^   ^
# [2] "eerleirlrrrireieeeeeeeeeereieeereeeilereleeeeeeeee"
#1#                                           ^
# [3] "eelieeieeeereeiiieleeeliereereelelereieeeereerreee"
#5#     ^               ^   ^        ^ ^
# [4] "eelereieeeilerereleeleeiereerelelreiereeeeleeeeeee"
#6#     ^              ^  ^         ^ ^         ^
# [5] "irrleieeeeleirleeeeeeleerilerireieieeeeeieerlleeee"
#2#             ^          ^
# [6] "reeereeeerrereirerieiliereleeeelrreleereeerereeeee"
#3#                             ^    ^   ^
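I've added an annotation line under each string to highlight the occurrences of "el": the number between the # marks is how many there are, and each ^ sits under the "e" that starts one.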
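The ^ markers can also be generated instead of typed by hand. This is a minimal sketch, not necessarily how the annotations above were made: gregexpr() returns the starting position of each match, and those positions are where the carets go. The string is the first one from the output above, copied literally so the snippet runs on its own; x1, pos and marks are illustrative names.

```r
# First string from the output above, copied literally so the snippet is self-contained
x1 <- "ereelleieeeereeileeereieeeeeleeeiieriereleeelrleei"

# gregexpr() returns the start position of every "el" (or -1 if there is none)
pos <- gregexpr("el", x1)[[1]]
as.integer(pos)
# [1]  4 28 40 44

# Build a line of spaces with a ^ under the start of each match
marks <- rep(" ", nchar(x1))
if (pos[1] > 0) marks[pos] <- "^"   # guard: a -1 would otherwise mean "all but the first"
cat(x1, "\n", paste(marks, collapse = ""), "\n", sep = "")
```

The guard matters because a negative index in R drops elements, so marks[-1] <- "^" would mark every position except the first.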
If you need the number of occurrences of "el" within each string (drop the head() to get the counts for all 5000 strings):
ispos <- function(a) a > 0
# apply Filter to each element of the gregexpr() result, not to the list as a whole
head( lengths(lapply(gregexpr("el", xmany), Filter, f = ispos)) )
# [1] 4 1 5 6 2 3
Note: I created the ispos function because gregexpr returns -1 for a string with no match, so every element of its result has length 1 or more and lengths() would count a non-matching string as 1. Filtering out the non-positive positions within each element gives an honest count of 0. (I could have used lengths(regmatches(xmany, gregexpr(...))), but extracting the matched substrings seems like more work than necessary just to get the number of occurrences.)
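To see the -1 behaviour on a small input, the sketch below uses three made-up strings (hypothetical, not from the question) with 1, 0 and 2 matches, and compares lengths() on the raw gregexpr() result with the filtered and regmatches() versions. The vapply() line is one more equivalent way to count, shown as an alternative rather than the approach used above.

```r
s <- c("eel", "rrr", "elel")        # hypothetical strings: 1, 0 and 2 matches of "el"
m <- gregexpr("el", s)

lengths(m)
# [1] 1 1 2   (the no-match string still has length 1 because of the -1)

ispos <- function(a) a > 0
lengths(lapply(m, Filter, f = ispos))
# [1] 1 0 2   (the -1 is removed within each element)

lengths(regmatches(s, m))
# [1] 1 0 2   (regmatches gives character(0) when there is no match)

vapply(m, function(p) sum(p > 0), integer(1))
# [1] 1 0 2   (count the positive positions directly)
```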
If you need the frequency table for it:
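table( lengths(lapply(gregexpr("el", xmany), Filter, f = ispos)) )
The names of the table are the number of "el" matches in a string, and the values are how many of the 5000 strings had that many matches, so the values sum to 5000.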
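A quick way to sanity-check such a frequency table: it has one entry per string, so its counts must sum to the number of strings, and the mean count should sit near the theoretical value. With 50 independently drawn letters there are 49 adjacent pairs, each equal to "el" with probability 0.6 * 0.1 = 0.06, so by linearity of expectation the mean is 49 * 0.06 = 2.94. The sketch below regenerates xmany so it runs on its own; counts and tab are illustrative names.

```r
set.seed(2)
xmany <- replicate(5000, paste(sample(c('e','l','i','r'), 50, replace = TRUE,
                                      prob = c(0.6, 0.1, 0.1, 0.2)), collapse = ''))

ispos <- function(a) a > 0
counts <- lengths(lapply(gregexpr("el", xmany), Filter, f = ispos))
tab <- table(counts)

sum(tab)          # one entry per string
# [1] 5000
mean(counts)      # should be close to the theoretical 49 * 0.6 * 0.1 = 2.94
```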