If you want to find the unique elements of a vector that also appear in a given vector, you can use %in% to test for the presence of your 'pattern' within the larger vector. The %in% operator returns a logical vector. Passing that output to which() returns the index of each TRUE value, which can be used to subset the larger vector and return all of the elements that match the 'pattern', regardless of order. Passing the subset vector to unique() eliminates duplicates so that each matching element occurs only once. If every element of the 'pattern' is present in the larger vector, the result has the same elements and length as the 'pattern' vector, though not necessarily the same order.
For example:
> num.data <- c(1, 10, 1, 6, 3, 4, 5, 1, 2, 3, 4, 5, 9, 10, 1, 2, 3, 4, 5, 6)
> num.pattern.1 <- c(1,6,3,4,5)
> num.pattern.2 <- c(1,2,3,4,5)
> num.pattern.3 <- c(1,2,3,4,6)
> unique(num.data[which(num.data %in% num.pattern.1)])
[1] 1 6 3 4 5
> unique(num.data[which(num.data %in% num.pattern.2)])
[1] 1 3 4 5 2
> unique(num.data[which(num.data %in% num.pattern.3)])
[1] 1 6 3 4 2
Notice that the first result matches the order of num.pattern.1 only by coincidence. The other two results do not match the order of their pattern vectors.
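To see what each step of the one-liner contributes, here is a small sketch using two made-up vectors, x and p (they are illustrative only and not part of the example above). It also shows that which() is optional here: %in% never returns NA, so subsetting with the logical vector directly gives the same elements.

```r
# Illustrative vectors, chosen only for this sketch
x <- c(5, 1, 7, 1, 3)
p <- c(1, 3)

x %in% p               # FALSE  TRUE FALSE  TRUE  TRUE
which(x %in% p)        # 2 4 5
x[which(x %in% p)]     # 1 1 3
x[x %in% p]            # 1 1 3  (same result without which())
unique(x[x %in% p])    # 1 3
```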
To find the exact sequence within a vector such as num.data that matches a pattern, you can use something similar to the following function:
set.seed(12102015)
test.data <- sample(c(1:99), size = 500, replace = TRUE)
test.pattern.1 <- test.data[90:94]
find_vector <- function(test.data, test.pattern.1) {
  # List of all the vectors from test.data with length = length(test.pattern.1), currently empty
  lst <- vector(mode = "list")
  # List of vectors that meet condition 1, currently empty
  lst2 <- vector(mode = "list")
  # List of vectors that meet condition 2, currently empty
  lst3 <- vector(mode = "list")
  # A modifier to the iteration variable used to build 'lst'
  a <- length(test.pattern.1) - 1
  # Iterate through 'test.data', testing each window against the conditions.
  # Stopping at length(test.data) - a keeps windows from running past the end
  # of the data, which would otherwise pad them with NA.
  for (i in seq_len(max(0, length(test.data) - a))) {
    # The list is built incrementally as 'i' increases
    lst[[i]] <- test.data[i:(i + a)]
    # Condition 1: every element of the window appears in the pattern
    if (sum(lst[[i]] %in% test.pattern.1) == length(test.pattern.1)) {
      lst2[[i]] <- lst[[i]]
    }
    # Condition 2: the window is identical to the pattern (same elements, order, and type)
    if (identical(lst[[i]], test.pattern.1)) {
      lst3[[i]] <- lst[[i]]
    }
  }
  # Remove nulls from 'lst2' and 'lst3'
  lst2 <- lst2[!vapply(lst2, is.null, logical(1))]
  lst3 <- lst3[!vapply(lst3, is.null, logical(1))]
  # Return the intersection of 'lst2' and 'lst3', which should be a match to the pattern vector.
  return(intersect(lst2, lst3))
}
For reproducibility I used set.seed() and then created a test data set and pattern. The function find_vector() takes two arguments: first, test.data, the larger numeric vector you wish to search, and second, test.pattern.1, the shorter numeric vector you wish to find in test.data. First, three lists are created: lst to hold test.data divided into overlapping smaller vectors of length equal to the length of the pattern vector, lst2 to hold the vectors from lst that satisfy the first condition, and lst3 to hold the vectors from lst that satisfy the second condition. The first condition tests that every element of a vector in lst is in the pattern vector. The second condition tests that the vector from lst matches the pattern vector by order and by element.
One problem with this approach is that NULL values are introduced into lst2 and lst3 at every position where the condition is not satisfied, up to the last position where it is satisfied; after that the lists stop growing. For reference you may print the lists from inside the function to see all the vectors tested, the vectors that meet the first condition, and the vectors that meet the second condition. The nulls can be removed. With the nulls removed, finding the intersection of lst2 and lst3 will reveal the pattern matched identically in test.data.
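If you also want to know where the pattern occurs, the same sliding-window idea can be sketched more compactly. The helper below, find_positions(), is a hypothetical name I am introducing here, not part of the function above. It assumes plain numeric vectors and returns the starting index of every exact match instead of the matched vector. It compares values with ==, so unlike identical() it does not distinguish integer from double.

```r
# Hypothetical helper (a sketch, not the function above): returns the starting
# index of every window of `x` that equals `pattern` element for element.
find_positions <- function(x, pattern) {
  n <- length(pattern)
  # No window fits if the pattern is empty or longer than the data
  if (n == 0 || n > length(x)) return(integer(0))
  starts <- seq_len(length(x) - n + 1)
  # isTRUE() turns an NA comparison result into FALSE instead of propagating NA
  hits <- vapply(
    starts,
    function(i) isTRUE(all(x[i:(i + n - 1)] == pattern)),
    logical(1)
  )
  starts[hits]
}

num.data <- c(1, 10, 1, 6, 3, 4, 5, 1, 2, 3, 4, 5, 9, 10, 1, 2, 3, 4, 5, 6)
find_positions(num.data, c(1, 6, 3, 4, 5))  # 3
find_positions(num.data, c(1, 2, 3, 4, 5))  # 8 15
find_positions(num.data, c(1, 2, 3, 4, 6))  # integer(0), the sequence never occurs
```

Because an identical window necessarily contains only pattern elements, the single equality check here covers both conditions at once.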
To use the function, make sure to explicitly define test.data <- 'a numeric vector' and test.pattern.1 <- 'a numeric vector'. No special packages are needed. I didn't do any benchmarking, but the function appears to run quickly. I also did not look for scenarios where the function would fail.
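One scenario worth checking before relying on the function: identical() treats integer and double vectors as different even when the values are equal. sample(1:99, ...) produces integers, while a pattern typed by hand as c(1, 2, 3) is double. The sketch below uses made-up vectors (int.window and dbl.pattern are illustrative names) to show the difference and two possible workarounds.

```r
int.window  <- c(1L, 2L, 3L)  # integer, like a slice of sample(1:99, ...)
dbl.pattern <- c(1, 2, 3)     # double, like a pattern typed by hand

identical(int.window, dbl.pattern)              # FALSE, the types differ
identical(int.window, as.integer(dbl.pattern))  # TRUE after coercing the pattern
isTRUE(all.equal(int.window, dbl.pattern))      # TRUE, all.equal() compares values
```

In the example above the pattern is sliced from test.data itself, so both vectors are integer and the issue does not arise.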