Does Clojure provide any builtin way to find the position of a sub-sequence in a given sequence?
2 Answers
Clojure provides a builtin way for easy Java Interop.
(java.util.Collections/indexOfSubList '(a b c 5 6 :foo g h) '(5 6 :foo))
;=> 3

- 26,227
- 1
- 63
- 95
-
Thank you for your answer. That's what I'll use in the end, but I'm usually trying to avoid explicitly calling Java Interop from 'business' code, as I find it to be a little verbose. Thank you nevertheless. – Tudor Vintilescu Mar 05 '13 at 13:14
-
While this may work, be aware that a collection isn't a sequence. – NielsK Mar 05 '13 at 13:49
-
@NielsK Philosophical notions aside, I think you'll find `java.util.List` as a superclass of a `seq` and that the java method is on pair of `java.util.List`s. As such, you could use this on lazy sequences (just be careful not to evaluate an infinite one) `(java.util.Collections/indexOfSubList (range 10) (range 3 7)) ;=> 3`, vectors, sorted-maps, etc. – A. Webb Mar 05 '13 at 14:03
-
That's interesting to know. However, it seems it does work on lazy seqs, but the operation itself doesn't seem to be lazy. I tried both methods on (range 10000000) (range 999998 999999), and got a GC overhead limit on the Collections way, and the normal answer with my find-pos. So there must be more than purely philosophical notions. – NielsK Mar 05 '13 at 14:24
-
@NielsK You are right. That does indeed appear to be trying to realize the entire range into memory when you disable the GC overhead check. – A. Webb Mar 05 '13 at 15:53
A sequence is an abstraction, not a concretion. Certain concretions that you can use through the sequence abstraction have a way to find the position of a subsequence (strings and java collections, for instance), but sequences in general don't, because the underlying concretion doesn't have to have an index.
What you can do however, is create a juxt of the element identity and an index function. Have a look at map-indexed.
Here's a naive implementation that will lazily find the position of (all) the subsequence(s) in a sequence. Just use first or take 1 to find only one:
(defn find-pos
[sq sub]
(->>
(partition (count sub) 1 sq)
(map-indexed vector)
(filter #(= (second %) sub))
(map first)))
=> (find-pos [:a :b \c 5 6 :foo \g :h]
[\c 5 6 :foo])
(2)
=> (find-pos "the quick brown fox"
(seq "quick"))
(4)
Take care that index-based algorithms generally aren't something you would do in a functional language. Unless there are good reasons you need the index in the final result, lavish use of index lookup is considered code smell.

- 6,886
- 1
- 24
- 46