Clojure sub-sequence position in sequence

Question

Does Clojure provide any builtin way to find the position of a sub-sequence in a given sequence?

score 7 · Accepted Answer · answered Mar 05 '13 at 13:05

Clojure's built-in Java interop makes this easy:

(java.util.Collections/indexOfSubList '(a b c 5 6 :foo g h) '(5 6 :foo))
;=> 3
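
If the interop call feels too verbose at call sites, it can be wrapped once. The following is only a sketch: `index-of-sublist` is a hypothetical helper name, not part of clojure.core, and returning nil instead of -1 for "not found" is an assumption about what reads best in Clojure code.

```clojure
(defn index-of-sublist
  "Hypothetical helper: index of the first occurrence of `sub` in `coll`,
  or nil when there is no match. Both arguments must implement
  java.util.List (lists, vectors, seqs). A lazy `coll` is walked to its
  end, because the Java method asks for its size."
  [^java.util.List coll ^java.util.List sub]
  (let [i (java.util.Collections/indexOfSubList coll sub)]
    ;; The Java method signals "not found" with -1.
    (when-not (neg? i) i)))

(index-of-sublist '(a b c 5 6 :foo g h) '(5 6 :foo)) ;=> 3
(index-of-sublist [1 2 3] [9])                       ;=> nil
```

Note that elements are compared with Java's `.equals`, not Clojure's `=`.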

A. Webb

Thank you for your answer. That's what I'll use in the end, but I'm usually trying to avoid explicitly calling Java Interop from 'business' code, as I find it to be a little verbose. Thank you nevertheless. – Tudor Vintilescu Mar 05 '13 at 13:14
While this may work, be aware that a collection isn't a sequence. – NielsK Mar 05 '13 at 13:49
@NielsK Philosophical notions aside, I think you'll find that `java.util.List` is a supertype of a `seq` and that the Java method takes a pair of `java.util.List`s. As such, you could use this on lazy sequences (just be careful not to evaluate an infinite one) `(java.util.Collections/indexOfSubList (range 10) (range 3 7)) ;=> 3`, vectors, sorted-maps, etc. – A. Webb Mar 05 '13 at 14:03
That's interesting to know. However, although it does work on lazy seqs, the operation itself doesn't seem to be lazy. I tried both methods on (range 10000000) (range 999998 999999), and got a GC overhead limit error with the Collections approach, and the expected answer with my find-pos. So there must be more than purely philosophical notions. – NielsK Mar 05 '13 at 14:24
@NielsK You are right. That does indeed appear to be trying to realize the entire range into memory when you disable the GC overhead check. – A. Webb Mar 05 '13 at 15:53

NielsK · Answer 2 · 2013-03-05T13:58:40.957

A sequence is an abstraction, not a concretion. Certain concretions that you can use through the sequence abstraction have a way to find the position of a subsequence (strings and java collections, for instance), but sequences in general don't, because the underlying concretion doesn't have to have an index.

What you can do, however, is pair each element with its index, that is, create a juxt of the element identity and an index function. Have a look at map-indexed.
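
As a small illustration of that idea, `map-indexed` with `vector` pairs every element with its position, which is the building block for a position lookup on any seq. The sample data is made up, and `keep-indexed` is shown only as a related core function:

```clojure
;; Pair each element with its zero-based index.
(map-indexed vector [:a :b :c])
;=> ([0 :a] [1 :b] [2 :c])

;; Keep the indices of the elements that satisfy a predicate.
(keep-indexed (fn [i x] (when (even? x) i)) [1 4 6 7])
;=> (1 2)
```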

Here's a naive implementation that will lazily find the position of (all) the subsequence(s) in a sequence. Just use first or take 1 to find only one:

(defn find-pos
  [sq sub]
  (->>
    (partition (count sub) 1 sq)
    (map-indexed vector)
    (filter #(= (second %) sub))
    (map first)))

=> (find-pos  [:a :b \c 5 6 :foo \g :h]
                [\c 5 6 :foo])
(2)

=> (find-pos  "the quick brown fox"
                (seq "quick"))
(4)
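
Because every step in find-pos is lazy, taking only the first match stops the search early, so it can also be used on an infinite sequence as long as a match exists. A sketch that repeats the definition above so that it runs on its own; `find-first-pos` is a hypothetical name introduced here:

```clojure
(defn find-pos
  [sq sub]
  (->> (partition (count sub) 1 sq)   ; sliding windows the size of sub
       (map-indexed vector)           ; [index window] pairs
       (filter #(= (second %) sub))   ; keep the windows equal to sub
       (map first)))                  ; return only the indices

(defn find-first-pos
  "Hypothetical helper: position of the first match, or nil if none."
  [sq sub]
  (first (find-pos sq sub)))

(find-first-pos (range) [3 4 5]) ;=> 3
(find-first-pos [1 2 3] [9 9])   ;=> nil
(find-pos [1 2 1 2 1] [1 2])     ;=> (0 2)
```

On an infinite sequence that contains no match, the search never terminates.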

Keep in mind that index-based algorithms generally aren't something you would write in a functional language. Unless there are good reasons you need the index in the final result, lavish use of index lookup is considered a code smell.
