6

Does Clojure provide any builtin way to find the position of a sub-sequence in a given sequence?

octopusgrabbus
  • 10,555
  • 15
  • 68
  • 131
Tudor Vintilescu
  • 1,450
  • 2
  • 16
  • 28

2 Answers2

7

Clojure provides a builtin way for easy Java Interop.

(java.util.Collections/indexOfSubList '(a b c 5 6 :foo g h) '(5 6 :foo))
;=> 3
A. Webb
  • 26,227
  • 1
  • 63
  • 95
  • Thank you for your answer. That's what I'll use in the end, but I'm usually trying to avoid explicitly calling Java Interop from 'business' code, as I find it to be a little verbose. Thank you nevertheless. – Tudor Vintilescu Mar 05 '13 at 13:14
  • While this may work, be aware that a collection isn't a sequence. – NielsK Mar 05 '13 at 13:49
  • @NielsK Philosophical notions aside, I think you'll find `java.util.List` as a superclass of a `seq` and that the java method is on pair of `java.util.List`s. As such, you could use this on lazy sequences (just be careful not to evaluate an infinite one) `(java.util.Collections/indexOfSubList (range 10) (range 3 7)) ;=> 3`, vectors, sorted-maps, etc. – A. Webb Mar 05 '13 at 14:03
  • That's interesting to know. However, it seems it does work on lazy seqs, but the operation itself doesn't seem to be lazy. I tried both methods on (range 10000000) (range 999998 999999), and got a GC overhead limit on the Collections way, and the normal answer with my find-pos. So there must be more than purely philosophical notions. – NielsK Mar 05 '13 at 14:24
  • @NielsK You are right. That does indeed appear to be trying to realize the entire range into memory when you disable the GC overhead check. – A. Webb Mar 05 '13 at 15:53
4

A sequence is an abstraction, not a concretion. Certain concretions that you can use through the sequence abstraction have a way to find the position of a subsequence (strings and java collections, for instance), but sequences in general don't, because the underlying concretion doesn't have to have an index.

What you can do however, is create a juxt of the element identity and an index function. Have a look at map-indexed.

Here's a naive implementation that will lazily find the position of (all) the subsequence(s) in a sequence. Just use first or take 1 to find only one:

(defn find-pos
  [sq sub]
  (->>
    (partition (count sub) 1 sq)
    (map-indexed vector)
    (filter #(= (second %) sub))
    (map first)))

=> (find-pos  [:a :b \c 5 6 :foo \g :h]
                [\c 5 6 :foo])
(2)

=> (find-pos  "the quick brown fox"
                (seq "quick"))
(4)

Take care that index-based algorithms generally aren't something you would do in a functional language. Unless there are good reasons you need the index in the final result, lavish use of index lookup is considered code smell.

NielsK
  • 6,886
  • 1
  • 24
  • 46