My application involves heavy array operations (e.g. O(1) indexing), so Data.Vector and Data.Vector.Unboxed are preferable to Data.List. It also involves many set operations (e.g. intersectBy), which, however, Data.Vector does not provide.

Each of these functions can be implemented as in Data.List in 3-4 lines. Is there a reason none of them are implemented in Data.Vector? I can only speculate. Maybe set operations on Data.Vector are discouraged for performance reasons, e.g. because intersectBy would first produce the intersection through a list comprehension and then convert the list into a Data.Vector?

Causality

1 Answer

I assume it's missing because intersection of unsorted, immutable arrays must have a worst-case run time of Ω(n*m) without using additional space, and Data.Vector is optimized for performance. You can write that function yourself, though:

import qualified Data.Vector as V

-- Keep the elements of the second vector that also occur in x (brute force).
intersect :: Eq a => V.Vector a -> V.Vector a -> V.Vector a
intersect x = V.filter (`V.elem` x)
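
Since the question asks about intersectBy specifically, the same brute-force idea extends to a caller-supplied equality predicate. The following is only a minimal sketch: `intersectByV` is a hypothetical name, not something the vector package exports, and it is assumed here that mirroring Data.List (keeping elements of the first argument) is the desired behaviour.

```haskell
import qualified Data.Vector as V

-- | Hypothetical helper, not part of the vector package.
-- Keeps every element of the first vector that matches at least one
-- element of the second vector under `eq`, as Data.List.intersectBy does
-- for lists. Brute force: up to n*m calls to `eq`.
intersectByV :: (a -> a -> Bool) -> V.Vector a -> V.Vector a -> V.Vector a
intersectByV eq xs ys = V.filter (\x -> V.any (eq x) ys) xs
```

With `(==)` as the predicate this behaves like a plain intersection that preserves the order and duplicates of the first vector.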

Or by using a temporary set data structure to achieve an expected O(n + m) complexity:

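import qualified Data.HashSet as HS
import Data.Hashable (Hashable)

-- Build a hash set from x once, then filter the second vector against it.
intersect :: (Hashable a, Eq a) => V.Vector a -> V.Vector a -> V.Vector a
intersect x = V.filter (`HS.member` set)
    where set = HS.fromList $ V.toList x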

If you can afford the extra memory usage, maybe you can use some kind of aggregate type for your data: for example, an array for fast random access and a hash trie like Data.HashSet for fast membership checks, always keeping both containers up to date. That way you can reduce the asymptotic complexity for intersection to something like O(min(n, m)).
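
One way such an aggregate might look is sketched below. The names `Indexed`, `fromVector` and `intersectIndexed` are hypothetical and chosen only for illustration; the sketch assumes both containers are built once and that an intersection should again yield a value with both containers in sync.

```haskell
import qualified Data.HashSet as HS
import qualified Data.Vector as V
import Data.Hashable (Hashable)

-- | Hypothetical aggregate: a vector for indexing plus a hash set for
-- membership checks. Invariant: both fields contain the same elements.
data Indexed a = Indexed
  { ixVector :: V.Vector a
  , ixSet    :: HS.HashSet a
  }

-- | Build both containers up front.
fromVector :: (Eq a, Hashable a) => V.Vector a -> Indexed a
fromVector v = Indexed v (HS.fromList (V.toList v))

-- | Scan the shorter vector and probe the other side's set, so the number
-- of membership checks is min(n, m).
intersectIndexed :: (Eq a, Hashable a) => Indexed a -> Indexed a -> Indexed a
intersectIndexed a b
  | V.length (ixVector a) <= V.length (ixVector b) = go a b
  | otherwise                                      = go b a
  where
    go small big =
      let v = V.filter (`HS.member` ixSet big) (ixVector small)
      in  Indexed v (HS.fromList (V.toList v))  -- rebuild the set to keep the invariant
```

The result keeps the element order of whichever input was shorter, so this sketch only fits if the order of the intersection does not matter to the caller.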

leftaroundabout
Niklas B.
  • I believe the computation cost of `HS.fromList` is `O(n)`, which is not a concern compared to the brute-force `O(n*m)`. I intend to implement intersectBy and will see how the HashSet or HashMap idea applies. – Causality Apr 01 '13 at 22:52
  • But still, I feel `Data.Vector` could have provided these operators. – Causality Apr 01 '13 at 23:13
  • `fromList` should be about O(n log(n)), since it uses the O(log(n)) `insert` function n times internally. – Chai T. Rex Jul 24 '18 at 20:37
  • @ChaiT.Rex You are right, it's a hash trie, not a classic hash table. Construction should still be possible in O(n*W) in theory, but it is implemented using repeated insertion, as you suggested. – Niklas B. Jul 25 '18 at 08:29