4

I would like to do set difference between 2 integer lists, which allow repetitions in haskell.

So in case of having [1,2,1,4,3] [1,2,4], the difference would be [1,3]

Currently I can do it via the normal \\ operator listA \\ listB.

But the problem is this too slow. As integers are in ord group it can be done faster.

I am aware of the Data.Multiset module, which would do it faster, but is there a native way to do it on lists without the Data.Multiset module?

  • Maybe you should consider Data.IntSet. – augustss Jul 02 '15 at 11:35
  • 1
    The problem with the Set structure is that it does not hold repetition count. For instance if we have in the first list 4 ones, and in the second 2 ones. The set difference will be [1], as there are more ones in the first list, than in the other. –  Jul 02 '15 at 16:52
  • 1
    You seem to be against using library code, for some reason, but [Data.List.Ordered](http://hackage.haskell.org/package/data-ordlist-0.4.7.0/docs/Data-List-Ordered.html) has set- and bag-like operations on sorted lists, including [minus](http://hackage.haskell.org/package/data-ordlist-0.4.7.0/docs/Data-List-Ordered.html#v:minus). – Daniel Wagner Jul 02 '15 at 16:59
  • This is part of a problem, published on a a competitive programming site, which does not include modules such as MultiSet and List.Ordered –  Jul 02 '15 at 17:48

2 Answers2

6

As integers are in ord group it can be done faster.

Yes, but it requires building a sorted index. And that's pretty much what Data.Multiset is doing. You could of course write an operation that does that manually, but you'd be effectively reimplementing Multiset by then.

Bartek Banachewicz
  • 38,596
  • 7
  • 91
  • 135
1

As I could not use Data.MultiSet I needed to imeplement my own solution via lists, so I will post the solution.

In order to get the set difference of two lists with repetitons, we first need to group the numbers together, to get the amount of repetitions of each number:

tupleGroup [] _ = []
tupleGroup (hd:tl) (x, y) 
  | hd == x = (x, succ y) : tupleGroup tl (x, succ y)
  | otherwise = (hd, 1) : tupleGroup tl (hd, 1)

group' [] = []
group' (hd:tl) = tupleGroup tl (hd, 1) 

In order to call the helper function group' we first need to sort the list, as tupleGroup needs a sorted list.

Next function is the difference function, in which we supply the grouped lists:

difference [] [] = []
difference (hd:tl)  [] = fst hd : difference tl []
difference [] (hd:tl) = []
difference (hd1:tl1) (hd2:tl2) 
  | x1 == x2 && y1 > y2 = x1 : difference tl1 tl2
  | x1 == x2 = difference tl1 tl2
  | x1 < x2 = x1 : difference tl1 (hd2:tl2)
  | otherwise = difference (hd1:tl1) tl2
  where 
    x1 = fst hd1
    x2 = fst hd2
    y1 = snd hd1
    y2 = snd hd2

The function will return a list of all numbers, which the repetition count is bigger in the first list, than in the second one.