
I have a function that selects, from the Cartesian product of a list of lists, those results with the most duplicate elements (equivalently, the fewest distinct elements):

import Data.List (nub)

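-- Keep only those elements of the Cartesian product whose number of
-- distinct values (length . nub) is the smallest that occurs.
f :: Eq a => [[a]] -> [[a]]
f xss = filter ((== minLength) . length . nub) cartProd
  where
    -- smallest number of distinct values over all products
    minLength = minimum (map (length . nub) cartProd)
    -- the Cartesian product of the input lists
    cartProd = sequence xss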

For example:

*Main> f [[1,2,3],[4],[1,5]]
[[1,4,1]]

But:

*Main> sequence [[1,2,3],[4],[1,5]]
[[1,4,1],[1,4,5],[2,4,1],[2,4,5],[3,4,1],[3,4,5]]

Is there a name for the property that the result of my function f has?

marc_r
  • 1
    I'm not sure, but in this case you could also probably say something about "cardinality", e.g. "the sequence whose alphabet has lowest cardinality". But I guess that's worse than "contains the most duplicates"... – jberryman Feb 13 '17 at 19:07
  • 2
    I wrote an answer proposing "multiplicity", but then realized I don't really understand the question. (e.g. does `[1,4,1,4]` have more "duplicates" than `[1,1,1,4]`? Why (not)?) Even so you might enjoy reading about that word on e.g. Wikipedia to see if it sounds relevant to you. – Daniel Wagner Feb 13 '17 at 19:10
  • @jberryman In a way that's actually better, since it positively emphasizes the property I am after. When you said "alphabet" I also realized that one could speak in terms of operations on the list. I might want to use the term "projection". – marc_r Feb 13 '17 at 20:51
  • @DanielWagner "Multiplicity" sounds catchy. If I interpret my lists as multisets, it's a very close fit I think. `[1,4,1,4]` is as good as `[1,1,1,4]` (their `nub`s have the same length), so I would be after something like "total multiplicity". More specifically about the why: I am using this in the generation of certain graphs in which I want to merge as many identical nodes as possible. – marc_r Feb 13 '17 at 21:01
  • "homogeneity" ? Or entropy, like defined by Shannon, if you consider a list as the possible values of a variable. – Jean-Baptiste Potonnier Feb 14 '17 at 00:30
  • @Jean-BaptistePotonnier `[1,4,1,4]` and `[1,1,1,4]` would have different entropies, although I consider them equal. – marc_r Feb 14 '17 at 14:14

1 Answer


I believe your function is computing a minimum set cover:

Given a set of elements {1, 2, ..., n} (called the universe) and a collection S of sets whose union equals the universe, the set cover problem is to identify the smallest sub-collection of S whose union equals the universe.

In your case, n is length xss. There is one set in S for each distinct element x of concat xss, namely the set { i | x `elem` (xss !! i) } of the indices of all lists in which x occurs. A minimum set cover then tells you which values x to choose from the lists in xss (sometimes you have multiple choices; any choice will produce the same final nubbed length).
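As a rough sketch of this construction (the helper name `coverSets` is made up for illustration and is not part of the original code), the collection S can be computed directly from xss. Note that it uses 0-based indices, to match `!!`, whereas the worked example counts the lists from 1:

```haskell
import Data.List (nub)

-- | For every distinct value x in the input lists, pair x with the
-- (0-based) indices of the lists that contain it.
-- 'coverSets' is a hypothetical name chosen for this sketch.
coverSets :: Eq a => [[a]] -> [(a, [Int])]
coverSets xss =
  [ (x, [i | (i, xs) <- zip [0 ..] xss, x `elem` xs])
  | x <- nub (concat xss)
  ]
```

For `[[1,2,3],[4],[1,5]]` this should yield `[(1,[0,2]),(2,[0]),(3,[0]),(4,[1]),(5,[2])]`.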

Here is a worked example for your [[1,2,3],[4],[1,5]]:

The universe is {1,2,3}, the (1-based) positions of the three lists.

There are five sets in the collection S; I'll name them S_1 through S_5:

  • S_1 = {1,3} because the first and third lists contain 1.
  • S_2 = {1} because the first list contains 2.
  • S_3 = {1} because the first list contains 3.
  • S_4 = {2} because the second list contains 4.
  • S_5 = {3} because the third list contains 5.

A minimum set cover for this is {S_1, S_4}. Because it is a set cover, every list contains 1 or 4. Because it is a minimum cover, no other choice of sets produces a smaller final collection of values. So we can choose either 1 or 4 from each list to produce a final answer. As it happens, no list contains both 1 and 4, so there is only one choice, namely [1,4,1].
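To check this correspondence on small inputs, here is a brute-force sketch. It assumes nothing beyond `Data.List`; the names `minCoverValues` and `viaCover` are invented for illustration. It enumerates every subset of the distinct values, so it is exponential and only meant for experimenting, not as an efficient solver:

```haskell
import Data.List (nub, subsequences)

-- The function from the question, repeated here for comparison.
f :: Eq a => [[a]] -> [[a]]
f xss = filter ((== minLength) . length . nub) cartProd
  where
    minLength = minimum (map (length . nub) cartProd)
    cartProd = sequence xss

-- | All smallest sets of values such that every list contains at least
-- one of them, i.e. whose index sets cover the universe.
-- Hypothetical helper; brute force over all subsets of the values.
minCoverValues :: Eq a => [[a]] -> [[a]]
minCoverValues xss = filter ((== best) . length) covers
  where
    values = nub (concat xss)
    -- vs is a cover if every list shares some value with it
    coversAll vs = all (any (`elem` vs)) xss
    covers = filter coversAll (subsequences values)
    best = minimum (map length covers)

-- | Rebuild the selections: restrict every list to the values of one
-- minimum cover and take the Cartesian product of what remains.
viaCover :: Eq a => [[a]] -> [[a]]
viaCover xss =
  nub
    [ choice
    | vs <- minCoverValues xss
    , choice <- mapM (filter (`elem` vs)) xss
    ]
```

On the example, `minCoverValues [[1,2,3],[4],[1,5]]` gives `[[1,4]]`, and `viaCover` and `f` both give `[[1,4,1]]`. In general the two should agree as sets of results, although the order may differ and `viaCover` drops repeated results.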

Daniel Wagner
  • Brilliant. From the Wikipedia link I understand that my function is actually solving the **hitting set problem** which is achieved by interchanging sets and universe from the minimum-set-cover problem. Which is exactly what you're doing here. – marc_r Feb 13 '17 at 23:51