UPDATE: @bphelpsjr's answer provides what I am looking for. Unfortunately someone down-voted him and I do not have the rep to up-vote. I am marking his response as the answer.

This is extremely long-winded, but I wanted to provide as much detail as possible.

I want to take a set of data and generate a list of lists based on the rules defined below. The result is essentially a filtered version of a powerset.

I will then store these results for repeated use (similar to a rainbow table) to avoid recalculating the same N. I will then use variable substitution (e.g., A = 18, B = 30) before applying other logic (not described below, not necessary for my question).

Here are two input options I've experimented with in attempting to create a solution. You could also use numbers instead of letters.

Input Option #1

var completeList = new List<Item>
{
    new Item('A', 'A'),
    new Item('A', 'B'),
    new Item('A', 'C'),
    new Item('A', 'D'),
    new Item('B', 'B'),
    new Item('B', 'C'),
    new Item('B', 'D'),
    new Item('C', 'C'),
    new Item('C', 'D'),
    new Item('D', 'D')
};

Input Option #2

List<Item> aList = new List<Item>
{
    new Item('A', 'A'),
    new Item('A', 'B'),
    new Item('A', 'C'),
    new Item('A', 'D'),
};

List<Item> bList = new List<Item>
{
    new Item('B', 'B'),
    new Item('B', 'C'),
    new Item('B', 'D'),
};

List<Item> cList = new List<Item>
{
    new Item('C', 'C'),
    new Item('C', 'D'),
};

List<Item> dList = new List<Item>
{
    new Item('D', 'D')
};

Desired Output

AA BB CC DD
AA BB CD
AA BC    DD
AA BD CC
AB    CC DD 
AB    CD
AC BB    DD
AC BD
AD BB CC
AD BC

Rules

The first 3 are definitive rules while the 4th is more a desire.

  1. Solution must be able to handle N number of distinct letters and lists of items

  2. Every distinct letter must appear at least once in the list of items. Example:

    AA BB CC DD <-- Valid

    AA BB CC <-- invalid, does not contain D

  3. Letters may only repeat within a given item. Example:

    AA BB CC DD <-- valid

    AA BA CC DD <-- invalid, A is repeated in a different item

  4. The logic must contain as much "aggressive filtering" and short circuiting as possible in order to cut down on the number of iterations that it will perform. I had a working left-shift solution but it does not scale whatsoever due to the (my?) inability to incorporate the filtering and short circuiting. This basically resulted in iterating through the entire powerset.

    • Example: Once a letter is found that is already contained within a potential list's items, move on to the next potential combination because this one is invalid.

    • Example: Once a valid list of items has been found, start the next round.

The next two are potential examples based on the way I currently have the data set grouped by the first letter of each item. They may not be applicable depending on what type of solution you're creating.

  • Potential Example: If an item contains a letter that is in the next list's items, skip that entire list and move to the next list of items.

    AA BC DD <-- We can skip the C list because it is covered by BC

  • Potential Example: Once you have exhausted a list's potential candidates (e.g., the last list will only ever have 1 item), you shouldn't (if my thinking is correct) need that list again until the list above it + 1 has changed items.

    AA BB CC DD <-- after you find this, stop searching the list containing DD until you get to BC (list above DD + 1)

    AA BB CD

    AA BC DD <-- We need DD again

  5. No list of items should repeat itself, regardless of the order of items. Example:

    AA BB CC DD == BB AA DD CC so do not include BB AA DD CC
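
As a minimal sketch of rules 2 and 3, a candidate list could be checked as shown below. It assumes each item is written as a two-character string such as "AB"; the class and method names (`RuleCheck`, `IsValid`) are purely illustrative and not part of my actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RuleCheck
{
    // Hypothetical helper: each item is a two-character string such as "AB".
    // Returns true when no letter is shared between two different items (rule 3)
    // and every letter of `alphabet` appears somewhere in the list (rule 2).
    public static bool IsValid(IEnumerable<string> items, string alphabet)
    {
        var seen = new HashSet<char>();
        foreach (var item in items)
        {
            // Distinct() allows a letter to repeat inside one item ("AA").
            foreach (var letter in item.Distinct())
            {
                // Add returns false when an earlier item already used the letter,
                // so the check short-circuits on the first clash.
                if (!seen.Add(letter))
                    return false;
            }
        }
        return alphabet.All(c => seen.Contains(c));
    }
}
```

With the alphabet "ABCD", this accepts AA BB CC DD and AC BD, and rejects both AA BB CC (no D) and AA BA CC DD (A used in two items).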

Assumptions I've made:

  • It will be easier to group the sets by their initial starting letter (see sample data below). If this is not the optimal approach, it is not an issue.

Below is the Item class I use to hold my data simply for convenience:

public class Item
{
    public char First { get; set; }
    public char Second { get; set; }

    public Item(char first, char second)
    {
        First = first;
        Second = second;
    }

    public List<char> DistinctCharacters()
    {
        return First == Second ? new List<char> { First } : new List<char> { First,  Second };
    }
}
Dotarp
  • I'm afraid this makes no sense to me. I can't see a connection between input and output. The 'rules' appear to be constraints on how the problem is to be solved, but there's no explanation of the problem itself. – Baldrick Mar 13 '14 at 01:12
  • My apologies if it wasn't clear. I will edit to try and be more specific. – Dotarp Mar 13 '14 at 01:14
  • I have edited and rearranged some things to try and be more clear. Please let me know if you have any questions. – Dotarp Mar 13 '14 at 01:23
  • Ok, it makes sense what you're trying to do now! Thanks for the clarification. – Baldrick Mar 13 '14 at 01:27
  • Do you have any thoughts on the best way to solve or approach this? – Dotarp Mar 13 '14 at 17:54
  • How big can N be? I assume that effectively that's the only input? How are you going to use the output? (I suspect that the number of combinations will blow up very quickly, so some sort of iterator-based solution seems natural...) – Jon Skeet Mar 24 '14 at 07:08
  • The limit of N has not been defined but for this question we can assume a limit of 50. Ideally N would be limitless as far as the solution is concerned though. You are correct, it is effectively the only input. You are also correct in that I will iterate over the final result set, performing various, small calculations along the way. – Dotarp Mar 25 '14 at 01:02
  • Okay. I'm still not sure of an answer, but at least we can have a pretty simple and compact representation of which letters have been used. I assume we can actually just use `N` as the input, rather than lists - and just assume that every letter pair will be present in the obvious way. – Jon Skeet Mar 25 '14 at 06:17
  • @JonSkeet - Correct, you could just use N as the input rather than lists. I'm not sure what assumption you're looking to make with the letter pairs but the input data set I presented has already had other logic applied to it. For example, in order to get that data set we take combination(N,2), exclude commutative duplicates (e.g., include AB exclude BA), and allow single letter repeats (e.g., AA, BB). I hope the last part of this response doesn't just add confusion. Like I said, I'm not entirely sure what assumption you're trying to make. – Dotarp Mar 25 '14 at 15:16
  • That's fine - that's what I'd been assuming, basically. – Jon Skeet Mar 25 '14 at 15:52
  • @JonSkeet I added a blurb in the narrative but wanted to directly call your attention to the fact. After I have the results of a `N` calculation, I'm going to store this so I never have to calculate it again. I will then use variable substitution (e.g., A = 18, B = 30) before applying my other logic. – Dotarp Mar 26 '14 at 16:52
  • Okay. I'm not sure that it *will* particularly be more efficient, but we can work that out later. I haven't actually got a solution yet. – Jon Skeet Mar 26 '14 at 16:53
  • @JonSkeet Just as another FYI - I have added another note that a fellow developer thought might be useful to others: You could also use numbers instead of letters for the input (e.g., 00, 01, 02, 03 instead of AA, AB, AC, AD). – Dotarp Mar 27 '14 at 19:52
  • @Dotarp: Ultimately that's the simple bit of the problem. I've worked out the general basis, but it's going to be a fiddly thing to implement... it'll probably take about an hour. I'll give it a try if I can find time, but I don't know when that will be. – Jon Skeet Mar 27 '14 at 19:53
  • @JonSkeet I agree. Just wanted to pass it along in case your approach differed from any of the ideas I've had. Thanks again. – Dotarp Mar 27 '14 at 21:27
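
The input construction described in the comments (every unordered pair of letters, plus each doubled letter) can be sketched as follows. This is only an illustration under the assumption that the letters are 'A', 'B', 'C', ... in order; `InputBuilder` and `BuildItems` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class InputBuilder
{
    // Hypothetical helper: builds AA, AB, ..., for the first n letters.
    // Keeping j >= i includes AB but excludes its commutative duplicate BA,
    // and j == i produces the doubled letters (AA, BB, ...).
    public static List<string> BuildItems(int n)
    {
        var items = new List<string>();
        for (int i = 0; i < n; i++)
        {
            for (int j = i; j < n; j++)
            {
                items.Add($"{(char)('A' + i)}{(char)('A' + j)}");
            }
        }
        return items;
    }
}
```

For n = 4 this produces the same ten items, in the same order, as `completeList` in the question.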

3 Answers

This should work (using numbers instead of letters):

    // Requires: using System; using System.Collections.Concurrent; using System.Collections.Generic;
    //           using System.Linq; using System.Threading.Tasks;

    private static BlockingCollection<List<int[]>> GetCombinations(int toConsider)
    {
        var allResults = new BlockingCollection<List<int[]>>();
        var possibilities = Enumerable.Range(0, toConsider).ToList();
        // Pair letter 0 with every candidate (including itself) and expand each branch in parallel.
        Parallel.ForEach(possibilities, possibility =>
        {
            GetIteratively(new List<int[]> { new[] { 0, possibility } }, allResults, possibilities.RemoveAllClone(x => x == 0 || x == possibility));
        });
        return allResults;
    }

    public static void GetIteratively(List<int[]> result, BlockingCollection<List<int[]>> allResults, List<int> possibilities)
    {
        // Each stack entry holds a partial list plus the letters it has not used yet.
        var stack = new Stack<Tuple<List<int[]>, List<int>>>();
        stack.Push(Tuple.Create(result, possibilities));
        while (stack.Count > 0)
        {
            var pop = stack.Pop();
            if (pop.Item2.Count > 0)
                // Extend the popped partial list with every pairing of its first unused letter.
                pop.Item2.ForEach(x => stack.Push(Tuple.Create(new List<int[]>(pop.Item1) { new[] { pop.Item2[0], x } }, pop.Item2.RemoveAllClone(y => y == pop.Item2[0] || y == x))));
            else
                allResults.Add(pop.Item1); // no letters left, so this list is complete
        }
    }

And here is the LinqExtension for RemoveAllClone:

    public static List<T> RemoveAllClone<T>(this IEnumerable<T> source, Predicate<T> match)
    {
        var clone = new List<T>(source);
        clone.RemoveAll(match);
        return clone;
    }
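
Since the comments on the question suggest an iterator-based solution, here is a hedged sketch of the same pairing idea written as a lazy, single-threaded generator. It is an alternative illustration, not a drop-in replacement for the code above, and the names `LazyCombinations`, `Enumerate` and `Extend` are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyCombinations
{
    // Yields every valid list for the letters 0..n-1, one at a time.
    public static IEnumerable<List<int[]>> Enumerate(int n)
    {
        return Extend(new List<int[]>(), Enumerable.Range(0, n).ToList());
    }

    private static IEnumerable<List<int[]>> Extend(List<int[]> chosen, List<int> remaining)
    {
        if (remaining.Count == 0)
        {
            // Every letter is used: hand out a copy so callers can keep it.
            yield return new List<int[]>(chosen);
            yield break;
        }

        // Always pair the smallest unused letter, so no list is produced twice.
        int first = remaining[0];
        foreach (int partner in remaining)
        {
            var rest = remaining.Where(x => x != first && x != partner).ToList();
            chosen.Add(new[] { first, partner });
            foreach (var result in Extend(chosen, rest))
                yield return result;
            chosen.RemoveAt(chosen.Count - 1); // backtrack before trying the next partner
        }
    }
}
```

Because results are produced on demand, a caller can process each list as it arrives (for example with `foreach (var list in LazyCombinations.Enumerate(4))`) instead of holding the whole result set in memory. For n = 4 it yields ten lists, starting with the equivalent of AA BB CC DD.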
bphelpsjr
  • This is perfect! I'm not sure who down-voted you, but this does exactly what I asked for. I don't have enough rep to up-vote your answer, sorry. Additionally, I realize that, unless you have the time and resources (memory and cpu), setting `toConsider` aka `N` to a large number isn't really feasible but I was able to set yours to 17 and get accurate results. Thanks again! – Dotarp Apr 08 '14 at 01:49

I do not have enough rep to comment, so I am posting an incomplete answer. I have a solution but haven't refined it. It currently spits out incomplete combinations (e.g., AD CC) and could use some pruning to avoid looking at useless lists.

My approach is recursive, but avoids some computation by storing solutions. For example, once the A and B letters have been used, the combinations remaining at the C list are the same whether the combination so far is AA BB or AB.

I have not implemented the Memorize() and IKnowThis() methods but they should be straightforward using hashtables.
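
One possible shape for those two methods is sketched below. It rests on assumptions the code in this answer does not make: the cache key combines the used letters with the list index, so `Memorize` takes `listIndex` as an extra parameter, and the used letters are sorted so that "AB" and "BA" share one entry. The `ComboCache` class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ComboCache
{
    // Maps "listIndex:sortedUsedLetters" to the combinations found for that state.
    private static readonly Dictionary<string, List<string>> Cache =
        new Dictionary<string, List<string>>();

    // Sorting makes the key independent of the order in which letters were used.
    private static string Key(string used, int listIndex)
    {
        var letters = used.ToCharArray();
        Array.Sort(letters);
        return listIndex + ":" + new string(letters);
    }

    // Returns true and fills `combos` when this state was stored earlier.
    public static bool IKnowThis(string used, int listIndex, out List<string> combos)
    {
        return Cache.TryGetValue(Key(used, listIndex), out combos);
    }

    // Stores the combinations computed for this state (assumed extra parameter: listIndex).
    public static void Memorize(string used, int listIndex, List<string> combos)
    {
        Cache[Key(used, listIndex)] = combos;
    }
}
```

Including the list index in the key matters because the same set of used letters can be reached at different depths, and the remaining combinations differ between them.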

foreach (var combo in GenerateCombinations("", 0))   
{
    Console.WriteLine(combo);
}

    private static List<string> GenerateCombinations(string used, int listIndex)
    {
        if (listIndex >= _allLists.Count || used.Length == _allLists.Count)
            return new List<string>();

        List<string> combos;

        if (!IKnowThis(used, listIndex, out combos))
        {
            if (used.Contains(_allLists[listIndex][0].First))
                return GenerateCombinations(used, listIndex + 1);

            combos = new List<string>();

            foreach (var item in _allLists[listIndex])
            {
                var newcombos = new List<string>();

                string newUsed = Combine(used, item);
                newcombos.AddRange(GenerateCombinations(newUsed, listIndex + 1));

                if (!used.Contains(item.Second) && !used.Contains(item.First))
                {
                    if (newcombos.Count == 0)
                    {
                        newcombos.Add(item.ToString());
                    }
                    else
                    {
                        for (int i = 0; i < newcombos.Count; i++)
                        {
                            newcombos[i] = item + " " + newcombos[i];
                        }
                    }
                }

                combos.AddRange(newcombos);
            }
        }

        Memorize(used, combos);
        return combos;
    }

    private static string Combine(string used, Item item)
    {
        if (!used.Contains(item.First))
            used += item.First;
        if (!used.Contains(item.Second))
            used += item.Second;

        return used;
    }        
}

public class Item
{
    public char First { get; set; }
    public char Second { get; set; }

    public Item(char first, char second)
    {
        First = first;
        Second = second;
    }
    public string DistinctCharacters()
    {
        return First == Second ? First.ToString() : this.ToString();
    }

    public override string ToString()
    {
        return First.ToString() + Second;
    }
}
chickenpie
  • Thanks for the response. I haven't had time to dive into your proposed solution but wanted to, as a FYI, share a note that a fellow developer thought might be useful to others: You could also use numbers instead of letters for the input (e.g., 00, 01, 02, 03 instead of AA, AB, AC, AD). I will be taking a look at your proposed solution shortly! – Dotarp Mar 27 '14 at 21:27

Does this work to give you what you want?

If I start with your completeList plus the missing backwards transitions:

var completeList = new List<Item>
{
    new Item('A', 'A'),
    new Item('A', 'B'),
    new Item('A', 'C'),
    new Item('A', 'D'),
    new Item('B', 'B'),
    new Item('B', 'C'),
    new Item('B', 'D'),
    new Item('C', 'B'),
    new Item('C', 'C'),
    new Item('C', 'D'),
    new Item('D', 'B'),
    new Item('D', 'C'),
    new Item('D', 'D'),
};

Then I can do this:

var lookup = completeList.ToLookup(x => x.First, x => x.Second);

Func<IEnumerable<string>, IEnumerable<string>> f = null;
f = xs =>
{
    var query =
        from x in xs
        let ys = lookup[x.Last()]
            .Where(y => !x
                .Take(x.Length % 2 == 1 ? x.Length - 1 : x.Length)
                .Contains(y))
            .Select(y => x + y)
            .ToArray()
        group new { x, ys } by ys.Any();

    return query
        .Where(c => c.Key == false)
        .SelectMany(qs => qs.Select(q => q.x))
        .Concat(query
            .Where(c => c.Key == true)
            .SelectMany(ys => f(ys.SelectMany(y => y.ys))));
};

var results = f(new [] { "A" });

I get these results:

ABCD 
ABDC 
ACBD 
ACDB 
ADBC 
ADCB 
AABBCD 
AABBDC 
AABCDD 
AABDCC 
AACBDD 
AACCBD 
AACCDB 
AACDBB 
AADBCC 
AADCBB 
AADDBC 
AADDCB 
ABCCDD 
ABDDCC 
ACBBDD 
ACDDBB 
ADBBCC 
ADCCBB 
AABBCCDD 
AABBDDCC 
AACCBBDD 
AACCDDBB 
AADDBBCC 
AADDCCBB 
Enigmativity
  • Unfortunately, this breaks rule #2 which is "Every distinct letter must appear at least once..." So, for example, your first result is invalid because it doesn't contain letters B, C, and D. – Dotarp Mar 27 '14 at 23:42
  • @Dotarp - Thanks for the pick up on that. I've fixed the algorithm. It's actually shorter now. – Enigmativity Mar 28 '14 at 00:09
  • You're getting closer! If you check my "Desired Output" section, you will find the entire result the solution should provide. In your results, for example, you're missing [AC BD] and [AD BC]. – Dotarp Mar 28 '14 at 02:02
  • @Dotarp - If the `completeList` variable forms a mapping then it doesn't appear that I can get "C" followed by "B" or anything after "D". Can you please clarify? – Enigmativity Mar 28 '14 at 03:28
  • Correct. This is also one method of improving performance by ensuring that no 'B' follows a 'C' and nothing after 'D' (aka the last `N`). If you look at my **Desired Output** you will notice that it follows this exact pattern you've described. Please let me know if you need more information/clarification. – Dotarp Mar 28 '14 at 03:45
  • @Dotarp - So in that case `AC BD` and `AD BC` are not possible output? – Enigmativity Mar 28 '14 at 04:38
  • Sorry, I misunderstood your original question and replied to it with a false confirmation. I apologize for the confusion I have created. It **IS** possible to have a 'B' after a 'C' and a 'C' after a 'D' as far as the output is concerned. As illustrated in the **Desired Output** those are both valid outputs. They are located on lines 8 and 10 respectively. For some reason I thought you were asking about potential short-circuit evaluation when using grouped input. Again, my apologies. – Dotarp Mar 28 '14 at 05:18
  • @Dotarp - I'm still confused about the fact that `completeList` shows that `AC BD` and `AD BC` would not be possible even though the desired output shows it is. Should I ignore the `completeList`? – Enigmativity Mar 28 '14 at 06:01
  • I'm not sure why you feel that `completeList` shows that `AC BD` and `AD BC` are not possible. `AC` is at index 2, `BD` is at 6. `AD` is at index 3 and `BC` is at index 5. The indexes are not important, I am just using them to show you that the data exists in the `completeList` and, through combining, are valid options. – Dotarp Mar 28 '14 at 18:30
  • @Dotarp - I have taken it that you can travel from first to second on the `completeList`. So to get `ACBD` you have `A` -> `C` -> `B` -> `D`. The crucial one is `C` -> `B`. That doesn't exist. Also `D` -> `B` doesn't exist. No wonder you didn't get many answers - your question is quite ambiguous. – Enigmativity Mar 28 '14 at 22:52
  • I feel the question is quite clear. The input sets are not individual letters but **two** letters (e.g., `BD`). You are able to skip a set if that letter has been satisfied (see rules for more details). Therefore you **can** get `AC BD` by using `AC` at index 2 and `BD` at index 6. You skip any that start with `C*` after you have `AC` because it satisfies your requirement of `A` and `C`. This is why you have both `AC BB DD` and `AC BD` as valid outputs. – Dotarp Mar 29 '14 at 00:48
  • @Dotarp I realize that they are two letters. They appear to be a rule which maps from one letter to the next that is legal. But your latest comment makes this even more unclear to me. I'm glad the question is clear to you. It's becoming less clear as we go. I'm afraid I give up. – Enigmativity Mar 29 '14 at 01:09
  • Not sure what else I could say; I have described the input, the rules, the output, etc. I'm sorry we are having trouble communicating. I appreciate you taking the time to look at my question. – Dotarp Mar 30 '14 at 08:10