
I am trying to find all possible unique combinations of n elements taken m at a time. I am using itertools.combinations with n = 85, and for m = 5 it produces about 3.3 crore (33 million) combinations, which takes a lot of time. The elements are a list of strings; more precisely, they are pandas column names (alphabetical labels rather than numerical indices).

I am currently working with pandas and itertools.combinations, and I would like to know whether the process of finding combinations can be parallelised while still giving the same results every time for the calculations I subsequently perform on the columns, and whether GPU dataframes such as cuDF might optimise this (it doesn't look like it). Also, would converting the column names into numbers, and then into a numpy array to work on, make finding combinations faster? A sketch of my setup is below.

Please also suggest solutions where this could be done faster in another programming language. I am not a very good programmer, and would love to see mathematical and programmatic solutions with a complexity analysis.
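For concreteness, here is roughly what I mean by the numpy-array idea (the DataFrame here is a placeholder for my real data): generate combinations over integer positions and map back to column labels only when needed.

```python
from itertools import combinations, islice

import numpy as np
import pandas as pd

# Placeholder 85-column DataFrame standing in for the real data.
df = pd.DataFrame(columns=[f"col_{i:02d}" for i in range(85)])

cols = np.asarray(df.columns)

# Combine integer positions instead of string labels, and translate a
# position tuple back to column names only when they are actually used.
for idx in islice(combinations(range(len(cols)), 5), 3):
    print(cols[list(idx)])
```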

1 Answer


This is exactly a complexity-analysis problem, and there is no way to parallelize it that will be truly satisfying: spreading the work over k cores divides the wall-clock time by at most k, but the total work is still proportional to the size of the space. With n = 85 and m = 5 there are C(85, 5) = 32,801,517 unique combinations (your ~3 crore figure); if you also counted orderings and repeated elements, there would be 85^5 = 4,437,053,125 ordered 5-tuples.
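Both counts are easy to verify in Python (math.comb requires Python 3.8+):

```python
import math

n, m = 85, 5
print(math.comb(n, m))  # 32801517 unique combinations (~3.3 crore)
print(n ** m)           # 4437053125 ordered 5-tuples, repetition allowed
```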

The fastest way I know of to explore this space with a GPU is cuGraph: enumerating the combinations can be framed as a breadth-first search over the tree of partial selections. Even with a GPU, though, I would expect visiting every combination to take a very long time.
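If you still want to spread the work across CPU cores, the enumeration can at least be split deterministically, for example by the first element of each combination, so every run produces identical results; this divides wall-clock time by the core count but does not reduce the total work. A rough sketch, where `evaluate` is a hypothetical stand-in for whatever you compute per combination:

```python
from itertools import combinations
from multiprocessing import Pool

N, M = 85, 5

def evaluate(combo):
    # Hypothetical stand-in for the real per-combination computation.
    return sum(combo)

def process_chunk(first):
    # Handles every combination whose smallest element is `first`; across
    # all values of `first`, each combination is visited exactly once.
    return max((evaluate((first,) + rest), (first,) + rest)
               for rest in combinations(range(first + 1, N), M - 1))

if __name__ == "__main__":
    with Pool() as pool:
        best_score, best_combo = max(pool.map(process_chunk, range(N - M + 1)))
    print(best_score, best_combo)
```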

Artificial Intelligence is largely the study of methods for finding useful solutions inside problem spaces that are too big to explore fully. A* or greedy search could give you a good solution quickly, assuming there is some metric you are trying to optimize over the C(85, 5) combinations; a greedy sketch follows below.
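Here is a minimal greedy (forward-selection) sketch under that assumption; `toy_score` is a placeholder for whatever metric you actually care about, and greedy gives an approximation rather than a guaranteed optimum:

```python
# Greedy forward selection: O(n * m) metric evaluations (~415 here)
# instead of exhaustively scoring all C(85, 5) = 32,801,517 subsets.
def greedy_select(columns, m, score):
    chosen = ()
    remaining = list(columns)
    for _ in range(m):
        # Add whichever remaining column most improves the current set.
        best = max(remaining, key=lambda c: score(chosen + (c,)))
        chosen += (best,)
        remaining.remove(best)
    return chosen

# Toy metric so the sketch runs; substitute the metric you actually optimize.
def toy_score(cols):
    return sum(len(c) for c in cols)

print(greedy_select([f"col_{i}" for i in range(85)], 5, toy_score))
```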

Thomson Comer