
I have a project with a very large number of combinations, C(100,20), with only a small amount of work to be done for each combination.

I am using Spark .NET with Visual Studio (see the setup here): https://learn.microsoft.com/en-us/dotnet/spark/tutorials/get-started

Spark .NET exposes a DataFrame API with SQL-style commands. I am assuming I need a SQL-style command to generate the N choose K combinations, together with a user-defined worker function to process each combination.

The question is: what does the code look like in Spark .NET using a DataFrame? If a DataFrame doesn't support an N choose K operation, are there other ways to keep the generation of the combinations distributed?

  • Wow, that's `535,983,370,403,809,682,970` combinations... You might need a quantum computer... – Enigmativity Aug 13 '20 at 05:46
  • Maybe it's a stretch goal, but for now I am trying to figure out Spark .NET N choose K code that would be properly distributed. – CPGAdmin Aug 13 '20 at 16:58
  • You do know that if you could process a single combination in 1 millisecond, then processing all the `C(100,20)` combinations would take nearly 17 billion years! – Enigmativity Aug 14 '20 at 05:48
  • @Enigmativity, your comments are not helping to solve the Spark .NET N choose K coding question, which is the main question. This deployment would be pushed out to Azure Databricks for massively distributed computing. The stretch goal is designed to evaluate costs and performance using more reasonable sets. I do have a client case which is C(500,20) if you want an even bigger number. Clients don't care about how big a number is, they only want to know if they can afford it :) – CPGAdmin Aug 14 '20 at 14:38
  • Now it's insane. You're at over 8 trillion trillion years. I thought you might have thrown the first `C(100,20)` in accidentally without computing the number. But to say that there is an actual client requirement for `C(500,20)` is ridiculous. You can't possibly get this to work for those kinds of numbers. If you can make the problem a reasonable one to solve then you're more likely to get answers. – Enigmativity Aug 15 '20 at 00:00
  • I appreciate that you feel that I didn't try to solve the core issue in your question. I didn't. Here's the best help that I can give you: please read [ask]. A well asked question on this site gets answers within minutes. – Enigmativity Aug 15 '20 at 00:01
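For scale, the figures quoted in the comments above can be reproduced with a short standalone C# check (a sketch only; the 1 ms-per-combination rate is the assumption used in the comments):

```csharp
using System;
using System.Numerics;

class CombinationScale
{
    // C(n, k) computed exactly with BigInteger using the multiplicative formula.
    static BigInteger Choose(int n, int k)
    {
        BigInteger result = 1;
        for (int i = 1; i <= k; i++)
        {
            // At each step the intermediate value is C(n - k + i, i), so the
            // division is always exact.
            result = result * (n - k + i) / i;
        }
        return result;
    }

    static void Main()
    {
        BigInteger c = Choose(100, 20);
        Console.WriteLine(c);                        // 535983370403809682970

        // At 1 ms per combination, total runtime in years.
        BigInteger seconds = c / 1000;
        BigInteger years = seconds / (3600 * 24 * 365);
        Console.WriteLine(years);                    // ~17 billion years
    }
}
```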

1 Answer


My basic question was answered in the dotnet/spark GitHub issue tracker:

https://github.com/dotnet/spark/issues/627

By using a cross join on two DataFrames, I was able to create the combinations. This may not be the best way, and perhaps others will follow up with a better solution.

For N choose K, that amounts to K - 1 cross joins of the N-element set against itself, with a filter after each join so that only one ordering of each combination is kept.
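A minimal sketch of that pattern, assuming the Microsoft.Spark package from the tutorial linked in the question; the column names (`c1`, `c2`, `c3`), the small N, and the worker UDF body are illustrative placeholders rather than anything prescribed by the linked issue:

```csharp
using System;
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

namespace CombinationsDemo
{
    class Program
    {
        static void Main(string[] args)
        {
            SparkSession spark = SparkSession
                .Builder()
                .AppName("n-choose-k-crossjoin")
                .GetOrCreate();

            // The N-element set: a single-column DataFrame with ids 0..N-1.
            // Small N here; the same pattern applies to larger sets.
            DataFrame items = spark.Range(0, 10).WithColumnRenamed("id", "c1");

            // K = 3 example: K - 1 cross joins of the set against itself.
            // The strictly-increasing filter keeps exactly one ordering of
            // each subset, i.e. the combinations rather than all tuples.
            DataFrame combos = items
                .CrossJoin(spark.Range(0, 10).WithColumnRenamed("id", "c2"))
                .Filter(Col("c1") < Col("c2"))
                .CrossJoin(spark.Range(0, 10).WithColumnRenamed("id", "c3"))
                .Filter(Col("c2") < Col("c3"));

            // Illustrative per-combination worker: replace the body with the
            // real "minor work" to be done for each combination.
            Func<Column, Column, Column, Column> work =
                Udf<long, long, long, string>((a, b, c) => $"{a}-{b}-{c}");

            DataFrame results = combos.Select(
                work(Col("c1"), Col("c2"), Col("c3")).Alias("result"));

            results.Show();

            spark.Stop();
        }
    }
}
```

The filter after each `CrossJoin` is what turns the N^K cross product into the N choose K combinations, and it also prunes rows early so the intermediate DataFrames stay as small as possible while the work remains distributed across the cluster.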
