
I hope I can explain my problem in an understandable manner.

  • I have an ID range from 1 to 9999
  • I also have a table/list of arrays (~5 million rows), with each row containing a variable number of IDs (1 to 9999)
  • no ID can appear twice within the same row

What I want to achieve is the following:

  • analyze the list and find a "common" set/bundle of IDs
  • the bundle should cover the highest possible number of rows
  • then I could add this bundle's ID to each appropriate row and delete all the IDs that the bundle contains
  • it's basically a consolidation? Is that the right term? (See the toy example below.)
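
To make the goal concrete, here's a toy Python sketch of what I mean by consolidating (the bundle ID `"B1"` and the sample rows are made up for illustration):

```
BUNDLE_ID = "B1"         # made-up identifier for the bundle
bundle = {17, 42, 333}   # the "common" set of IDs to factor out

rows = [
    {17, 42, 333, 900},  # contains the whole bundle -> consolidate
    {17, 42},            # partial match -> leave unchanged
    {17, 42, 333},       # contains the whole bundle -> consolidate
]

# replace the bundle members with the bundle ID wherever a row covers the bundle
consolidated = [(row - bundle) | {BUNDLE_ID} if bundle <= row else row
                for row in rows]
print(consolidated)  # e.g. [{900, 'B1'}, {17, 42}, {'B1'}]
```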

What I've come up with so far is:

  • I need to create some sort of bundle-patterns first (from the given ID range)
  • then search the list and check which bundle-pattern has the most matches

OK, so to create bundle-patterns I guess I need some restrictions/requirements: ID order doesn't matter, and every bundle-pattern has a fixed number of entries.
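
To make the search step concrete, here's a minimal Python sketch of "check which bundle-pattern has the most matches" (the function name `best_bundle` and the toy rows are my own, not from any library). Instead of generating every possible pattern up front, it only counts the k-sized subsets that actually occur in the rows:

```
from collections import Counter
from itertools import combinations

def best_bundle(rows, k):
    """Return the k-ID subset that occurs in the most rows.
    Only subsets that actually appear in the data are counted, so
    nothing close to C(9999, k) patterns is ever generated. Still
    only practical for small k and reasonably short rows."""
    counts = Counter()
    for row in rows:
        # order doesn't matter, so sort to get one canonical form per subset
        for combo in combinations(sorted(row), k):
            counts[combo] += 1
    return counts.most_common(1)[0]  # (bundle, number of rows covered)

rows = [{1, 2, 3}, {1, 2, 4}, {2, 3, 4}, {1, 2, 3, 4}]
print(best_bundle(rows, 2))  # e.g. ((1, 2), 3): a pair covering 3 of the 4 rows
```

From what I can tell, this is essentially frequent itemset mining, so established algorithms like Apriori or FP-growth might handle the 5-million-row scale better than anything hand-rolled.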

I think I've found how to calculate the total number of possible combinations: https://en.m.wikipedia.org/wiki/Lottery_mathematics

Based on this, I get some absurdly big numbers:

429697070775296968267698969897035480966209451962516119751112050185196367497843244893328971381665723816568456409339133626915347923324559613616988661591000052766331813232549687615104140005914066483741393662638605162001

Even if I limit the number of entries per bundle-pattern to something like 10, I still get an astronomical number of combinations, as the quick check below shows.
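
For a sanity check: the lottery formula boils down to the binomial coefficient C(n, k) = n! / (k! · (n − k)!), which Python exposes as `math.comb`. The magnitudes in the comments are rough, back-of-the-envelope figures:

```
import math

# How many k-ID bundles can be drawn from 9999 IDs, i.e. C(9999, k)?
for k in (2, 5, 10):
    print(k, math.comb(9999, k))

# k = 2  ->  49,985,001   (~5e7, still manageable)
# k = 5  ->  ~8.3e17      (already hopeless to enumerate)
# k = 10 ->  ~2.7e33      (far beyond "billions")
```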
I've no clue if I'm on the right track or if this "consolidation" process is reasonable.

Thanks a lot for any feedback and ideas!

  • `each row containing a variable number of IDs` It doesn't sound like an RDBMS is going to be the right tool for this. Look at whatever people use for DNA analysis, or somesuch. – Strawberry Dec 06 '19 at 10:06
  • In the DB it's actually another table with one BIGINT and one INT column. DNA analysis? Nice idea. Never thought about that. – Janitor Dec 06 '19 at 10:20
