
I hope I can explain my problem in an understandable manner.

  • I have an ID range from 1 to 9999
  • I also have a table/list of arrays (~5 million rows), with each row containing a variable number of IDs (1 to 9999)
  • no ID can appear twice within the same row

What I want to achieve is the following:

  • analyze the list and find a "common" set/bundle of IDs
  • the bundle should cover the highest possible number of rows
  • then I could add this bundle's ID to each appropriate row and delete all the IDs that the bundle contains
  • it's basically a consolidation? Is that the right term? (See the toy example below.)
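
To make the goal concrete, here's a toy Python sketch of what I mean by consolidating (the bundle ID `"B1"` and the sample rows are made up for illustration):

```
BUNDLE_ID = "B1"         # made-up identifier for the bundle
bundle = {17, 42, 333}   # the "common" set of IDs to factor out

rows = [
    {17, 42, 333, 900},  # contains the whole bundle -> consolidate
    {17, 42},            # partial match -> leave unchanged
    {17, 42, 333},       # contains the whole bundle -> consolidate
]

# replace the bundle members with the bundle ID wherever a row covers the bundle
consolidated = [(row - bundle) | {BUNDLE_ID} if bundle <= row else row
                for row in rows]
print(consolidated)  # e.g. [{900, 'B1'}, {17, 42}, {'B1'}]
```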

What I've come up with so far is:

  • I need to create some sort of bundle-patterns first (from the given ID range)
  • then search the list and check which bundle-pattern has the most matches

OK, so to create bundle-patterns I guess I need some restrictions/requirements: ID order doesn't matter, and every bundle-pattern has a fixed number of entries.
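
To make the search step concrete, here's a minimal Python sketch of "check which bundle-pattern has the most matches" (the function name `best_bundle` and the toy rows are my own, not from any library). Instead of generating every possible pattern up front, it only counts the k-sized subsets that actually occur in the rows:

```
from collections import Counter
from itertools import combinations

def best_bundle(rows, k):
    """Return the k-ID subset that occurs in the most rows.
    Only subsets that actually appear in the data are counted, so
    nothing close to C(9999, k) patterns is ever generated. Still
    only practical for small k and reasonably short rows."""
    counts = Counter()
    for row in rows:
        # order doesn't matter, so sort to get one canonical form per subset
        for combo in combinations(sorted(row), k):
            counts[combo] += 1
    return counts.most_common(1)[0]  # (bundle, number of rows covered)

rows = [{1, 2, 3}, {1, 2, 4}, {2, 3, 4}, {1, 2, 3, 4}]
print(best_bundle(rows, 2))  # e.g. ((1, 2), 3): a pair covering 3 of the 4 rows
```

From what I can tell, this is essentially frequent itemset mining, so established algorithms like Apriori or FP-growth might handle the 5-million-row scale better than anything hand-rolled.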

I think I've found how to calculate the total number of possible combinations: https://en.m.wikipedia.org/wiki/Lottery_mathematics

Based on this, I get some absurdly big numbers:

429697070775296968267698969897035480966209451962516119751112050185196367497843244893328971381665723816568456409339133626915347923324559613616988661591000052766331813232549687615104140005914066483741393662638605162001

Even if I limit the number of entries per bundle-pattern to something like 10, I still get an astronomical number of combinations, as the quick check below shows.
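
For a sanity check: the lottery formula boils down to the binomial coefficient C(n, k) = n! / (k! · (n − k)!), which Python exposes as `math.comb`. The magnitudes in the comments are rough, back-of-the-envelope figures:

```
import math

# How many k-ID bundles can be drawn from 9999 IDs, i.e. C(9999, k)?
for k in (2, 5, 10):
    print(k, math.comb(9999, k))

# k = 2  ->  49,985,001   (~5e7, still manageable)
# k = 5  ->  ~8.3e17      (already hopeless to enumerate)
# k = 10 ->  ~2.7e33      (far beyond "billions")
```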
I've no clue if I'm on the right track or if this "consolidation" process is reasonable.

Thanks a lot for any feedback and ideas!

  • `each row containing a variable number of IDs` It doesn't sound like an RDBMS is going to be the right tool for this. Look at whatever people use for DNA analysis, or somesuch. – Strawberry Dec 06 '19 at 10:06
  • In the DB it's actually another table with one BIGINT and one INT column. DNA analysis? Nice idea. Never thought about that. – Janitor Dec 06 '19 at 10:20
