I have some input data like this:
unique ID | Q1 | Q2 | Q3 |
---|---|---|---|
1 | 1 | 1 | 2 |
2 | 1 | 1 | 2 |
3 | 1 | 0 | 3 |
4 | 2 | 0 | 1 |
5 | 3 | 1 | 2 |
6 | 4 | 1 | 3 |
My goal is to extract a subset of rows that satisfies all of the following conditions:
- total count: 4
- Q1=1, count: 2
- Q1=2, count: 1
- Q2=1, count: 1~3
- Q3=1, count: 1
In this case, both the set with ids [1, 2, 4, 5] and the set with ids [2, 3, 4, 5] are acceptable answers.
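To make the constraints concrete, here is a small check I wrote against the toy table (the `rows` dict and `satisfies` helper are just my own illustration, not part of my real code):

```python
# Toy data from the table: id -> (Q1, Q2, Q3)
rows = {
    1: (1, 1, 2),
    2: (1, 1, 2),
    3: (1, 0, 3),
    4: (2, 0, 1),
    5: (3, 1, 2),
    6: (4, 1, 3),
}

def satisfies(ids):
    """Check the example constraints against a candidate id set."""
    vals = [rows[i] for i in ids]
    return (
        len(ids) == 4                                     # total count: 4
        and sum(q1 == 1 for q1, _, _ in vals) == 2        # Q1=1, count: 2
        and sum(q1 == 2 for q1, _, _ in vals) == 1        # Q1=2, count: 1
        and 1 <= sum(q2 == 1 for _, q2, _ in vals) <= 3   # Q2=1, count: 1~3
        and sum(q3 == 1 for _, _, q3 in vals) == 1        # Q3=1, count: 1
    )

print(satisfies({1, 2, 4, 5}), satisfies({2, 3, 4, 5}))  # both True
```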
In reality, I may have 6000+ rows of data and up to 12 count constraints like the ones above, with each count ranging from 1 to 50. I've written a solution that first groups all ids by each condition, then uses depth-first search to exhaustively try all possible combinations between the groups. (I believe this is a brute-force solution...) However, I always run out of memory and time before I can get an answer.
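On the toy table, the brute-force idea amounts to something like the sketch below; this is only an illustration of the approach, not my real code (which searches over per-condition groups rather than raw subsets):

```python
from itertools import combinations

# Toy data from the table: id -> (Q1, Q2, Q3)
rows = {
    1: (1, 1, 2),
    2: (1, 1, 2),
    3: (1, 0, 3),
    4: (2, 0, 1),
    5: (3, 1, 2),
    6: (4, 1, 3),
}

def brute_force():
    """Enumerate every id subset of the required size and keep the valid ones."""
    hits = []
    for ids in combinations(rows, 4):      # total count: 4
        q1, q2, q3 = zip(*(rows[i] for i in ids))
        if (q1.count(1) == 2               # Q1=1, count: 2
                and q1.count(2) == 1       # Q1=2, count: 1
                and 1 <= q2.count(1) <= 3  # Q2=1, count: 1~3
                and q3.count(1) == 1):     # Q3=1, count: 1
            hits.append(set(ids))
    return hits
```

With 6000+ rows this enumeration is hopeless, of course, which matches the blow-up I'm seeing.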
My questions are:
- What is the lowest possible time complexity for this problem? (I believe it is a kind of subset-sum problem, but I am not sure.)
- How can I solve this problem without brute force? I'm considering dynamic programming or a decision tree, but I suspect either would also exhaust my computer's memory. Or can I solve it using each data row's probabilities/entropy? (I would appreciate more details on this.)
My brute-force sample code is too messy to be worth reading, so I'll skip posting it...