How many sub-sequences of unique elements can be possible?

Question

I have a sequence of integers [A1, A2, A3, ..., AN]. I'm trying to count all sub-sequences that have at most K elements, all of them distinct.

For Example:

Given sequence: [2, 3, 3, 7, 5] and K = 3

All sub-sequences are:

[],
[2],[3],[3],[7],[5],
[2, 3],[2, 3],[2, 7],[2, 5],[3, 3],[3, 7],[3, 5],[3, 7],[3, 5],[7, 5],
[2, 3, 3],[2, 3, 7],[2, 3, 5],[2, 3, 7],[2, 3, 5],[2, 7, 5],[3, 3, 7],[3, 3, 5],[3, 7, 5],[3, 7, 5],
[2, 3, 3, 7],[2, 3, 3, 5],[2, 3, 7, 5],[2, 3, 7, 5],[3, 3, 7, 5],
[2, 3, 3, 7, 5]

I only need the count of the sub-sequences whose elements are all distinct.

Counted sub-sequences are:

[],
[2],[3],[3],[7],[5],
[2, 3],[2, 3],[2, 7],[2, 5],[3, 7],[3, 5],[3, 7],[3, 5],[7, 5],
[2, 3, 7],[2, 3, 5],[2, 3, 7],[2, 3, 5],[2, 7, 5],[3, 7, 5],[3, 7, 5]

Total = 22

Not counted sub-sequences are:

[3, 3],
[2, 3, 3],[3, 3, 7],[3, 3, 5]

Ignored because their length exceeds K (even with K = 5 these would be ignored, since they contain duplicate elements):

[2, 3, 3, 7],[2, 3, 3, 5],[3, 3, 7, 5],
[2, 3, 3, 7, 5]

[ N.B.: The sub-sequences themselves need not be unique, but the elements within each one must be distinct. E.g. [2, 3] and [2, 3] are both counted, but [3, 3] is ignored. ]

::CODE::

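import itertools as it
import sys

# first line: N and K; second line: the N integers
n, k = map(int, sys.stdin.readline().split())
a = list(map(int, sys.stdin.readline().split()))

cnt = 0
for i in range(2, k + 1):
    for val in it.combinations(a, i):
        # count only combinations whose elements are all distinct
        if len(set(val)) == i:
            cnt += 1
# 1 for the empty sub-sequence, n for the single-element ones
print(1 + n + cnt)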

Input:

5 3
2 3 3 7 5

Output: 22

But I need an efficient solution, maybe a mathematical one using nCr etc., or a programmatic one.

Constraints:

1 <= K <= N <= 1,000,00
1 <= Ai <= 9,000

Time: 1 sec

"gimme some code" is not a question and i didnt see any other relevance of the c++ tag, so i removed it — 463035818_is_not_an_ai, Sep 09 '19 at 14:42
I would transform `[2, 3, 3, 7, 5]` into `{2:1, 3:2, 7:1, 5:1}`, then for each combination with keys, you multiply with corresponding value. — Jarod42, Sep 09 '19 at 14:44
@CeliusStingher : emojis are not for influencing but for visually describing. — Afrin, Sep 09 '19 at 14:56
The number of subsequences (with no repetition) of size `k` from a sequence of size `n` is `n * (n-1) *...* (n-k+1) / k!` (`n! / (k! (n-k)!)`). Maybe it can help. — dcg, Sep 09 '19 at 15:00
With your current solution, you construct `[2, 3],[2, 3]`, whereas I propose to construct `[2, 3]` only once, but count it with the weight from the dictionary. — Jarod42, Sep 09 '19 at 15:01
@dcg That counts all sub-sequences, including those that contain duplicate elements. But I have to ignore the sub-sequences that contain duplicate items. Anyway, how can I count all the ignored sub-sequences? — Afrin, Sep 09 '19 at 15:21
@AfrinMouDia you could get how many elements are distinct with `len(set(list))` — dcg, Sep 09 '19 at 15:25
@Jarod42 Umm, but there is another problem: how could I track/count the sub-sequences that contain a given element, e.g. how many sub-sequences contain ```2``` (so that I can multiply by the dict value)? — Afrin, Sep 09 '19 at 15:26
@dcg If so, then **TLE**. Do you mean another way? If so, please explain more. — Afrin, Sep 09 '19 at 15:28
@AfrinMouDia: `cnt += 1` becomes `cnt += functools.reduce(lambda acc, k: acc * val[k], val, 1)`. — Jarod42, Sep 09 '19 at 15:52
@Jarod42 Please check the **Constraints**. If I need to generate the sub-sequences, it can't be done within `1 second`. — Afrin, Sep 09 '19 at 16:01
@AfrinMouDia Is it mandatory that the sub-sequences be obtained from the input sequence, i.e., you cannot remove repeated elements before the computation? If so, I don't see a way of getting what you want other than by construction (as you're doing). I gave you the formula for the number of sub-sequences of size `k` from a sequence of size `n` (which is the number of combinations you get from `it.combinations(a, k)`). — dcg, Sep 09 '19 at 17:50
@dcg Yes, it's mandatory that the sub-sequences be obtained from the input sequence. — Afrin, Sep 09 '19 at 18:37
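
The frequency-weighting idea from the comments above can be sketched roughly as follows. This is only an illustration under assumptions: the function name `count_weighted` is hypothetical, and the sketch still enumerates every combination of distinct values, so it would likely not meet the stated time limit for large inputs.

```python
from collections import Counter
from itertools import combinations
from math import prod


def count_weighted(seq, k):
    """Count sub-sequences of seq with at most k elements, all distinct."""
    freq = Counter(seq)  # e.g. {2: 1, 3: 2, 7: 1, 5: 1}
    total = 0
    for size in range(0, min(k, len(freq)) + 1):
        # build each set of distinct values only once ...
        for keys in combinations(freq, size):
            # ... and weight it by how many occurrences each value has
            total += prod(freq[v] for v in keys)
    return total


print(count_weighted([2, 3, 3, 7, 5], 3))  # 22
```

Each set of distinct values is constructed once and counted with the product of its frequencies, so `[2, 3]` contributes 2 instead of being generated twice.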

score 1 · Accepted Answer · answered Sep 16 '19 at 13:21

Try this:

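import numpy as np

mod = 1000000007
n, k = map(int, input().split())
a = list(map(int, input().split()))

# fre[v] = number of occurrences of value v (1 <= Ai <= 9,000)
fre = [0] * 9001
for x in a:
    fre[x] += 1
# A holds the frequency of each distinct value
A = [f for f in fre if f > 0]

# S[j] = number of ways to choose j distinct values, one occurrence of each
kk = min(len(A), k) + 1
S = np.zeros(kk, dtype=np.int64)  # explicit 64-bit to avoid overflow
S[0] = 1
for f in A:
    # the right-hand side is evaluated in full before the assignment,
    # so each distinct value is used at most once per sub-sequence
    S[1:kk] = (S[1:kk] + (f * S[0:kk-1]) % mod) % mod

# sum over all lengths 0..min(K, number of distinct values)
ans = int(S.sum() % mod)
print(ans)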

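This program prints the number of sub-sequences of length at most K whose elements are all distinct, taken modulo 1000000007, without generating them. A holds how many times each distinct value occurs, and after all values are processed S[j] is the number of ways to pick j distinct values with one occurrence of each, i.e. the sum, over all j-element sets of distinct values, of the product of their frequencies. The answer is S[0] + S[1] + ... + S[min(K, D)], where D is the number of distinct values. For [2, 3, 3, 7, 5] and K = 3 the frequencies are 1, 2, 1, 1 and S ends as [1, 5, 9, 7], which sums to 22.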
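
For reference, the same recurrence can be sketched in plain Python without NumPy. The function name `count_distinct_subsequences` and the `mod` parameter are assumptions made for illustration; iterating `j` from high to low plays the role of the slice assignment in the code above, so that each distinct value is used at most once.

```python
def count_distinct_subsequences(seq, k, mod=1_000_000_007):
    """Count sub-sequences of seq with at most k elements, all distinct."""
    # frequency of each distinct value
    freq = {}
    for x in seq:
        freq[x] = freq.get(x, 0) + 1

    limit = min(k, len(freq))
    # ways[j] = number of ways to pick j distinct values, one occurrence each
    ways = [1] + [0] * limit
    for f in freq.values():
        # descending j ensures ways[j - 1] does not yet include this value
        for j in range(limit, 0, -1):
            ways[j] = (ways[j] + f * ways[j - 1]) % mod
    return sum(ways) % mod


print(count_distinct_subsequences([2, 3, 3, 7, 5], 3))  # 22
```

The work is proportional to D * min(K, D), where D is the number of distinct values (at most 9,000 under the stated constraints). In pure Python the inner loop may still be too slow for a one-second limit, which is presumably why the answer vectorizes it with NumPy.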

Shahab Rahnama · Answer 2 · 2019-09-09T16:51:53.237

0

try this:

import itertools 

def findsubsets(s, n): 
    return list(itertools.combinations(s, n))

my_list = [2, 3, 3, 7, 5]
list_len = 0

for i in range(1,len(my_list)):
    list_len += len(set(findsubsets(my_list, i)))

print(list_len)

output:

edit: remove permutations with same numbers from list:

import itertools 

def findsubsets(s, n): 
    return list(list(x) for x in itertools.combinations(s, n))

my_list = [2, 3, 7, 5, 3]
list_len = 0

for i in range(1,len(my_list)):
    list_len += len(set(tuple(sorted(sub)) for sub in findsubsets(my_list, i)))

print(list_len)

output:

edited Sep 09 '19 at 16:51

answered Sep 09 '19 at 15:20


Efficiency has not changed. Also, for the input ```2 3 7 5 3``` the output is ```28``` – Afrin Sep 09 '19 at 15:40
@AfrinMouDia you're right! Now I remove permutations with the same numbers! ;) – Shahab Rahnama Sep 09 '19 at 16:30
It's not really what I want. I need to remove itertools.combinations(), otherwise **TLE**. Please check the constraints and time limit. – Afrin Sep 09 '19 at 17:32
@AfrinMouDia hmmm ... look at this https://stackoverflow.com/questions/53419536/what-is-the-computational-complexity-of-itertools-combinations-in-python – Shahab Rahnama Sep 09 '19 at 18:02
Yes, that's why I need to compute (count) the number without generating the sub-sequences. – Afrin Sep 09 '19 at 18:39