2

I'm trying to figure out how to calculate the number of all strings w of length n over the letters a, b, c such that in every substring of length 4 of w, all three letters a, b, c occur. For example, abbcaabca should be counted when n = 9, but aabbcabac should not be included.

I was trying to make a math formula like

3^N - 3 * 2^N + 3 or (3^(N-3))*N!

Can it work this way or do I have to generate them and count them? I'm working with large numbers like 100, and I don't think I can generate them to count them.

Mateen Ulhaq
brynello

2 Answers

1

You should be able to work your way up: start with, say, all valid words of length 4, then append one letter at a time and keep only the words that are still valid. That way you can iterate up to longer words without having to explore all 3^N possibilities.

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main()
{
    const unsigned w = 4;   // window length
    const unsigned n = 10;  // target word length (this code assumes n >= w)

    vector<string> before, current;

    // seed: all distinct permutations of "aabc", "abbc" and "abcc" (36 words of length 4)
    for (string base : {"aabc", "abbc", "abcc"})
    {
        do before.emplace_back(base);
        while (next_permutation(base.begin(), base.end()));
    }

    // append one letter per pass; in pass k the words in `before` have length w+k-1,
    // so index k is the start of their last w-1 letters
    for (unsigned k = 1; k <= n - w; ++k)  // n-w passes: length w grows to length n
    {
        current.clear();
        for (const auto& it : before)
        {
            const bool hasA = it.find('a', k) != string::npos;
            const bool hasB = it.find('b', k) != string::npos;
            const bool hasC = it.find('c', k) != string::npos;
            // a letter may be appended only if the other two are already in the last w-1 letters
            if (hasB && hasC) current.emplace_back(it + "a");
            if (hasA && hasC) current.emplace_back(it + "b");
            if (hasA && hasB) current.emplace_back(it + "c");
        }
        before = current;
    }

    // `before` holds the final words (also when n == w and no pass ran)
    for (const auto& it : before) cout << it << '\n';
    cout << before.size() << " valid words of length " << n << endl;
}

Note that you will still run into the exponential wall pretty quickly with this approach. In a more efficient implementation I would represent words as integers (NOT vectors of integers, but integers in a base-3 representation), but the exponential scaling would still be there. If you are only interested in the count, @Jeffrey's approach is surely better.
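As a rough sketch of the base-3 idea mentioned above (not the answerer's own code): each word is packed into a single 64-bit integer with a = 0, b = 1, c = 2, so the last three letters are simply `v % 27` and appending a letter is `v * 3 + t`. The names `window_ok`, `generate_packed` and `unpack` are made up for this illustration, and the 40-letter limit is an assumption that comes from 3^40 being the largest power of 3 that fits in 64 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: true if the three base-3 digits of `last3` (a value in 0..26)
// together with the digit `t` cover all of {0, 1, 2}, i.e. the letters a, b and c.
bool window_ok(std::uint64_t last3, unsigned t)
{
    unsigned seen = 1u << t;
    for (int i = 0; i < 3; ++i) { seen |= 1u << (last3 % 3); last3 /= 3; }
    return seen == 7u;
}

// All valid words of length n, each packed into one integer
// (most recently appended letter in the lowest base-3 digit).
std::vector<std::uint64_t> generate_packed(unsigned n)
{
    assert(n >= 4 && n <= 40);  // 3^40 still fits in 64 bits
    std::vector<std::uint64_t> before, current;
    // seed: scan all 81 four-letter words and keep those containing a, b and c
    for (std::uint64_t v = 0; v < 81; ++v)
        if (window_ok(v / 3, v % 3)) before.push_back(v);
    for (unsigned len = 4; len < n; ++len)
    {
        current.clear();
        for (std::uint64_t v : before)
            for (unsigned t = 0; t < 3; ++t)
                if (window_ok(v % 27, t)) current.push_back(v * 3 + t);
        before.swap(current);
    }
    return before;
}

// Decode a packed word of length n back into letters.
std::string unpack(std::uint64_t v, unsigned n)
{
    std::string s(n, 'a');
    for (unsigned i = n; i-- > 0;) { s[i] = static_cast<char>('a' + v % 3); v /= 3; }
    return s;
}
```

Calling `generate_packed(10).size()` gives the number of valid words of length 10, and `unpack(v, 10)` turns one of them back into text. The list itself still grows exponentially, so this only saves memory per word, not the number of words.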

Darkdragon84
  • If you have a string of length *k-1*, you can generate the corresponding strings of length *k*. If the last 3 characters contain all of a,b,c then you can append either a, b, or c; if not, you append only the character 4 from the end. This isn't going to grow too fast. There are 27 combinations of three characters, but 'aaa', 'bbb', and 'ccc' can't appear so that is 24 possibilities. Of those only six allow three choices. I think that gives an average growth factor of 1.31 (and the hundredth power of that is about 1E12). There are only 36 strings of length 4 to start with. – Martin Bonner supports Monica Mar 24 '16 at 14:41
  • I think there should be 72 strings of length 4 to start with: 3*4! = 3*4*3*2 = 8*9 = 72. – Darkdragon84 Mar 24 '16 at 15:03
  • Why '3*4!'? My initial thought was "six ways of ordering 'abc'" and "four locations to add any one of three characters" => 6 * 4 * 3 (=72). But that involves lots of double counting => eg aabc could be 'Xabc' or 'aXbc' where X is the additional character we add to bring the count up to four. I eventually generated all 3^4 strings in Excel and counted those which contained all three characters. – Martin Bonner supports Monica Mar 24 '16 at 16:45
  • Indeed, we should look at words of the kind 'Xabc'. Let's look at X=a first. We then need all possible permutations of 'aabc', and there are indeed not 4! = 24, but rather 4!/2 = 12 of them, since two of the elements are the same (so we have to divide by 2). Next we look at X=b and X=c, so all possible permutations of 'abbc' and 'abcc'. – Darkdragon84 Mar 25 '16 at 00:54
  • These are then 3 distinct sets, where no element of one set can show up in another, and in total we get 3*4!/2 = 36 allowed words of length 4, you are right. Sorry, I overlooked the fact that you have to divide by 2 since the ordering of the two equal letters doesn't matter :-) My above program yields the correct initial set though. – Darkdragon84 Mar 25 '16 at 00:54
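The 36-versus-72 question in the comments above can also be settled by brute force, much like the Excel count. The following is a minimal sketch that walks through all 3^n strings for small n; `is_valid` and `brute_force_count` are names invented here, and the limit of 15 letters is an arbitrary assumption to keep the run time short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if every window of 4 consecutive letters contains a, b and c.
// Strings shorter than 4 have no such window and count as valid.
bool is_valid(const std::string& s)
{
    for (std::size_t i = 0; i + 4 <= s.size(); ++i)
    {
        const std::string win = s.substr(i, 4);
        if (win.find('a') == std::string::npos ||
            win.find('b') == std::string::npos ||
            win.find('c') == std::string::npos)
            return false;
    }
    return true;
}

// Counts valid strings of length n by enumerating all 3^n candidates.
unsigned long brute_force_count(unsigned n)
{
    assert(n >= 1 && n <= 15);  // only meant as a cross-check for small n
    std::string s(n, 'a');
    unsigned long count = 0;
    while (true)
    {
        if (is_valid(s)) ++count;
        // advance s like a base-3 odometer: trailing 'c's roll over to 'a'
        std::size_t i = n;
        while (i > 0 && s[i - 1] == 'c') { s[i - 1] = 'a'; --i; }
        if (i == 0) break;  // rolled over from "cc...c": all strings visited
        ++s[i - 1];
    }
    return count;
}
```

`brute_force_count(4)` returns 36, in line with the 3*4!/2 argument, and `is_valid` accepts abbcaabca while rejecting aabbcabac, matching the examples in the question.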
0

The trick is to break down the problem. Consider:

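Would it help to know how many such strings of length 50 there are for each possible ending?

(Number of 50-strings ending in AA) times (number of 50-strings starting with B or C) + (number of 50-strings ending in AB) times (number of 50-strings starting with C) + (all other combinations) gives you the number of 100-long strings. To make this exact, the counts have to be grouped by the last three letters of the first half and the first three letters of the second half, because a window of length 4 across the join can take up to three letters from either side.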

Continue breaking it down, recursively.

Look up dynamic programming.

Also look up large number libraries.
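One way to read the dynamic-programming hint, sketched under a few assumptions: instead of splitting the string in halves, this version grows it one letter at a time and keeps one counter per combination of the last three letters (27 states). Since the count for n = 100 no longer fits in 64 bits, and to keep the example free of external dependencies, a tiny hand-rolled adder (`Big`, `add_into`, `to_decimal`, all hypothetical names) stands in for a real large-number library.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Minimal non-negative big integer: base-1e9 limbs, least significant first.
// An empty vector represents zero.
using Big = std::vector<std::uint32_t>;

void add_into(Big& acc, const Big& x)  // acc += x
{
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < x.size() || carry; ++i)
    {
        if (i == acc.size()) acc.push_back(0);
        const std::uint64_t sum = acc[i] + carry + (i < x.size() ? x[i] : 0);
        acc[i] = static_cast<std::uint32_t>(sum % 1000000000u);
        carry = sum / 1000000000u;
    }
}

std::string to_decimal(const Big& x)
{
    if (x.empty()) return "0";
    std::string s = std::to_string(x.back());
    for (std::size_t i = x.size() - 1; i-- > 0;)
    {
        const std::string limb = std::to_string(x[i]);
        s += std::string(9 - limb.size(), '0') + limb;  // pad inner limbs to 9 digits
    }
    return s;
}

// Number of words of length n over {a, b, c} in which every window of 4
// contains all three letters. State = last three letters packed in base 3.
Big count_valid(unsigned n)
{
    assert(n >= 3);
    std::array<Big, 27> cur;
    for (auto& c : cur) c = Big{1};  // each 3-letter word occurs exactly once
    for (unsigned len = 3; len < n; ++len)
    {
        std::array<Big, 27> next;  // all counters start at zero
        for (unsigned s = 0; s < 27; ++s)
        {
            const unsigned seen = (1u << (s / 9)) | (1u << (s / 3 % 3)) | (1u << (s % 3));
            for (unsigned t = 0; t < 3; ++t)
                if ((seen | (1u << t)) == 7u)  // window "last three + t" has a, b, c
                    add_into(next[(s % 9) * 3 + t], cur[s]);
        }
        cur.swap(next);
    }
    Big total;
    for (const auto& c : cur) add_into(total, c);
    return total;
}
```

`to_decimal(count_valid(100))` then gives the exact count as a decimal string after only 97 passes over 27 states, so the run time is linear in n rather than exponential.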

Jeffrey
  • I presume you would still have to start with some finite initial value and then work your way up. You will then get a sequence x[n+k] = \sum_{l=0...k-1} a_l x[n+l]. You can then look at the limit x[n+1]/x[n] to get the scaling exponent for large n. – Darkdragon84 Mar 25 '16 at 01:02
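The limit x[n+1]/x[n] from the last comment can be estimated numerically. This is only a sketch: it counts per "last three letters" state in double precision, which is exact for small n and approximate for large n, and the function names `approx_count` and `growth_ratio` are invented for the example.

```cpp
#include <array>
#include <cassert>

// Approximate number of valid words of length n (n >= 3), using doubles.
double approx_count(unsigned n)
{
    assert(n >= 3);
    std::array<double, 27> cur;
    cur.fill(1.0);  // one word per combination of three letters
    for (unsigned len = 3; len < n; ++len)
    {
        std::array<double, 27> next{};  // zero-initialised
        for (unsigned s = 0; s < 27; ++s)
        {
            const unsigned seen = (1u << (s / 9)) | (1u << (s / 3 % 3)) | (1u << (s % 3));
            for (unsigned t = 0; t < 3; ++t)
                if ((seen | (1u << t)) == 7u) next[(s % 9) * 3 + t] += cur[s];
        }
        cur = next;
    }
    double total = 0.0;
    for (double c : cur) total += c;
    return total;
}

// Ratio of consecutive counts, x[n+1] / x[n].
double growth_ratio(unsigned n) { return approx_count(n + 1) / approx_count(n); }
```

With this sketch, `growth_ratio(60)` comes out at roughly 1.84, which is noticeably higher than the 1.31 estimated in an earlier comment, so the number of valid words grows faster than that estimate suggests.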