How to find smallest substring which contains all characters from a given string?

Question

I have recently come across an interesting question on strings. Suppose you are given following:

Input string1: "this is a test string"
Input string2: "tist"
Output string: "t stri"

So, given above, how can I approach towards finding smallest substring of string1 that contains all the characters from string 2?

Should string2 be rist or tisr? And in that case wouldn't the output be "st str"? — Dark Castle, Mar 17 '10 at 03:07
@kennygrimm, string2 is given as "tist" and that it should be. If you say "rist" or "tisr" than your answer "st str" doesn't contain "i". — Rajendra Uppal, Mar 17 '10 at 03:13
Oh I see, I thought that the 'r' was wrong since it wasn't in string2 but you're saying it must contain all of string2 but could also contain other letters... — Dark Castle, Mar 17 '10 at 03:18
do duplicates in `string2` need to be accounted for as well? cause otherwise the shortest substring having `tist` in `string1` is `this` or `stri` — Anurag, Mar 17 '10 at 06:14

1337c0d3r · Answer 1 · 2013-12-26T11:11:20.123

To see more details including working code, check my blog post at:

http://www.leetcode.com/2010/11/finding-minimum-window-in-s-which.html

To help illustrate this approach, I use an example: string1 = "acbbaca" and string2 = "aba". Here, we also use the term "window", which means a contiguous block of characters from string1 (could be interchanged with the term substring).

alt text

i) string1 = "acbbaca" and string2 = "aba".

alt text

ii) The first minimum window is found. Notice that we cannot advance begin pointer as hasFound['a'] == needToFind['a'] == 2. Advancing would mean breaking the constraint.

alt text

iii) The second window is found. begin pointer still points to the first element 'a'. hasFound['a'] (3) is greater than needToFind['a'] (2). We decrement hasFound['a'] by one and advance begin pointer to the right.

alt text

iv) We skip 'c' since it is not found in string2. Begin pointer now points to 'b'. hasFound['b'] (2) is greater than needToFind['b'] (1). We decrement hasFound['b'] by one and advance begin pointer to the right.

alt text

v) Begin pointer now points to the next 'b'. hasFound['b'] (1) is equal to needToFind['b'] (1). We stop immediately and this is our newly found minimum window.

The idea is mainly based on the help of two pointers (begin and end position of the window) and two tables (needToFind and hasFound) while traversing string1. needToFind stores the total count of a character in string2 and hasFound stores the total count of a character met so far. We also use a count variable to store the total characters in string2 that's met so far (not counting characters where hasFound[x] exceeds needToFind[x]). When count equals string2's length, we know a valid window is found.

Each time we advance the end pointer (pointing to an element x), we increment hasFound[x] by one. We also increment count by one if hasFound[x] is less than or equal to needToFind[x]. Why? When the constraint is met (that is, count equals to string2's size), we immediately advance begin pointer as far right as possible while maintaining the constraint.

How do we check if it is maintaining the constraint? Assume that begin points to an element x, we check if hasFound[x] is greater than needToFind[x]. If it is, we can decrement hasFound[x] by one and advancing begin pointer without breaking the constraint. On the other hand, if it is not, we stop immediately as advancing begin pointer breaks the window constraint.

Finally, we check if the minimum window length is less than the current minimum. Update the current minimum if a new minimum is found.

Essentially, the algorithm finds the first window that satisfies the constraint, then continue maintaining the constraint throughout.

I think the approach needs a more clean explanation. Especially the terms like 'hasFound' and 'needToFind'. It's hard to wrap my head around it. — आनंद, Dec 11 '18 at 16:39
`needToFind` is the histogram computed from the pattern string, `string2`. It is computed in the beginning and it never changes. In this example, `needToFind = {'a' : 2, 'b': 1}`. On the other hand, `hasFound` is the histogram of the characters currently in the sliding window. — Arun, Sep 19 '20 at 14:32
link is broken and I can't find the problem on lc. Any chance you have the problem link on lc? — Syed Ali, May 02 '23 at 18:50

Rex Kerr · Accepted Answer · 2010-03-17T03:50:57.187

37

You can do a histogram sweep in O(N+M) time and O(1) space where N is the number of characters in the first string and M is the number of characters in the second.

It works like this:

Make a histogram of the second string's characters (key operation is hist2[ s2[i] ]++).
Make a cumulative histogram of the first string's characters until that histogram contains every character that the second string's histogram contains (which I will call "the histogram condition").
Then move forwards on the first string, subtracting from the histogram, until it fails to meet the histogram condition. Mark that bit of the first string (before the final move) as your tentative substring.
Move the front of the substring forwards again until you meet the histogram condition again. Move the end forwards until it fails again. If this is a shorter substring than the first, mark that as your tentative substring.
Repeat until you've passed through the entire first string.
The marked substring is your answer.

Note that by varying the check you use on the histogram condition, you can choose either to have the same set of characters as the second string, or at least as many characters of each type. (Its just the difference between a[i]>0 && b[i]>0 and a[i]>=b[i].)

You can speed up the histogram checks if you keep a track of which condition is not satisfied when you're trying to satisfy it, and checking only the thing that you decrement when you're trying to break it. (On the initial buildup, you count how many items you've satisfied, and increment that count every time you add a new character that takes the condition from false to true.)

edited Mar 17 '10 at 03:50

answered Mar 17 '10 at 03:26

Rex Kerr

166,841
26
322
407

+1: This is much more readable than python. Would be nice if you included a proof/explanation of why it works too. – Mar 17 '10 at 14:43
@Rex Kerr: I fail to see how that is O(1) space. Will your histograms not take up O(N+M) space if all character are unique (worst case)? – Mads Ravn Mar 17 '10 at 15:59
1

O(M) space can be done (rather than O(N+M)), as you don't really need to concern yourself with characters not present is s2. I agree though, that the space usage is O(1) seems incorrect and does not seem to match the description. – Mar 17 '10 at 17:16
@Rex: I think we all know what O(1) means. You are missing the point, and the proof request was not about why it is O(N), it was about why it is _correct_. If you like, I can add that to your post. – Mar 18 '10 at 02:04
1

@Moron: You are completely right. I was just following the steps of the algorithm and did not take this (simple) optimization into account. @Rex Kerr: From a theoretical standpoint I believe you are wrong. If you allocate a constant amount of memory I can chose M large enough that your counters overflow, so we would need at least O(log_2(M)) space. In a more pragmatic use of the notation I would also consider O(1) to be a bit misleading as one usually associates this with a small fixed amount of memory. Can we settle for O(min(charset size, M)) :) – Mads Ravn Mar 18 '10 at 11:45
@Moron: Why not add your own answer about why it is/isn't correct, since both algorithmist and I have the same answer (as does the solution at the end of nvl's link)? @Mads: Good point--let's say `O(|set(M)|)` perhaps, where `|set(M)|` is the number of unique characters in `M`. – Rex Kerr Mar 18 '10 at 15:08
@Rex. I am not claiming it is incorrect. I gave it a +1 already (i.e. I think it is correct). I really don't see the point of having multiple answers saying the same thing in different ways or one answer supplementing the other. This site is meant to answer questions, not to drown the questioner with a lot of varied answers saying similar things. I suggested you add a proof to make your answer more complete (and better), not to question the correctness. You don't seem to get the point of this site... Anyway, I am done with this conversation. – Mar 19 '10 at 02:19

score 6 · Answer 3 · answered Mar 17 '10 at 03:25

6

Here's an O(n) solution. The basic idea is simple: for each starting index, find the least ending index such that the substring contains all of the necessary letters. The trick is that the least ending index increases over the course of the function, so with a little data structure support, we consider each character at most twice.

In Python:

from collections import defaultdict

def smallest(s1, s2):
    assert s2 != ''
    d = defaultdict(int)
    nneg = [0]  # number of negative entries in d
    def incr(c):
        d[c] += 1
        if d[c] == 0:
            nneg[0] -= 1
    def decr(c):
        if d[c] == 0:
            nneg[0] += 1
        d[c] -= 1
    for c in s2:
        decr(c)
    minlen = len(s1) + 1
    j = 0
    for i in xrange(len(s1)):
        while nneg[0] > 0:
            if j >= len(s1):
                return minlen
            incr(s1[j])
            j += 1
        minlen = min(minlen, j - i)
        decr(s1[i])
    return minlen

answered Mar 17 '10 at 03:25

user287792

1,581
9
12

@algorithmist, I have not worked in Python, but I can get that for...while loop doesn;t seem to O(n). Can you please tell your approach taking example given in the question, would appreciate that. – Rajendra Uppal Mar 17 '10 at 03:29
1

j can only increase len(s1) times, so the while loop does O(n) work in total. – user287792 Mar 17 '10 at 03:31
@Rajendra: This algorithm does exactly what I described in my post, if that helps--`i` marks the tail of the substring and `j` marks the head. @algorithmist: nice work, coming up with code ever-so-slightly-faster than I came up with a description! – Rex Kerr Mar 17 '10 at 03:37
This is NOT O(n) solution! Because looking in the dictionary itself has the worst case complexity O(n) https://en.wikipedia.org/wiki/Best,_worst_and_average_case#Data_structures so multiple your n at least by 2 – Cmyker Sep 21 '15 at 19:01

jackdaniel · Answer 4 · 2016-05-23T12:51:27.440

I received the same interview question. I am a C++ candidate but I was in a position to code relatively fast in JAVA.

Java [Courtesy : Sumod Mathilakath]

import java.io.*;
import  java.util.*;

class UserMainCode
{


    public String GetSubString(String input1,String input2){
        // Write code here...
        return find(input1, input2);
    }
  private static boolean containsPatternChar(int[] sCount, int[] pCount) {
        for(int i=0;i<256;i++) {
            if(pCount[i]>sCount[i])
                return false;
        }
        return true;
    }
  public static String find(String s, String p) {
        if (p.length() > s.length())
            return null;
        int[] pCount = new int[256];
        int[] sCount = new int[256];
        // Time: O(p.lenght)
        for(int i=0;i<p.length();i++) {
            pCount[(int)(p.charAt(i))]++;
            sCount[(int)(s.charAt(i))]++;
        }
        int i = 0, j = p.length(), min = Integer.MAX_VALUE;
        String res = null;
        // Time: O(s.lenght)
        while (j < s.length()) {
            if (containsPatternChar(sCount, pCount)) {
                if ((j - i) < min) {
                    min = j - i;
                    res = s.substring(i, j);
                    // This is the smallest possible substring.
                    if(min==p.length())
                        break;
                    // Reduce the window size.
                    sCount[(int)(s.charAt(i))]--;
                    i++;
                }
            } else {
                sCount[(int)(s.charAt(j))]++;
                // Increase the window size.
                j++;
            }
        }
        System.out.println(res);
        return res;
    }
}

C++ [Courtesy : sundeepblue]

#include <iostream>
#include <vector>
#include <string>
#include <climits>
using namespace std;
string find_minimum_window(string s, string t) {
    if(s.empty() || t.empty()) return;

    int ns = s.size(), nt = t.size();
    vector<int> total(256, 0);
    vector<int> sofar(256, 0);
    for(int i=0; i<nt; i++) 
        total[t[i]]++;

    int L = 0, R; 
    int minL = 0;                           //gist2
    int count = 0;
    int min_win_len = INT_MAX;

    for(R=0; R<ns; R++) {                   // gist0, a big for loop
        if(total[s[R]] == 0) continue;
        else sofar[s[R]]++;

        if(sofar[s[R]] <= total[s[R]])      // gist1, <= not <
            count++;

        if(count == nt) {                   // POS1
            while(true) {
                char c = s[L]; 
                if(total[c] == 0) { L++; }
                else if(sofar[c] > total[c]) {
                    sofar[c]--;
                    L++;
                }
                else break;
            }  
            if(R - L + 1 < min_win_len) {   // this judge should be inside POS1
                min_win_len = R - L + 1;
                minL = L;
            }
        }
    }
    string res;
    if(count == nt)                         // gist3, cannot forget this. 
        res = s.substr(minL, min_win_len);  // gist4, start from "minL" not "L"
    return res;
}
int main() {
    string s = "abdccdedca";
    cout << find_minimum_window(s, "acd");
}

Erlang [Courtesy : wardbekker]

-module(leetcode).

-export([min_window/0]).

%% Given a string S and a string T, find the minimum window in S which will contain all the characters in T in complexity O(n).

%% For example,
%% S = "ADOBECODEBANC"
%% T = "ABC"
%% Minimum window is "BANC".

%% Note:
%% If there is no such window in S that covers all characters in T, return the emtpy string "".
%% If there are multiple such windows, you are guaranteed that there will always be only one unique minimum window in S.



min_window() ->
    "eca" = min_window("cabeca", "cae"),
    "eca" = min_window("cfabeca", "cae"),
    "aec" = min_window("cabefgecdaecf", "cae"),
    "cwae" = min_window("cabwefgewcwaefcf", "cae"),
    "BANC" = min_window("ADOBECODEBANC", "ABC"),
    ok.

min_window(T, S) ->
    min_window(T, S, []).

min_window([], _T, MinWindow) ->
    MinWindow;
min_window([H | Rest], T, MinWindow) ->
    NewMinWindow = case lists:member(H, T) of
                       true ->
                           MinWindowFound = fullfill_window(Rest, lists:delete(H, T), [H]),
                           case length(MinWindow) == 0 orelse (length(MinWindow) > length(MinWindowFound)
                               andalso length(MinWindowFound) > 0) of
                               true ->
                                   MinWindowFound;
                               false ->
                                   MinWindow
                           end;
                       false ->
                           MinWindow
                   end,
    min_window(Rest, T, NewMinWindow).

fullfill_window(_, [], Acc) ->
    %% window completed
    Acc;
fullfill_window([], _T, _Acc) ->
    %% no window found
    "";
fullfill_window([H | Rest], T, Acc) ->
    %% completing window
    case lists:member(H, T) of
        true ->
            fullfill_window(Rest, lists:delete(H, T), Acc ++ [H]);
        false ->
            fullfill_window(Rest, T, Acc ++ [H])
    end.

REF:

score 1 · Answer 5 · edited Dec 20 '11 at 18:16

Please have a look at this as well:

//-----------------------------------------------------------------------

bool IsInSet(char ch, char* cSet)
{
    char* cSetptr = cSet;
    int index = 0;
    while (*(cSet+ index) != '\0')
    {
        if(ch == *(cSet+ index))
        {
            return true;            
        }
        ++index;
    }
    return false;
}

void removeChar(char ch, char* cSet)
{
    bool bShift = false;
    int index = 0;
    while (*(cSet + index) != '\0')
    {
        if( (ch == *(cSet + index)) || bShift)
        {
            *(cSet + index) = *(cSet + index + 1);
            bShift = true;
        }
        ++index;
    }
}
typedef struct subStr
{
    short iStart;
    short iEnd;
    short szStr;
}ss;

char* subStringSmallest(char* testStr, char* cSet)
{
    char* subString = NULL;
    int iSzSet = strlen(cSet) + 1;
    int iSzString = strlen(testStr)+ 1;
    char* cSetBackUp = new char[iSzSet];
    memcpy((void*)cSetBackUp, (void*)cSet, iSzSet);

    int iStartIndx = -1;    
    int iEndIndx = -1;
    int iIndexStartNext = -1;

    std::vector<ss> subStrVec;
    int index = 0;

    while( *(testStr+index) != '\0' )
    {
        if (IsInSet(*(testStr+index), cSetBackUp))
        {
            removeChar(*(testStr+index), cSetBackUp);

            if(iStartIndx < 0)
            {
                iStartIndx = index;
            }
            else if( iIndexStartNext < 0)
                iIndexStartNext = index;
            else
                ;

            if (strlen(cSetBackUp) == 0 )
            {
                iEndIndx = index;
                if( iIndexStartNext == -1)
                    break;
                else
                {
                    index = iIndexStartNext;
                    ss stemp = {iStartIndx, iEndIndx, (iEndIndx-iStartIndx + 1)};
                    subStrVec.push_back(stemp);
                    iStartIndx = iEndIndx = iIndexStartNext = -1;
                    memcpy((void*)cSetBackUp, (void*)cSet, iSzSet);
                    continue;
                }
            }
        }
        else
        {
            if (IsInSet(*(testStr+index), cSet))
            {
                if(iIndexStartNext < 0)
                    iIndexStartNext = index;
            }
        }

        ++index;
    }


    int indexSmallest = 0;
    for(int indexVec = 0; indexVec < subStrVec.size(); ++indexVec)
    {
        if(subStrVec[indexSmallest].szStr > subStrVec[indexVec].szStr)
            indexSmallest = indexVec;       
    }

    subString = new char[(subStrVec[indexSmallest].szStr) + 1];
    memcpy((void*)subString, (void*)(testStr+ subStrVec[indexSmallest].iStart), subStrVec[indexSmallest].szStr);
    memset((void*)(subString + subStrVec[indexSmallest].szStr), 0, 1);

    delete[] cSetBackUp;
    return subString;
}
//--------------------------------------------------------------------

mjv · Answer 6 · 2010-03-17T03:46:17.310

Edit: apparently there's an O(n) algorithm (cf. algorithmist's answer). Obviously this have this will beat the [naive] baseline described below!

Too bad I gotta go... I'm a bit suspicious that we can get O(n). I'll check in tomorrow to see the winner ;-) Have fun!

Tentative algorithm:
The general idea is to sequentially try and use a character from str2 found in str1 as the start of a search (in either/both directions) of all the other letters of str2. By keeping a "length of best match so far" value, we can abort searches when they exceed this. Other heuristics can probably be used to further abort suboptimal (so far) solutions. The choice of the order of the starting letters in str1 matters much; it is suggested to start with the letter(s) of str1 which have the lowest count and to try with the other letters, of an increasing count, in subsequent attempts.

  [loose pseudo-code]
  - get count for each letter/character in str1  (number of As, Bs etc.)
  - get count for each letter in str2
  - minLen = length(str1) + 1  (the +1 indicates you're not sure all chars of 
                                str2 are in str1)
  - Starting with the letter from string2 which is found the least in string1,
    look for other letters of Str2, in either direction of str1, until you've 
    found them all (or not, at which case response = impossible => done!). 
    set x = length(corresponding substring of str1).
 - if (x < minLen), 
         set minlen = x, 
         also memorize the start/len of the str1 substring.
 - continue trying with other letters of str1 (going the up the frequency
   list in str1), but abort search as soon as length(substring of strl) 
   reaches or exceed minLen.  
   We can find a few other heuristics that would allow aborting a 
   particular search, based on [pre-calculated ?] distance between a given
   letter in str1 and some (all?) of the letters in str2.
 - the overall search terminates when minLen = length(str2) or when 
   we've used all letters of str1 (which match one letter of str2)
   as a starting point for the search

score 0 · Answer 7 · answered Mar 26 '16 at 13:41

Here is Java implementation

public static String shortestSubstrContainingAllChars(String input, String target) {
    int needToFind[] = new int[256];
    int hasFound[] = new int[256];
    int totalCharCount = 0;
    String result = null;

    char[] targetCharArray = target.toCharArray();
    for (int i = 0; i < targetCharArray.length; i++) {
        needToFind[targetCharArray[i]]++;           
    }

    char[] inputCharArray = input.toCharArray();
    for (int begin = 0, end = 0; end < inputCharArray.length; end++) {

        if (needToFind[inputCharArray[end]] == 0) {
            continue;
        }

        hasFound[inputCharArray[end]]++;
        if (hasFound[inputCharArray[end]] <= needToFind[inputCharArray[end]]) {
            totalCharCount ++;
        }
        if (totalCharCount == target.length()) {
            while (needToFind[inputCharArray[begin]] == 0 
                    || hasFound[inputCharArray[begin]] > needToFind[inputCharArray[begin]]) {

                if (hasFound[inputCharArray[begin]] > needToFind[inputCharArray[begin]]) {
                    hasFound[inputCharArray[begin]]--;
                }
                begin++;
            }

            String substring = input.substring(begin, end + 1);
            if (result == null || result.length() > substring.length()) {
                result = substring;
            }
        }
    }
    return result;
}

Here is the Junit Test

@Test
public void shortestSubstringContainingAllCharsTest() {
    String result = StringUtil.shortestSubstrContainingAllChars("acbbaca", "aba");
    assertThat(result, equalTo("baca"));

    result = StringUtil.shortestSubstrContainingAllChars("acbbADOBECODEBANCaca", "ABC");
    assertThat(result, equalTo("BANC"));

    result = StringUtil.shortestSubstrContainingAllChars("this is a test string", "tist");
    assertThat(result, equalTo("t stri"));
}

Shashank · Answer 8 · 2016-08-03T07:09:24.790

//[ShortestSubstring.java][1]

public class ShortestSubstring {

    public static void main(String[] args) {
        String input1 = "My name is Fran";
        String input2 = "rim";
        System.out.println(getShortestSubstring(input1, input2));
    }

    private static String getShortestSubstring(String mainString, String toBeSearched) {

        int mainStringLength = mainString.length();
        int toBeSearchedLength = toBeSearched.length();

        if (toBeSearchedLength > mainStringLength) {
            throw new IllegalArgumentException("search string cannot be larger than main string");
        }

        for (int j = 0; j < mainStringLength; j++) {
            for (int i = 0; i <= mainStringLength - toBeSearchedLength; i++) {
                String substring = mainString.substring(i, i + toBeSearchedLength);
                if (checkIfMatchFound(substring, toBeSearched)) {
                    return substring;
                }
            }
            toBeSearchedLength++;
        }

        return null;
    }

    private static boolean checkIfMatchFound(String substring, String toBeSearched) {
        char[] charArraySubstring = substring.toCharArray();
        char[] charArrayToBeSearched = toBeSearched.toCharArray();
        int count = 0;

        for (int i = 0; i < charArraySubstring.length; i++) {
            for (int j = 0; j < charArrayToBeSearched.length; j++) {
                if (String.valueOf(charArraySubstring[i]).equalsIgnoreCase(String.valueOf(charArrayToBeSearched[j]))) {
                    count++;
                }
            }
        }
        return count == charArrayToBeSearched.length;
    }
}

Although this code may help to solve the problem, providing additional context regarding _why_ and/or _how_ it answers the question would significantly improve its long-term value. Please [edit] your answer to add some explanation. — Toby Speight, Aug 02 '16 at 12:38

score 0 · Answer 9 · answered Aug 16 '16 at 09:23

This is an approach using prime numbers to avoid one loop, and replace it with multiplications. Several other minor optimizations can be made.

Assign a unique prime number to any of the characters that you want to find, and 1 to the uninteresting characters.
Find the product of a matching string by multiplying the prime number with the number of occurrences it should have. Now this product can only be found if the same prime factors are used.
Search the string from the beginning, multiplying the respective prime number as you move into a running product.
If the number is greater than the correct sum, remove the first character and divide its prime number out of your running product.
If the number is less than the correct sum, include the next character and multiply it into your running product.
If the number is the same as the correct sum you have found a match, slide beginning and end to next character and continue searching for other matches.
Decide which of the matches is the shortest.

Gist

charcount = { 'a': 3, 'b' : 1 };
str = "kjhdfsbabasdadaaaaasdkaaajbajerhhayeom"

def find (c, s):
  Ns = len (s)

  C = list (c.keys ())
  D = list (c.values ())

  # prime numbers assigned to the first 25 chars
  prmsi = [ 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89 , 97]

  # primes used in the key, all other set to 1
  prms = []
  Cord = [ord(c) - ord('a') for c in C]

  for e,p in enumerate(prmsi):
    if e in Cord:
      prms.append (p)
    else:
      prms.append (1)

  # Product of match
  T = 1
  for c,d in zip(C,D):
    p = prms[ord (c) - ord('a')]
    T *= p**d

  print ("T=", T)

  t = 1 # product of current string
  f = 0
  i = 0

  matches = []
  mi = 0
  mn = Ns
  mm = 0

  while i < Ns:
    k = prms[ord(s[i]) - ord ('a')]
    t *= k

    print ("testing:", s[f:i+1])

    if (t > T):
      # included too many chars: move start
      t /= prms[ord(s[f]) - ord('a')] # remove first char, usually division by 1
      f += 1 # increment start position
      t /= k # will be retested, could be replaced with bool

    elif t == T:
      # found match
      print ("FOUND match:", s[f:i+1])
      matches.append (s[f:i+1])

      if (i - f) < mn:
        mm = mi
        mn = i - f

      mi += 1

      t /= prms[ord(s[f]) - ord('a')] # remove first matching char

      # look for next match
      i += 1
      f += 1

    else:
      # no match yet, keep searching
      i += 1

  return (mm, matches)


print (find (charcount, str))

(note: this answer was originally posted to a duplicate question, the original answer is now deleted.)

score 0 · Answer 10 · answered Feb 18 '17 at 12:34

C# Implementation:

public static Tuple<int, int> FindMinSubstringWindow(string input, string pattern)
{
    Tuple<int, int> windowCoords = new Tuple<int, int>(0, input.Length - 1);
    int[] patternHist = new int[256];
    for (int i = 0; i < pattern.Length; i++)
    {
        patternHist[pattern[i]]++;
    }
    int[] inputHist = new int[256];
    int minWindowLength = int.MaxValue;
    int count = 0;
    for (int begin = 0, end = 0; end < input.Length; end++)
    {
        // Skip what's not in pattern.
        if (patternHist[input[end]] == 0)
        {
            continue;
        }
        inputHist[input[end]]++;
        // Count letters that are in pattern.
        if (inputHist[input[end]] <= patternHist[input[end]])
        {
            count++;
        }
        // Window found.
        if (count == pattern.Length)
        {
            // Remove extra instances of letters from pattern
            // or just letters that aren't part of the pattern
            // from the beginning.
            while (patternHist[input[begin]] == 0 ||
                   inputHist[input[begin]] > patternHist[input[begin]])
            {
                if (inputHist[input[begin]] > patternHist[input[begin]])
                {
                    inputHist[input[begin]]--;
                }
                begin++;
            }
            // Current window found.
            int windowLength = end - begin + 1;
            if (windowLength < minWindowLength)
            {
                windowCoords = new Tuple<int, int>(begin, end);
                minWindowLength = windowLength;
            }
        }
    }
    if (count == pattern.Length)
    {
        return windowCoords;
    }
    return null;
}

score 0 · Answer 11 · answered Oct 25 '18 at 09:33

I've implemented it using Python3 at O(N) efficiency:

def get(s, alphabet="abc"):
    seen = {}
    for c in alphabet:
        seen[c] = 0
    seen[s[0]] = 1
    start = 0
    end = 0
    shortest_s = 0
    shortest_e = 99999
    while end + 1 < len(s):
        while seen[s[start]] > 1:
            seen[s[start]] -= 1
            start += 1
        # Constant time check:
        if sum(seen.values()) == len(alphabet) and all(v == 1 for v in seen.values()) and \
                shortest_e - shortest_s > end - start:
            shortest_s = start
            shortest_e = end
        end += 1
        seen[s[end]] += 1
    return s[shortest_s: shortest_e + 1]


print(get("abbcac")) # Expected to return "bca"

score 0 · Answer 12 · answered Jul 30 '19 at 13:40

    String s = "xyyzyzyx";
    String s1 = "xyz";
    String finalString ="";
    Map<Character,Integer> hm = new HashMap<>();
    if(s1!=null && s!=null && s.length()>s1.length()){
        for(int i =0;i<s1.length();i++){
            if(hm.get(s1.charAt(i))!=null){
                int k = hm.get(s1.charAt(i))+1;
                hm.put(s1.charAt(i), k);
            }else
                hm.put(s1.charAt(i), 1);
        }
        Map<Character,Integer> t = new HashMap<>();
        int start =-1;
         for(int j=0;j<s.length();j++){
             if(hm.get(s.charAt(j))!=null){
                 if(t.get(s.charAt(j))!=null){
                     if(t.get(s.charAt(j))!=hm.get(s.charAt(j))){
                     int k = t.get(s.charAt(j))+1;
                        t.put(s.charAt(j), k);
                     }
                 }else{
                     t.put(s.charAt(j), 1);
                     if(start==-1){
                         if(j+s1.length()>s.length()){
                             break;
                         }
                         start = j;
                     }
                 }
                 if(hm.equals(t)){
                    t = new HashMap<>();
                    if(finalString.length()<s.substring(start,j+1).length());
                    {
                        finalString=s.substring(start,j+1);
                    }
                    j=start;
                    start=-1;                       
                 }
             }
         }

Could you please explain *why* and *how* your code snippet provides an answer to the question? Thank you. — deHaar, Jul 30 '19 at 14:02
I am using two HashMaps to store the number of character in each string and check if two maps are equal , if two maps are equal then we have got the substrings that are in given string. — Sai Chand, Aug 02 '19 at 12:21

score 0 · Answer 13 · answered Nov 09 '19 at 13:37

JavaScript solution in bruteforce way:

function shortestSubStringOfUniqueChars(s){
 var uniqueArr = [];
 for(let i=0; i<s.length; i++){
  if(uniqueArr.indexOf(s.charAt(i)) <0){
   uniqueArr.push(s.charAt(i));
  }
 }

 let windoww = uniqueArr.length;

 while(windoww < s.length){
  for(let i=0; i<s.length - windoww; i++){
   let match = true;
   let tempArr = [];
   for(let j=0; j<uniqueArr.length; j++){
    if(uniqueArr.indexOf(s.charAt(i+j))<0){
     match = false;
     break;
    }
   }
  let checkStr
  if(match){
   checkStr =  s.substr(i, windoww);
   for(let j=0; j<uniqueArr.length; j++){
    if(uniqueArr.indexOf(checkStr.charAt(j))<0){
     match = false;
     break;
    }
   }
  }
  if(match){
      return checkStr;
  }
   }
   windoww = windoww + 1;
 }
}

console.log(shortestSubStringOfUniqueChars("ABA"));

score 0 · Answer 14 · 2022-07-01T06:11:52.693

# Python implementation

s = input('Enter the string : ')
s1 = input('Enter the substring to search : ')
l = [] # List to record all the matching combinations

check = all([char in s for char in s1]) 
if check == True:
    for i in range(len(s1),len(s)+1) :
        for j in range(0,i+len(s1)+2):
            if (i+j) < len(s)+1:
                cnt = 0
                b = all([char in s[j:i+j] for char in s1]) 
            if (b == True) :
                l.append(s[j:i+j])
    print('The smallest substring containing',s1,'is',l[0])

else:
    print('Please enter a valid substring')

score -1 · Answer 15 · answered Nov 02 '14 at 04:02

Java code for the approach discussed above:

private static Map<Character, Integer> frequency;
private static Set<Character> charsCovered;
private static Map<Character, Integer> encountered;
/**
 * To set the first match index as an intial start point
 */
private static boolean hasStarted = false;
private static int currentStartIndex = 0;
private static int finalStartIndex = 0;
private static int finalEndIndex = 0;
private static int minLen = Integer.MAX_VALUE;
private static int currentLen = 0;
/**
 * Whether we have already found the match and now looking for other
 * alternatives.
 */
private static boolean isFound = false;
private static char currentChar;

public static String findSmallestSubStringWithAllChars(String big, String small) {

    if (null == big || null == small || big.isEmpty() || small.isEmpty()) {
        return null;
    }

    frequency = new HashMap<Character, Integer>();
    instantiateFrequencyMap(small);
    charsCovered = new HashSet<Character>();
    int charsToBeCovered = frequency.size();
    encountered = new HashMap<Character, Integer>();

    for (int i = 0; i < big.length(); i++) {
        currentChar = big.charAt(i);
        if (frequency.containsKey(currentChar) && !isFound) {
            if (!hasStarted && !isFound) {
                hasStarted = true;
                currentStartIndex = i;
            }
            updateEncounteredMapAndCharsCoveredSet(currentChar);
            if (charsCovered.size() == charsToBeCovered) {
                currentLen = i - currentStartIndex;
                isFound = true;
                updateMinLength(i);
            }
        } else if (frequency.containsKey(currentChar) && isFound) {
            updateEncounteredMapAndCharsCoveredSet(currentChar);
            if (currentChar == big.charAt(currentStartIndex)) {
                encountered.put(currentChar, encountered.get(currentChar) - 1);
                currentStartIndex++;
                while (currentStartIndex < i) {
                    if (encountered.containsKey(big.charAt(currentStartIndex))
                            && encountered.get(big.charAt(currentStartIndex)) > frequency.get(big
                                    .charAt(currentStartIndex))) {
                        encountered.put(big.charAt(currentStartIndex),
                                encountered.get(big.charAt(currentStartIndex)) - 1);
                    } else if (encountered.containsKey(big.charAt(currentStartIndex))) {
                        break;
                    }
                    currentStartIndex++;
                }
            }
            currentLen = i - currentStartIndex;
            updateMinLength(i);
        }
    }
    System.out.println("start: " + finalStartIndex + " finalEnd : " + finalEndIndex);
    return big.substring(finalStartIndex, finalEndIndex + 1);
}

private static void updateMinLength(int index) {
    if (minLen > currentLen) {
        minLen = currentLen;
        finalStartIndex = currentStartIndex;
        finalEndIndex = index;
    }

}

private static void updateEncounteredMapAndCharsCoveredSet(Character currentChar) {
    if (encountered.containsKey(currentChar)) {
        encountered.put(currentChar, encountered.get(currentChar) + 1);
    } else {
        encountered.put(currentChar, 1);
    }

    if (encountered.get(currentChar) >= frequency.get(currentChar)) {
        charsCovered.add(currentChar);
    }
}

private static void instantiateFrequencyMap(String str) {

    for (char c : str.toCharArray()) {
        if (frequency.containsKey(c)) {
            frequency.put(c, frequency.get(c) + 1);
        } else {
            frequency.put(c, 1);
        }
    }

}

public static void main(String[] args) {

    String big = "this is a test string";
    String small = "tist";
    System.out.println("len: " + big.length());
    System.out.println(findSmallestSubStringWithAllChars(big, small));
}

SidML · Answer 16 · 2016-09-25T23:09:09.180

def minimum_window(s, t, min_length = 100000):
    d = {}
    for x in t:
        if x in d:
            d[x]+= 1
        else:
            d[x] = 1

    tot = sum([y for x,y in d.iteritems()])
    l = []
    ind = 0 
    for i,x in enumerate(s):
        if ind == 1:
            l = l + [x]
        if x in d:
            tot-=1
            if not l:
                ind = 1
                l = [x]

        if tot == 0:
            if len(l)<min_length:
                min_length = len(l)
                min_length = minimum_window(s[i+1:], t, min_length)

return min_length

l_s = "ADOBECODEBANC"
t_s = "ABC"

min_length = minimum_window(l_s, t_s)

if min_length == 100000:
      print "Not found"
else:
      print min_length

How to find smallest substring which contains all characters from a given string?

16 Answers16

Linked

Related