8

I was recently asked to design an algorithm that checks whether two strings are anagrams of one another. My goal was to minimize both space and time complexity, so I came up with this algorithm:

  1. Create an array of 26 elements, each initialized to zero.
  2. Traverse the first string and for each character, increment the array element corresponding to that character.
  3. Traverse the second string and for each character, decrement the array element corresponding to that character.
  4. Scan over the array. If all elements are 0, the two strings are anagrams.

However, the time complexity of this algorithm is O(n) and I cannot come up with an algorithm with lower complexity. Does anybody know of one?

templatetypedef
garima
  • I'm not an expert in this, but isn't O(n) already pretty efficient for something like this? The only flaw I'm seeing is that you'll have a hard time handling "über" and "rübe" because you're limited to the Latin characters (but if that is a precondition then that's OK). – DarkDust Mar 29 '11 at 09:01

5 Answers

16

Your algorithm is asymptotically optimal: any correct algorithm for this problem needs Ω(n) time. To see this, suppose that an algorithm A exists that solves the problem in o(n) time (note that this is little-o of n here). Then for any 0 < ε < 1, there is some n₀ such that for any input of size n ≥ n₀, the algorithm must terminate in at most εn steps. Set ε = 1/3 and consider any inputs to the algorithm that are of length at least n₀ for this ε. Since the algorithm can look at no more than 1/3 of the characters in the two strings, there must be two different inputs, one that is a pair of anagrams and one that isn't, such that the algorithm looks at the same positions in each input and sees the same characters there. The algorithm would then have to produce the same output in each case, and thus would be wrong on at least one of the inputs. We've reached a contradiction, so no such algorithm can exist.
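
To make the argument concrete, here is a small Java sketch. The `prefixOnlyCheck` method is a hypothetical sublinear "algorithm" made up purely for illustration (it is not from the question): it reads only the first third of each string. The two input pairs below agree on every character it reads, so it has to give the same answer for both, even though only one pair is an anagram. `fullCheck` is a plain sort-and-compare reference that reads everything.

```java
import java.util.Arrays;

public class LowerBoundDemo {

    // Hypothetical sublinear checker: inspects only the first third of each string.
    static boolean prefixOnlyCheck(String s, String t) {
        if (s.length() != t.length()) {
            return false;
        }
        int k = s.length() / 3;
        char[] ps = s.substring(0, k).toCharArray();
        char[] pt = t.substring(0, k).toCharArray();
        Arrays.sort(ps);
        Arrays.sort(pt);
        return Arrays.equals(ps, pt);
    }

    // Reference answer that reads every character (sort and compare).
    static boolean fullCheck(String s, String t) {
        char[] cs = s.toCharArray();
        char[] ct = t.toCharArray();
        Arrays.sort(cs);
        Arrays.sort(ct);
        return Arrays.equals(cs, ct);
    }

    public static void main(String[] args) {
        // Both pairs look identical in the positions the prefix checker reads.
        String s1 = "abcdefghi", t1 = "cabihgfed"; // a real anagram pair
        String s2 = "abcdefghi", t2 = "cabihgfez"; // differs in an unread position

        System.out.println(prefixOnlyCheck(s1, t1) + " " + fullCheck(s1, t1)); // true true
        System.out.println(prefixOnlyCheck(s2, t2) + " " + fullCheck(s2, t2)); // true false
    }
}
```

Any other rule for choosing which third of the characters to read can be fooled the same way, by changing a character the rule never looks at.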

templatetypedef
3

You could possibly improve average performance with early exits. While scanning the 2nd string, if count[char] is 0 before you decrement, you don't have an anagram and you can stop scanning.

Also, if the strings are shorter than 26 chars, then in the last step, check only the chars in the first string for zeroes.

This doesn't change the big O, but it can change your average runtime to something less than the 2N+26 of the proposed solution, depending on your data.
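
A minimal Java sketch of the early-exit idea, assuming both strings contain only the lowercase letters `a` to `z`. The up-front length check is an extra assumption that is not part of the description above; with it in place, the final scan over the 26 counters is no longer needed, because equal lengths plus "no counter ever goes below zero" already force every counter to end at zero.

```java
public class EarlyExitAnagram {

    // Assumes both strings contain only the lowercase letters 'a'..'z'.
    static boolean isAnagram(String s, String t) {
        // Early exit 1 (added assumption): different lengths can never be anagrams.
        if (s.length() != t.length()) {
            return false;
        }
        int[] count = new int[26];
        for (int i = 0; i < s.length(); i++) {
            count[s.charAt(i) - 'a']++;
        }
        for (int i = 0; i < t.length(); i++) {
            int idx = t.charAt(i) - 'a';
            // Early exit 2: t uses this letter more often than s does.
            if (count[idx] == 0) {
                return false;
            }
            count[idx]--;
        }
        // Equal lengths and no counter went negative, so all counters are zero.
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAnagram("anagram", "nagaram")); // true
        System.out.println(isAnagram("rat", "car"));         // false
    }
}
```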

AShelly
2

Let's take a question: Given two strings s and t, write a function to determine if t is an anagram of s.

For example, s = "anagram", t = "nagaram", return true. s = "rat", t = "car", return false.

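Method 1 (using a HashMap):

    import java.util.HashMap;
    import java.util.Map;

    public class Method1 {

        public static void main(String[] args) {
            String a = "protijayi";
            String b = "jayiproti";
            System.out.println(isAnagram(a, b)); // output => true
        }

        private static boolean isAnagram(String a, String b) {
            // Without this check, a shorter b whose letters all occur in a
            // (e.g. "aab" vs. "ab") would wrongly be accepted.
            if (a.length() != b.length()) {
                return false;
            }
            // Count how often each character occurs in a.
            Map<Character, Integer> map = new HashMap<>();
            for (char c : a.toCharArray()) {
                map.put(c, map.getOrDefault(c, 0) + 1);
            }
            // Use up the counts with b; a missing character means no anagram.
            for (char c : b.toCharArray()) {
                int count = map.getOrDefault(c, 0);
                if (count == 0) {
                    return false;
                }
                map.put(c, count - 1);
            }
            return true;
        }
    }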

Method 2 (using a counting array):

    public class Method2 {

        public static void main(String[] args) {
            String a = "protijayi";
            String b = "jayiproti";
            System.out.println(isAnagram(a, b)); // output => true
        }

        // Assumes both strings contain only the lowercase letters 'a'..'z';
        // any other character produces an out-of-range array index.
        private static boolean isAnagram(String a, String b) {
            int[] alphabet = new int[26];
            for (int i = 0; i < a.length(); i++) {
                alphabet[a.charAt(i) - 'a']++;
            }
            for (int i = 0; i < b.length(); i++) {
                alphabet[b.charAt(i) - 'a']--;
            }
            // The strings are anagrams only if every counter is back at zero.
            for (int w : alphabet) {
                if (w != 0) {
                    return false;
                }
            }
            return true;
        }
    }

Method 3 (using sorting):

    import java.util.Arrays;

    public class Method3 {

        public static void main(String[] args) {
            String a = "protijayi";
            String b = "jayiproti";
            System.out.println(isAnagram(a, b)); // output => true
        }

        // Sorts both character arrays and compares them: O(n log n) time.
        private static boolean isAnagram(String a, String b) {
            char[] ca = a.toCharArray();
            char[] cb = b.toCharArray();
            Arrays.sort(ca);
            Arrays.sort(cb);
            return Arrays.equals(ca, cb);
        }
    }

Method 4 (using code points):

    import java.util.HashMap;
    import java.util.Map;

    public class Method4 {

        public static void main(String[] args) {
            String a = "protijayi";
            String b = "jayiproti";
            System.out.println(isAnagram(a, b)); // output => true
        }

        private static boolean isAnagram(String a, String b) {
            // Count up for code points of a, down for code points of b.
            Map<Integer, Integer> map = new HashMap<>();
            a.codePoints().forEach(code -> map.put(code, map.getOrDefault(code, 0) + 1));
            b.codePoints().forEach(code -> map.put(code, map.getOrDefault(code, 0) - 1));

            // Every count must be exactly zero. Checking only for negative
            // counts would wrongly accept "aab" vs. "ab".
            for (int count : map.values()) {
                if (count != 0) {
                    return false;
                }
            }
            return true;
        }
    }

Soudipta Dutta
1

To be sure the strings are anagrams, you need to examine every character of both strings, so how could that be faster than O(n)?

MacGucky
  • No, using one array for both strings already seems to need the lowest space. – MacGucky Mar 29 '11 at 09:04
  • Funny, I googled for that and found [stackoverflow](http://stackoverflow.com/questions/4236906/finding-if-two-words-are-anagrams-of-each-other). The best solution found there is the same one that you proposed. – MacGucky Mar 29 '11 at 09:06
  • @garima Your space complexity is O(1), and you're not going to do better than that in an asymptotic sense. From a standpoint of an absolute amount of space, you could reduce the space (at the cost of increasing asymptotic runtime complexity) by sorting each string with an O(n log n) sort and then comparing the results. (Note that your current approach is essentially doing a counting sort of the strings.) – jamesdlin Jul 05 '17 at 06:06
  • @jamesdlin: space complexity is actually **O(log n)** because the type for the counting array must be large enough to handle `n` repetitions. – chqrlie Mar 19 '18 at 21:50
-2

    #include <stdio.h>
    #include <string.h>

    int anagram (char a[], char b[]) {

      char chars[26];
      int ana = 0;
      int i = 0;

      for (i = 0; i < 26; i++)
            chars[i] = 0;

      if (strlen(a) != strlen(b))
            return -1;

      i = 0;
      while ((a[i] != '\0') || (b[i] != '\0')) {
            chars[a[i] - 'a']++;
            chars[b[i] - 'a']--;
            i++;
      }

      for (i = 0; i < 26; i++)
            ana += chars[i];

      return ana;
    }

    int main(void) {

      char *a = "chimmy\0";
      char *b = "yimmch\0";

      printf ("Anagram result is %d.\n", anagram(a,b));

      return 0;
    }

asim kadav
  • Undefined behavior if any of the strings contain characters outside `a..z` or if the lowercase letters are not contiguous in the execution character set (OK for ASCII, but wrong for EBCDIC). – chqrlie Aug 08 '17 at 16:51
  • The test `while ((a[i] != '\0') || (b[i] != '\0'))` is both redundant and incorrect. You already checked that `a` and `b` have the same length, and it would be incorrect to index `chars[a[i] - 'a']` if `a[i]` is `'\0'`, even if `b[i]` is not. – chqrlie Aug 08 '17 at 16:53
  • The final loop does not verify that all letters have a zero count... in fact the sum of the array elements is always `0`. – chqrlie Aug 08 '17 at 16:54
  • `char` is too small a type for the `chars` array: a string with 256 `a` would seem to be an anagram of a string with 256 `b`. – chqrlie Aug 08 '17 at 16:56
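
The counter-width problem from the last comment can be reproduced in a short Java sketch. This is an illustration under stated assumptions, not a port of the C code above: Java's 8-bit `byte` stands in for C's `char` counters, the class and method names are made up, and both strings are assumed to contain only `a` to `z`. The wide version uses `int` counters and checks each counter for zero rather than summing them.

```java
import java.util.Arrays;

public class CounterWidthDemo {

    // Deliberately flawed: 8-bit counters wrap around to 0 after 256 steps.
    static boolean isAnagramNarrow(String s, String t) {
        byte[] count = new byte[26];
        for (int i = 0; i < s.length(); i++) count[s.charAt(i) - 'a']++;
        for (int i = 0; i < t.length(); i++) count[t.charAt(i) - 'a']--;
        for (byte c : count) {
            if (c != 0) return false;
        }
        return true;
    }

    // Same logic with int counters, which cannot wrap for any Java string length.
    static boolean isAnagramWide(String s, String t) {
        int[] count = new int[26];
        for (int i = 0; i < s.length(); i++) count[s.charAt(i) - 'a']++;
        for (int i = 0; i < t.length(); i++) count[t.charAt(i) - 'a']--;
        for (int c : count) {
            if (c != 0) return false;
        }
        return true;
    }

    // Helper (hypothetical): builds a string of n copies of c.
    static String repeat(char c, int n) {
        char[] buf = new char[n];
        Arrays.fill(buf, c);
        return new String(buf);
    }

    public static void main(String[] args) {
        String as = repeat('a', 256);
        String bs = repeat('b', 256);
        System.out.println(isAnagramNarrow(as, bs)); // true (wrong: counters wrapped)
        System.out.println(isAnagramWide(as, bs));   // false
    }
}
```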