I get a "time limit exceeded" error when grouping words into anagrams. Is there any other way?

Question

class Solution:
    # @param {string[]} strs
    # @return {string[][]}
    def isAnagram(self, s, t):
        s1 = [];s2 = []
        s1[:0] = s
        s2[:0] = t
        s1.sort();s2.sort()
        if s1 == s2:
            return True
        else:
            return False
    def groupAnagrams(self, strs):
        l=[]
        for i in strs:
            m=[]
            if m not in m:
                m.append(i)
                for j in strs:
                    if self.isAnagram(i,j)==True:
                        m.append(j)
                        strs.remove(j)
            l.append(m)
        return l

The above code, when fed with:

["shuffled","lacquered","efficacious","michigander","corruptness","internals","converter","speeds","rebellion","transceivers","electroencephalogram","crematories","bespoken","complainant","flotations","nev","blindfolding","corresponds","optionally","aggravating","gratifying","healthfulness","characterizing","dole","fantasies","bulks","responsibly","exploiting","confluences","header","dunno","saddam","adulate","spoken","bargained","funiculars","enlargements","mastered","expended","zambians","muggiest","riveted","junketing","shrewish","issachar","wallpapered","bridges","efficacious","cogitation","parabola","inheres","song","chock","surfing","windy","richer","shields","rehash","autobiographical","idiotic","discipline","keyword","proliferation","hollower","exposing","britain","fred","salarying","misplaying","gallbladder","czechoslovakia","burying","deprivation","lubricated","androids","hurtle","kitty","attach","subsidies","tumbled","unseemliest","impelling","surmise","blundered","etching","stuccoes","windiest","monorail","raided","comedians","theodora","muhammadans","sillies","unlocking","lubricating","desperados","vine","purposeless","calmest","loopy","confluences","clings","today","mountaineer","son","axiomatic","thur","ideograph","document","rudolf","joviality","crystals","moodiest","footprints","net","taney","crane","psycho","quantified","aisle","aimee","vegetarianism","canes","twining","butler","transporters","cohere","wilts","outlines","imbecile","passages","godunov","sunken","maneuvers","papyruses","slowed","residuals","tarpaulins","devour","callus","aldebaran","wraiths","outplay","psychoanalyst","flicking","congealing","unsteadier","smoother","bavarian","savvy","wino","tortola","stiflings","deprecation","iguassu","surnames","chit","fraud","strong","camel","undulate","jiggling","lars","singsonging","canny","someway","overtaken","sonja","rapacity","scotch","discus","spill","boated","americanized","phoneyed","nonprofessional","excessive","nuisance","haddock","fared
","jibes","lintels","nurturing","falls","testimonial","pluralism","cookeries","cocksure","cassock","appraiser","contingent","barbarous","shoo","groundings","tulsa","hughes","fiver","taces","compatriot","cockpit","sepoy","naughties","topeka","decadents","rangers","topaz","kr","accoladed","palmed","jackknifes","overbore","blintze","shari","corroborations","mortgagees","tylenol","rockies","caesar","estimations","disconnects","coordinating","satinwood","octopus","smithsonian","dustiness","subscript","compacting","sanctuary","restarting","palmist","johnie","winos","conurbations","contrived","crumby","demavend","blooding","electrodes","composed","wheres","clements","ululate","basketball","cattlemen","callus","toolboxes","harelips","garaged","fuller","stubborn","scald","devotion","revolvers","kernels","lean","adversaries","floe","uninvited","umiaks","crackup","molested","santiago","contraltos","bethany","exhortations","preferential","gina","processor","beleaguering","fountainhead","politicking","denounces","eats","zodiacs","lubricated","prisoning","chautauqua","apparently","apiaries","lawrence","ellis","vampired","falsifiable","shaker","impecuniousness","maurice","vaginas","fran","cobain","angkor","discernment","numbs","bridges","novelette","renumbering","multiplicand","gluey","tots","garment","outran","disrespects","chino","pennsylvania","puff","chilly","roosted","fuses","concede","unimplemented","misogynist","disheveling","wiggler","penciling","storage","thoroughbreds","copiously","unidentifiable","warpaths","detriments","wantoning","welling","philosophizes","proprietorship","crumbliest","forgather","hemlocks","evangeline","abelson","extant","hijacking","repelling","stockholder","rebuking","stagnates","mechanization","shenyang","obeisance","english","erythrocyte","marring","regenerated","spinster","pest","forgathered","projectionist","match","smolder","rhinos","libretti","astutely","recuperates","outsources","vole","maestros","viewers","imprecision","astrophysicist","ari
stotelian","impressing","picnicked","minimalism","commas","ladled","gobbles","aborts","ahem","lira","surreptitious","corpses","london","hallucination","hendricks","traumata","anchovy","medication","reexamine","stabilization","jackboot","insular","floated","silkier","entertains","barren","savvier","volatile","amethysts","feuds","cheddar","cogs","trinities","underpasses","whoopee","cult","housing","fussbudgets","laminated","regress","boeotian","fugitive","anthers","nebraska","torch","declassify","tijuana","badges","cohan","stylish","formosan","lifestyles","impresario","love","errata","teletypewriters","resembled","cork","weaver","darlene","preoccupied","cage","faun","reclassifies","confinements","evolution","jayne","syndicate","soaping","provincials","regional","squabble","apricot","totes","herbart","beards","carpetbagged","assignable","henpecks","coating","amplified","insulation","smooths","parliament","sahara","bursitis","lingos","wherewithal","inoffensively","overcrowds","bhutan","disarrange","zippy","flosses","parnell","erratas","sidings","clapboards","confederated","palliative","wirelesses","etruscans","neonates","clayey","vaccinating","peskiest","liable","bibliographical","squidded","hausdorff","lumberyard","blythe","pillions","fiddlesticks","sarong","scarfed","reformer","gunrunning","sweaters","entreats","wicca","tennis","quilt","canisters","frankincense","unbar","neighed","cicadas","bighorns","tittles","dimaggio","costuming","judas","paints","pastorals","carib","glamored","cantering","demotes","currying","excommunicating","thwarting","freebase","niagara","fortification","buttercups","survey","barracudas"]

shows that error. I'm new to Python, so I'm not able to get past this. Thanks :)

I assume you're running your code in some sort of hosted/educational environment? Can you give us a bit more background on exactly what you're trying to do (a couple of sentences?) and also format your code properly (class indentation looks off)? — Tom Dalton, Aug 17 '15 at 09:09
For example, given ["eat", "tea", "tan", "ate", "nat", "bat"], we need the output [ ["ate", "eat","tea"], ["nat","tan"], ["bat"] ]. It's on LeetCode. — Shashank Shekhar, Aug 17 '15 at 09:14
So the problem with your code (in terms of taking a long time) is that it compares every string with every other string. So if you have 1000 strings, it will do 1000 * 1000 (= 1000000) comparisons. This is known as "Order N Squared" or "O(n^2)", which means it gets slow very quickly as the input grows. You'll need to change your algorithm so that you don't need all those comparisons. Hint: Maybe think about pre-processing each string into a https://docs.python.org/2/library/collections.html#collections.Counter — Tom Dalton, Aug 17 '15 at 09:17
No, it won't do that: as soon as it finds an anagram, I remove it from the original list. — Shashank Shekhar, Aug 17 '15 at 09:30

score 0 · Answer 1 · answered Aug 17 '15 at 09:41

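There seem to be quite a few problems with your code.

if m not in m: this line does not make much sense. m is a freshly created empty list, and a list never contains itself here, so the condition is always true.
strs.remove(j): never remove items from a list while iterating over that same list, or bad things happen (the loop skips elements).
You are comparing each string with every other string, including itself.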

Thus, for your ["eat", "tea", "tan", "ate", "nat", "bat"] example, your code returns [['eat', 'eat', 'ate'], ['tan', 'tan'], ['bat', 'bat']]
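
To see why removing during iteration loses elements, here is a minimal sketch (the variable names are made up for illustration and are not from the question). The for loop keeps a position in the list, so each removal shifts the remaining items left and the loop steps over one of them:

```python
# Illustration only: removing from a list while looping over it.
words = ["eat", "tea", "ate", "tan"]
seen = []
for w in words:
    seen.append(w)
    words.remove(w)  # later items shift left, so the next one is skipped

# Only every other element was visited, and half the list is left over.
print(seen)   # ['eat', 'ate']
print(words)  # ['tea', 'tan']

# Looping over a copy avoids the problem: the copy is never mutated.
words2 = ["eat", "tea", "ate", "tan"]
seen2 = []
for w in list(words2):
    seen2.append(w)
    words2.remove(w)

print(seen2)   # ['eat', 'tea', 'ate', 'tan']
print(words2)  # []
```

This is also why "tea" and "nat" never show up in the output above: they are skipped or left behind in strs.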

Concerning performance, the biggest problem seems to be that you are sorting the strings each time you compare them to any other string, i.e. in total, you are sorting on the order of n² times! Instead, I suggest you sort each string once and use a dictionary that maps the sorted version to the list of strings sharing it.

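def groupAnagrams(strs):
    res = {}
    for s in strs:
        # Anagrams share the same sorted-letter key, e.g. "eat" and "tea" -> "aet"
        res.setdefault(''.join(sorted(s)), []).append(s)
    # list() so that a plain list is returned on Python 3 as well
    return list(res.values())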

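Output for the example is [['tan', 'nat'], ['bat'], ['eat', 'tea', 'ate']] (the order of the groups follows the dictionary's ordering, so it may differ between Python versions).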
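
Since the question uses the judge's class Solution template, the same idea might be adapted roughly as follows. This is only a sketch: using collections.defaultdict instead of setdefault is a stylistic swap, and sorting each group is an assumption based on the expected output quoted in the comments (["ate", "eat", "tea"]), not something the problem statement here confirms.

```python
from collections import defaultdict


class Solution:
    def groupAnagrams(self, strs):
        groups = defaultdict(list)
        for s in strs:
            # Each string is sorted exactly once; anagrams share the same key.
            groups["".join(sorted(s))].append(s)
        # Assumption: the judge wants each group in sorted order.
        # Drop the sorted() call if any order is accepted.
        return [sorted(g) for g in groups.values()]


result = Solution().groupAnagrams(["eat", "tea", "tan", "ate", "nat", "bat"])
print(result)
```

Unlike the original attempt, this does not modify strs and makes a single pass over the input.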

score 0 · Accepted Answer · answered Aug 17 '15 at 09:52

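You can do something like this. Here I compute a key for each word (its letters in sorted order, used like a hash) and build a map from that key to the words that produce it. Words with the same key are anagrams. Note that each group is a set, so a word that appears twice in the input is stored only once.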

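def sort_prehash(word):
    # Canonical key: the word's letters in sorted order
    return ''.join(sorted(word))

def group_anagrams(words, hash_function):
    result = {}
    for w in words:
        # Lowercase first so that case differences do not split a group
        s = hash_function(w.lower())
        if s in result:
            result[s] |= {w}
        else:
            result[s] = {w}
    return result.values()

orig = ["eat", "tea", "tan", "ate", "nat", "bat"]

# print(...) with one argument works on both Python 2 and Python 3
print(list(group_anagrams(orig, sort_prehash)))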
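
Because group_anagrams takes the key function as a parameter, a different key can be plugged in. As a sketch, count_prehash below is a hypothetical alternative (not part of the answer) that counts letters instead of sorting them, which costs O(k) per word of length k rather than O(k log k). It assumes the words contain only the letters a to z after lowercasing. group_anagrams is repeated in a slightly condensed form so the block runs on its own.

```python
def count_prehash(word):
    # Hypothetical key function: one counter per letter 'a'..'z'.
    # Assumes lowercase ASCII letters only; other characters are not handled.
    counts = [0] * 26
    for ch in word:
        counts[ord(ch) - ord("a")] += 1
    return tuple(counts)  # tuples are hashable, so they can be dict keys


def group_anagrams(words, hash_function):
    result = {}
    for w in words:
        # Same grouping as above, written with setdefault
        result.setdefault(hash_function(w.lower()), set()).add(w)
    return list(result.values())


groups = group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"], count_prehash)
print(groups)
```

For short dictionary words the difference is likely small; the sorted-string key is simpler and usually fast enough.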
