Detect that two strings are the same but in a different order

Question

My goal is to detect that two strings are the same but in a different order.

Example
"hello world my name is foobar" is the same as "my name is foobar world hello"

What I already tried is splitting both strings into lists and comparing them in a loop:

text = "hello world my name is foobar"
textSplit = text.split()

pattern = "foobar is my name world hello"
pattern = pattern.split()

# Count how many pattern words also appear somewhere in the text
count = 0
for substring in pattern:
    if substring in textSplit:
        count += 1

if count == len(pattern):
    print("same string detected")

It returns what I intended, but is this actually a correct and efficient way? Maybe there is another approach. Any suggestions of papers on this topic would be really nice.

Edit 1: Duplicate words are important

text = "fish the fish the fish fish fish"
pattern = "the fish"

It must return False.
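To see why the edit matters, the loop from the question can be wrapped in a function and run on the new example. The name `loop_check` is hypothetical, not from the question. Because the loop only checks that each word of `pattern` appears somewhere in `text`, it reports the fish sentences as the same:

```python
def loop_check(text, pattern):
    """The question's loop wrapped in a function (hypothetical name)."""
    text_split = text.split()
    pattern_split = pattern.split()
    count = 0
    for substring in pattern_split:
        # Membership test only: repeated or extra words in text are never checked.
        if substring in text_split:
            count += 1
    return count == len(pattern_split)


print(loop_check("hello world my name is foobar", "foobar is my name world hello"))
# True
print(loop_check("fish the fish the fish fish fish", "the fish"))
# True, although Edit 1 requires False
```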

What about when words are repeated? Are "the fish" and "fish the fish the fish fish fish" the same? — Jon Clements, Oct 12 '17 at 07:35
`sorted(text) == sorted(pattern)` maybe? It is not very efficient but it is fairly easy to implement. — Ozgur Vatansever, Oct 12 '17 at 07:36
If dups are not important, `len(set(text).difference(pattern)) == 0` — Chen A., Oct 12 '17 at 07:37
@OzgurVatansever: What's not efficient about `sorted`? `O(n.log(n))` is almost always good enough, and close to `O(n)`. The problem with your suggestion is that `'abc'` and `'cba'` are considered equal. — Eric Duminil, Oct 12 '17 at 09:10
@JonClements I missed that case, thank you. I will update the code and question soon. @OzgurVatansever Thanks for the suggestion. @Vinny Dups are important. — nfl-x, Oct 12 '17 at 09:12
@EricDuminil adding the `==` check, which is `O(n)`, the entire complexity would be `O(n^2.log(n))` — Ozgur Vatansever, Oct 12 '17 at 09:20
@OzgurVatansever: `Adding`, not `multiplying`. `O(n + n.log(n))` is still `O(n.log(n))` — Eric Duminil, Oct 12 '17 at 09:21
@EricDuminil you are right. I think sorting the split sentence would fix the other problem as well. — Ozgur Vatansever, Oct 12 '17 at 15:35
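The comments above mix two levels of comparison. Calling `sorted` or `set` directly on a string works on its characters, which is why `'abc'` and `'cba'` compare equal; splitting first compares whole words. A minimal sketch of the difference:

```python
text = "abc"
pattern = "cba"

# Sorting a string sorts its characters: ['a', 'b', 'c'] on both sides.
chars_equal = sorted(text) == sorted(pattern)
print(chars_equal)  # True

# Splitting first sorts whole words: ['abc'] vs ['cba'].
words_equal = sorted(text.split()) == sorted(pattern.split())
print(words_equal)  # False

# The set-difference check from the comments ignores how often a word occurs.
dups_ignored = len(set("the fish".split()).difference("fish the fish".split())) == 0
print(dups_ignored)  # True
```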

Eric Duminil · Accepted Answer · 2017-10-12T09:23:20.977

If you want to check that 2 sentences have the same words (with the same number of occurrences), you could split the sentences into words and sort them:

>>> sorted("hello world my name is foobar".split())
['foobar', 'hello', 'is', 'my', 'name', 'world']
>>> sorted("my name is foobar world hello".split())
['foobar', 'hello', 'is', 'my', 'name', 'world']

You could define the check in a function:

def have_same_words(sentence1, sentence2):
    return sorted(sentence1.split()) == sorted(sentence2.split())

print(have_same_words("hello world my name is foobar", "my name is foobar world hello"))
# True

print(have_same_words("hello", "hello hello"))
# False

print(have_same_words("hello", "holle"))
# False

If case isn't important, you could compare lowercase sentences:

def have_same_words(sentence1, sentence2):
    return sorted(sentence1.lower().split()) == sorted(sentence2.lower().split())

print(have_same_words("Hello world", "World hello"))
# True

Note: you could also use collections.Counter instead of sorted. The complexity would be O(n) instead of O(n.log(n)), which isn't a big difference anyway; for sentences this short, import collections might even take longer than sorting the strings:

from collections import Counter

def have_same_words(sentence1, sentence2):
    return Counter(sentence1.lower().split()) == Counter(sentence2.lower().split())

print(have_same_words("Hello world", "World hello"))
# True

print(have_same_words("hello world my name is foobar", "my name is foobar world hello"))
# True

print(have_same_words("hello", "hello hello"))
# False

print(have_same_words("hello", "holle"))
# False
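Whether the difference matters for your own sentences is easy to check locally. A minimal sketch using the standard timeit module; the two function names are hypothetical. This only measures the per-call cost of each check, not the one-time cost of import collections, and the timings depend entirely on the machine and sentence length, so no numbers are shown:

```python
import timeit
from collections import Counter


def same_words_sorted(sentence1, sentence2):
    # O(n.log(n)): sort both word lists, then compare them element by element.
    return sorted(sentence1.lower().split()) == sorted(sentence2.lower().split())


def same_words_counter(sentence1, sentence2):
    # O(n) on average: one hash-table update per word, then compare the counts.
    return Counter(sentence1.lower().split()) == Counter(sentence2.lower().split())


a = "hello world my name is foobar"
b = "my name is foobar world hello"

for func in (same_words_sorted, same_words_counter):
    # Run each check 10,000 times and report the total time in seconds.
    seconds = timeit.timeit(lambda: func(a, b), number=10_000)
    print(f"{func.__name__}: {seconds:.3f} s")
```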

Thank you. It works as intended. Can you please summarize, or link me to, what complexity and worst case mean, and why that complexity is O(n)? It would be really helpful. — nfl-x, Oct 12 '17 at 09:39
Sorting is `O(n.log(n))`, counting is `O(n)`. There's not much more to say, except: given the size of the sentences, we shouldn't care about complexity. — Eric Duminil, Oct 12 '17 at 09:41
Looks like I need to start figuring out what those symbols mean, haha. — nfl-x, Oct 12 '17 at 09:51
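To make the complexity answer a little more concrete: counting needs a single pass over the n words, with one dictionary update per word that takes constant time on average, so the work grows linearly with n. Sorting has to put the words in order, which takes O(n.log(n)) comparisons. A minimal sketch of counting with a plain dict, roughly what Counter does for this use; the function names are hypothetical:

```python
def word_counts(sentence):
    # One pass over the words: n dictionary updates, each O(1) on average.
    counts = {}
    for word in sentence.split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def have_same_word_counts(sentence1, sentence2):
    # Dict equality ignores insertion order and is linear in the number of keys.
    return word_counts(sentence1) == word_counts(sentence2)


print(have_same_word_counts("hello world my name is foobar", "my name is foobar world hello"))
# True
print(have_same_word_counts("fish the fish the fish fish fish", "the fish"))
# False
```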

Chris Charles · Answer 2 · score 3 · 2017-10-12T11:18:38.670

I think that with your implementation, extra words in the text get ignored (maybe this was intended?).

I.e. if text = "a b" and pattern = "a", yours prints "same string detected".

The way I'd do it: comparison where order doesn't matter makes me think of sets, so a solution with sets would be:

same = set(text.split()) == set(pattern.split())

Edit: taking into account the question's edit about repeated words:

from collections import Counter
split_text = text.split()
split_pattern = pattern.split()
same = (Counter(split_text) == Counter(split_pattern))

edited Oct 12 '17 at 11:18 · answered Oct 12 '17 at 07:38

Your solution considers that `"hello"` and `"hello hello"` are equal. It's not clear if it's the desired behaviour. – Eric Duminil Oct 12 '17 at 09:08
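This remark is easy to reproduce. A minimal sketch comparing the set-based check with the Counter-based one from the edit; the function names are hypothetical:

```python
from collections import Counter


def same_as_sets(text, pattern):
    # Sets keep only distinct words, so repetitions are ignored.
    return set(text.split()) == set(pattern.split())


def same_as_counters(text, pattern):
    # Counters also record how many times each word occurs.
    return Counter(text.split()) == Counter(pattern.split())


print(same_as_sets("hello", "hello hello"))      # True
print(same_as_counters("hello", "hello hello"))  # False
print(same_as_sets("a b", "a"))                  # False: extra words are no longer ignored
```

Either version fixes the "a b" versus "a" case from the question; only the Counter version also respects Edit 1.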

score 0 · Answer 3 · answered Oct 12 '17 at 07:38

0

You can make a list of words from each string and compute the intersection of their sets; if both lists have the same length and the intersection is as long as the first list, they are the same.

text = "hello world my name is foobar"
pattern = "foobar is my name world hello"
text = text.split(" ")
pattern = pattern.split(" ")
result = True
if len(text) != len(pattern):
    result = False
else:
    # Words common to both sentences (set() removes duplicates)
    l = list(set(text) & set(pattern))
    if len(l) != len(text):
        result = False
if result:
    print("same string detected")
else:
    print("Not the same string")

answered Oct 12 '17 at 07:38 by Mehdi Ben Hamida

You need to be wary of your length check there... `if len(l) != len(text)`: since `l` has duplicates removed, this check isn't going to be reliable where `text` has duplicated words... – Jon Clements Oct 12 '17 at 07:44
`set(text)` and `set(pattern)` remove duplicates – Mehdi Ben Hamida Oct 12 '17 at 07:50
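Jon Clements' concern can be checked by wrapping the answer's logic in a function; the name `intersection_check` is hypothetical. The sets do remove duplicates, but `len(text)` still counts them, so two identical sentences containing a repeated word are reported as different:

```python
def intersection_check(text, pattern):
    # Same logic as the answer above, returning the result instead of printing it.
    text = text.split(" ")
    pattern = pattern.split(" ")
    if len(text) != len(pattern):
        return False
    common = set(text) & set(pattern)
    # len(common) counts distinct words, while len(text) counts every word.
    return len(common) == len(text)


print(intersection_check("hello world my name is foobar", "foobar is my name world hello"))
# True
print(intersection_check("the fish the fish", "the fish the fish"))
# False, although the sentences are identical
```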

score 0 · Answer 4 · answered Oct 12 '17 at 08:18

You can also build a new list str12 from the words of both strings you want to compare, then compare the length of str12 with 2 * (the number of distinct words in str12):

str1 = "hello world my name is foobar"
str2 = "my name is foobar world hello"


str12 = (str1 + " " +str2).split(" ")

str12_remove_duplicate = list(set(str12))

if len(str12) == 2 * len(str12_remove_duplicate):
    print("String '%s' and '%s' are SAME but different order" % (str1, str2))
else: 
    print("String '%s' and '%s' are NOT SAME" % (str1, str2))
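The same kind of check can be run on this approach; the function name `concat_check` is hypothetical. The length test only verifies that the combined word list is twice as long as its set of distinct words, so it can also accept two sentences that share no words at all:

```python
def concat_check(str1, str2):
    # Same logic as the answer above, returning a bool instead of printing.
    str12 = (str1 + " " + str2).split(" ")
    return len(str12) == 2 * len(set(str12))


print(concat_check("hello world my name is foobar", "my name is foobar world hello"))
# True
print(concat_check("a a", "b b"))
# True, although the two sentences share no words
```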
