3

I'm trying to check intersection between two strings using Python. I defined this function:

def check(s1,s2):
    word_array = set.intersection(set(s1.split(" ")), set(s2.split(" ")))
    n_of_words = len(word_array)
    return n_of_words

It works with some sample strings, but in this specific case:

d_word = "BANGKOKThailand"
nlp_word = "Despite Concerns BANGKOK"

print(check(d_word,nlp_word))

I got 0. What am I missing?

JJack_
  • You split on spaces, but there are no spaces in d_word, so what do you expect? – lejlot May 18 '16 at 22:08
  • Oops, you're right. I don't think I'll be able to accomplish my task this way; maybe I have to try regex. What do you think? – JJack_ May 18 '16 at 22:11
  • regex, or some more advanced word separation methods from NLP – lejlot May 18 '16 at 22:17
  • If one of the strings will always be properly delimited (e.g. with spaces), you could use `sum(word in s1 for word in s2.split(" "))`, doing substring tests. That could perhaps lead to false positives if things like `the` match words like `these`, but that's probably impossible to avoid if you want your code to match the example strings you've given. – Blckknght May 18 '16 at 22:23
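
The substring test suggested in the last comment can be sketched as follows. This is only a minimal illustration, assuming `nlp_word` is always space-delimited; the function name `count_words_contained` is hypothetical and not from the thread.

```python
def count_words_contained(unsplit, delimited):
    """Count the words of `delimited` that occur as substrings of `unsplit`."""
    # Only the delimited string is split; the other one is searched as a whole.
    return sum(word in unsplit for word in delimited.split(" "))


d_word = "BANGKOKThailand"
nlp_word = "Despite Concerns BANGKOK"
print(count_words_contained(d_word, nlp_word))  # 1

# The false positive mentioned in the comment: "the" is a substring of "these".
print(count_words_contained("these", "the"))  # 1
```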

3 Answers

2

I was looking for the longest common part of two strings, no matter where that part is located.

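def get_intersection(s1, s2):
    """Return the longest substring of s1 that also occurs in s2."""
    res = ''
    l_s1 = len(s1)
    for i in range(l_s1):
        # j is an exclusive end index, so it has to reach l_s1
        # for substrings that end at the last character of s1.
        for j in range(i + 1, l_s1 + 1):
            t = s1[i:j]
            if t in s2 and len(t) > len(res):
                res = t
    return res
#get_intersection(s1, s2)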

Works for this example as well:

>>> s1 = "BANGKOKThailand"
>>> s2 = "Despite Concerns BANGKOK"
>>> get_intersection('aa' + s1 + 'bb', 'cc' + s2 + 'dd')
'BANGKOK'
kklepper
  • Went with l_s1+1 in both ranges, works with `s1 = "a cde"` and `s2 = "b cde"` with `get_intersection(s1, s2)` – gseattle Apr 21 '23 at 12:20
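
As a possible alternative to the nested loops, the standard library's `difflib.SequenceMatcher` can find a longest common substring directly. This is a different technique from the one in the answer, shown only as a sketch; the function name `longest_common_substring` is hypothetical.

```python
from difflib import SequenceMatcher


def longest_common_substring(s1, s2):
    """Return a longest block of characters that s1 and s2 share."""
    # autojunk=False turns off the "popular element" heuristic for long inputs.
    matcher = SequenceMatcher(None, s1, s2, autojunk=False)
    match = matcher.find_longest_match(0, len(s1), 0, len(s2))
    return s1[match.a:match.a + match.size]


print(longest_common_substring("BANGKOKThailand", "Despite Concerns BANGKOK"))  # BANGKOK
```

It also handles the case from the comment above, where the common part ends at the last character of both strings (`"a cde"` and `"b cde"`).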
0

The first set contains a single string and the second set contains three strings, and the string "BANGKOKThailand" is not equal to the string "BANGKOK".
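
To make this concrete, here is a small sketch that builds both sets from the strings in the question (the variable names `first` and `second` are just for illustration):

```python
d_word = "BANGKOKThailand"
nlp_word = "Despite Concerns BANGKOK"

first = set(d_word.split(" "))    # no spaces, so the whole string is one element
second = set(nlp_word.split(" "))  # three separate words

print(first)           # {'BANGKOKThailand'}
print(len(second))     # 3
print(first & second)  # set()
```

Set intersection compares whole elements for equality, so a word that is merely a prefix of another element never counts as a match.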

loshad vtapkah
0

I can see two possible mistakes:

n_of_words = len(array)

should be

n_of_words = len(word_array)

and

d_word = "BANGKOKThailand"

is missing a space in between, as in

"BANGKOK Thailand"

Fixing those two things gave me a result of 1.

Rafael
  • I fixed the first one, but unfortunately "BANGKOKThailand" has no space (I have to take it as it is; it's defined in a txt file I'm trying to analyze) – JJack_ May 18 '16 at 22:13
  • I can see you fixed the word_array variable too, so happy to see it working now! – Rafael May 18 '16 at 22:15
  • Unfortunately it's not working, I cannot add the whitespace. This is an automatic algorithm for text processing and this is a particular case I should cover :( – JJack_ May 18 '16 at 22:16
  • Not sure about NLP, but if nlp_word is always separated by whitespace while d_word isn't, you could use KMP: loop over each word in nlp_word and search for it in d_word, keeping the parts that match on both sides and ignoring them in successive tries. – Rafael May 18 '16 at 22:23
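
The KMP idea from the last comment might look roughly like the sketch below. It is one possible reading of the suggestion, not the commenter's code: the helper names (`build_failure`, `kmp_find`, `matched_words`) are hypothetical, and "ignoring matched parts" is implemented here by replacing them with a space, which is an assumption.

```python
def build_failure(pattern):
    """KMP failure table: fail[i] is the length of the longest proper
    prefix of pattern[:i + 1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_find(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    fail = build_failure(pattern)
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


def matched_words(unsplit, delimited):
    """Return the words of `delimited` found inside `unsplit`."""
    remaining = unsplit
    found = []
    for word in delimited.split():
        pos = kmp_find(remaining, word)
        if pos != -1:
            found.append(word)
            # Blank out the matched part so later words cannot reuse it.
            remaining = remaining[:pos] + " " + remaining[pos + len(word):]
    return found


print(matched_words("BANGKOKThailand", "Despite Concerns BANGKOK"))  # ['BANGKOK']
```

For strings this short, Python's built-in `str.find` would do the same job as `kmp_find`; the explicit table is shown only because the comment names KMP.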