
I have a string "abcdbca" and I'm asked to take two slices of it, say [0:3] and [4:7], which gives the substrings "abc" and "bca". I have to find out whether the two substrings are similar (same elements, max_allowed_mismatch_error = 1).

I tried counting sort, but it isn't much of an optimization. So I thought the next, more optimized method could be hashing, but I can't figure out a hash function that solves the problem accurately. I need to perform the operation several times.

ghost-sdk

1 Answer


Hashing is no good.

There are two solutions: the simple one, which is to insist that the substrings be of equal length and count equal characters, and the complex one, which is to use an alignment algorithm such as Needleman-Wunsch. The latter will give a much more robust idea of string similarity.
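
The simple approach could be sketched roughly as follows. This is only an illustration under an assumed reading of the question: "similar" is taken to mean that the two substrings have the same length and that at most `max_mismatch` characters of one cannot be paired with an equal character of the other. The function name `similar_by_counts` and its parameters are made up for this example.

```python
from collections import Counter


def similar_by_counts(a: str, b: str, max_mismatch: int = 1) -> bool:
    """Compare two strings by character counts, ignoring order.

    Assumption: strings of different length are never similar, and a
    "mismatch" is a character of `a` with no equal partner left in `b`.
    """
    if len(a) != len(b):
        return False
    # Counter subtraction keeps only the positive differences, i.e. the
    # characters that occur more often in `a` than in `b`.
    unmatched = Counter(a) - Counter(b)
    return sum(unmatched.values()) <= max_mismatch


s = "abcdbca"
print(similar_by_counts(s[0:3], s[4:7]))  # "abc" vs "bca"
```

Each comparison is linear in the length of the substrings. If the question intends a different definition of mismatch, only the last two lines of the function would need to change.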

Malcolm McLean
  • Thanks. I read about it but can't figure it out properly. Can you help by explaining a bit with respect to my above mentioned example? And why do you think hashing is no good? – ghost-sdk Jun 04 '17 at 16:24
  • Hashing is no good because two similar strings do not have similar hashes. An alignment algorithm will put similar portions of the strings above each other and add gaps where necessary. So AAB and CAA will have the AA aligned. You then count same characters as with the naive method. – Malcolm McLean Jun 04 '17 at 18:21
  • But this doesn't answer the question properly. Suppose I have AAB, and ACA, then how will they be aligned? – ghost-sdk Jun 05 '17 at 11:46
  • Depends on your settings, but A-AB over ACA- would be reasonable. The hyphen represents a gap. – Malcolm McLean Jun 05 '17 at 11:49
  • I think it won't solve my problem. But I'll try. Thanks for the help – ghost-sdk Jun 05 '17 at 12:09
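
For readers who want to try the alignment idea discussed in the comments, here is a minimal Needleman-Wunsch sketch. The scores (match = 1, mismatch = -1, gap = -1) are assumed "settings" chosen for illustration, not values given in the thread, and the function name `needleman_wunsch` is just a label for this example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align `a` and `b`; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # pair a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # a[i-1] against a gap
                score[i][j - 1] + gap,      # b[j-1] against a gap
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


total, top, bottom = needleman_wunsch("AAB", "ACA")
print(top)     # A-AB
print(bottom)  # ACA-
# Count equal characters column by column, as in the naive method.
print(sum(x == y for x, y in zip(top, bottom)))  # 2
```

With these assumed scores the AAB / ACA example from the comments aligns as A-AB over ACA-, with two matching columns. Different scores can produce a different alignment, which is what "depends on your settings" refers to.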