
I have a larger 2D char grid of size NxN (2 <= N <= 800), and I am given a smaller 2D grid of size KxK (2 <= K <= 100). For example, let N = 3 and K = 2, with the following matrices:

Larger:

abc
abd
aaa

Smaller:

bd
aa

Problem 1: I have to return whether the larger matrix contains the smaller matrix. In the example above, the smaller matrix is matched inside the larger one.

Problem 2: I have to return the starting position of the matching part in the NxN grid, if found. The example above returns matched, with position = (1, 1) (0-based).
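For reference, both problems can be answered by a direct window-by-window comparison. The following is only a baseline sketch, not the approach asked about: the function name `find_subgrid_naive` is made up for illustration, the grids are assumed to be lists of equal-length strings, and the worst-case cost is roughly O(N^2 * K^2) character comparisons.

```python
def find_subgrid_naive(large, small):
    """Return (row, col) of the first occurrence of small in large, or None.

    Both grids are assumed to be lists of strings (square, 0-based positions).
    """
    n, k = len(large), len(small)
    for r in range(n - k + 1):
        for c in range(n - k + 1):
            # Compare the K rows of the candidate window using string slices.
            if all(large[r + i][c:c + k] == small[i] for i in range(k)):
                return (r, c)
    return None


large = ["abc", "abd", "aaa"]
small = ["bd", "aa"]
print(find_subgrid_naive(large, small))  # (1, 1)
```

A result of `None` answers Problem 1 with "not contained"; any tuple answers both problems at once.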

My assumption:

My plan was to go with hashing, but I would like to know if there is a better way to search efficiently. For example, I could write a hash function that produces an index for every possible square in the NxN grid (2x2, 3x3, 4x4, ..., 100x100, as K can be up to 100) at all valid positions:

(0,0), (0,1), ..., (0, N-K)
(1,0), (1,1), ..., (1, N-K)
. .
. .
(N-K,0), (N-K, 1) .... (N-K, N-K)

Then I can store the positions at the associated indices, and when an input KxK grid comes in, I just run the same hash function and check whether the returned index holds a position.
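A minimal sketch of this lookup idea, for a single fixed K only, might look like the following. The name `build_window_index` is hypothetical, and a Python `dict` keyed by the window's rows stands in for the hash function and index table described above. Storing every window this way costs a lot of memory for large N and K, so this shows the idea rather than a tuned solution.

```python
def build_window_index(large, k):
    """Map every KxK window of large (a list of strings) to its first position."""
    n = len(large)
    index = {}
    for r in range(n - k + 1):
        for c in range(n - k + 1):
            # The tuple of row slices identifies the window; dict hashes it for us.
            key = tuple(large[r + i][c:c + k] for i in range(k))
            # Keep only the first position seen for each distinct window.
            index.setdefault(key, (r, c))
    return index


large = ["abc", "abd", "aaa"]
index = build_window_index(large, 2)
print(index.get(("bd", "aa")))  # (1, 1)
print(index.get(("zz", "zz")))  # None
```

This only pays off if the large grid is fixed and many KxK queries arrive for the same K; a separate index would be needed for each K.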

Sazzad Hissain Khan
  • You can probably find something related if you search for image processing algorithms to find a sub-image within an image. – Bill the Lizard Mar 26 '18 at 12:32
  • Bill, could you please share if you have any? – Sazzad Hissain Khan Mar 26 '18 at 12:39
  • Maybe this issue can be solved by generalizing the [KMP](https://en.wikipedia.org/wiki/Knuth–Morris–Pratt_algorithm) algorithm to 2D (if that's possible). – freestyle Mar 26 '18 at 12:49
  • Please share details of your hashing approach, otherwise we cannot know how efficient it already is. – MrSmith42 Mar 26 '18 at 13:03
  • @MrSmith42 Please see my update – Sazzad Hissain Khan Mar 26 '18 at 16:55
  • I don't know how someone can think the question is too broad! Isn't it a specific question? Why the close vote? – Sazzad Hissain Khan Mar 26 '18 at 16:58
  • You could create a hash map with roughly 64 million entries max. Then queries can be answered in O(k^2). I think it could work, depends what your goal is. It is not clear if you get the large matrix once and answer several queries or if you get two new matrices each time. – maraca Mar 26 '18 at 17:55
  • Similar questions: https://stackoverflow.com/questions/10529278/fastest-way-to-find-a-m-x-n-submatrix-in-m-x-n-matrix and https://stackoverflow.com/questions/9885147/finding-sub-matrix-of-a-given-matrix – Eziz Durdyyev Mar 27 '18 at 01:24

1 Answer

The trick for this problem is to use a hash function that you can update in O(1) when you shift the window by one position. This lowers the complexity to O(N^2), because you no longer rehash or compare every KxK window from scratch.

An example of such a hash function for a window of k characters would be h = sum(x[i] * 2^(k-1-i)) % some_large_prime_number, where x[i] is the ASCII code of the i-th character in the window (i = 0 is the oldest character, so it carries the largest weight, 2^(k-1)). To update you would do

h_new = ((h_previous -
          (x[position_to_remove] * 2^(k-1))) * 2 +
          (x[position_to_add] * 2^0)
        ) % some_large_prime_number
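As a small illustration of this O(1) update on a single row, here is a sketch under a few assumptions that are mine rather than the answer's: the window holds k characters, the oldest character carries the weight 2^(k-1) and the newest carries 2^0, and `MOD`, `window_hash`, and `rolling_hashes` are names chosen for the example.

```python
MOD = 1_000_000_007  # assumed large prime; any prime that fits your integers works
BASE = 2             # powers of 2, as in the formula above


def window_hash(window):
    """Hash of a whole window, computed from scratch (newest char has weight 1)."""
    h = 0
    for ch in window:
        h = (h * BASE + ord(ch)) % MOD
    return h


def rolling_hashes(text, k):
    """Hashes of all length-k windows of text, each derived from the previous one."""
    top = pow(BASE, k - 1, MOD)  # weight of the character that leaves the window
    h = window_hash(text[:k])
    hashes = [h]
    for i in range(k, len(text)):
        # Remove the oldest character; Python's % keeps the result non-negative.
        h = (h - ord(text[i - k]) * top) % MOD
        # Shift the remaining weights up by one power and add the new character.
        h = (h * BASE + ord(text[i])) % MOD
        hashes.append(h)
    return hashes


text = "abcabd"
# Each rolled hash should equal the hash recomputed from scratch.
assert rolling_hashes(text, 3) == [window_hash(text[i:i + 3]) for i in range(4)]
```

The final assertion is the useful check when porting this to another language: if rolled and recomputed hashes disagree, the removed weight or the modulo handling is off.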

The hash function is not very strong, so you will get some false positives. To boost the confidence in a match, repeat the algorithm with a few different large prime numbers. It can still generate false positives, but they will be rare.

Note: Be careful to pick a prime number that doesn't overflow your integer types. You can apply the '%' operation on intermediate results as well to prevent overflows. Also, in many languages (C, C++, and Java, for example) the '%' operation returns a negative number for a negative input, so you need to do the wrap-around yourself when you subtract.
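Putting the pieces together for the 2D case, one possible sketch is below. It is one way to apply the idea, not necessarily what the answer had in mind: each row is hashed with a horizontal rolling hash, and those row hashes are then rolled vertically. The constants `MOD`, `ROW_BASE`, and `COL_BASE` and the function names are assumptions for illustration, the bases are larger than 2 to reduce collisions, and candidates are confirmed by direct comparison instead of repeating the run with several primes.

```python
MOD = 1_000_000_007   # assumed prime modulus
ROW_BASE = 257        # assumed base for the horizontal hash
COL_BASE = 1_000_003  # assumed base for the vertical hash


def row_window_hashes(row, k):
    """Rolling hashes of every length-k window in one row, left to right."""
    top = pow(ROW_BASE, k - 1, MOD)
    h = 0
    for ch in row[:k]:
        h = (h * ROW_BASE + ord(ch)) % MOD
    out = [h]
    for c in range(k, len(row)):
        # In C or Java, add MOD after the subtraction to avoid a negative value.
        h = ((h - ord(row[c - k]) * top) * ROW_BASE + ord(row[c])) % MOD
        out.append(h)
    return out


def find_subgrid(large, small):
    """Return (row, col) of the first match in row-major order, or None."""
    n, k = len(large), len(small)
    if k > n:
        return None
    width = n - k + 1
    target = 0
    for row in small:
        target = (target * COL_BASE + row_window_hashes(row, k)[0]) % MOD
    rows = [row_window_hashes(row, k) for row in large]
    top = pow(COL_BASE, k - 1, MOD)
    # col[c] holds the hash of the KxK window whose top-left corner is (r, c).
    col = [0] * width
    for r in range(k):
        for c in range(width):
            col[c] = (col[c] * COL_BASE + rows[r][c]) % MOD
    for r in range(width):
        for c in range(width):
            # Confirm hash hits directly so a collision cannot be reported.
            if col[c] == target and all(
                large[r + i][c:c + k] == small[i] for i in range(k)
            ):
                return (r, c)
        if r + k < n:
            # Slide every window down one row: drop row r, add row r + k.
            for c in range(width):
                col[c] = ((col[c] - rows[r][c] * top) * COL_BASE
                          + rows[r + k][c]) % MOD
    return None


print(find_subgrid(["abc", "abd", "aaa"], ["bd", "aa"]))  # (1, 1)
```

The hashing part does O(N^2) work in total. The direct check costs O(K^2) per hash hit, which should be rare unless the grid contains many genuine or near matches.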

Sorin