Count distinct elements in every window of size k*k

Question

I found this on: http://www.geeksforgeeks.org/count-distinct-elements-in-every-window-of-size-k/

Given an array of size n and an integer k, return the count of distinct numbers in all windows of size k.
(...)
An efficient solution is to reuse the count of the previous window while sliding the window. The idea is to create a hash map that stores the elements of the current window. When we slide the window, we remove one element from the hash map and add one element. We also keep track of the number of distinct elements.
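
For reference, the 1D sliding-window idea quoted above can be sketched roughly as follows. This is a minimal sketch, not the linked article's code; the function name count_distinct_1d and the sample array are assumptions made for illustration.

```python
from collections import defaultdict


def count_distinct_1d(arr, k):
    """Return the number of distinct values in each length-k window of arr."""
    if k <= 0 or k > len(arr):
        return []
    freq = defaultdict(int)  # value -> occurrences inside the current window
    distinct = 0
    result = []
    for i, value in enumerate(arr):
        # Add the element entering the window on the right.
        if freq[value] == 0:
            distinct += 1
        freq[value] += 1
        # Remove the element leaving the window on the left.
        if i >= k:
            old = arr[i - k]
            freq[old] -= 1
            if freq[old] == 0:
                distinct -= 1
        # The window is full from index k - 1 onwards.
        if i >= k - 1:
            result.append(distinct)
    return result


print(count_distinct_1d([1, 2, 1, 3, 4, 2, 3], 4))  # [3, 4, 4, 3]
```

Each element is added once and removed once, so the whole pass takes O(n) expected time with a hash map.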

But is there an efficient solution for 2-dimensional arrays and a window of size k*k?

Something similar to this maybe? https://stackoverflow.com/questions/46568388/searching-for-largest-rectangle-with-even-count-of-numbers-in-matrix/46573462#46573462 — m69's been on strike for years, Nov 07 '17 at 20:07
I edited the question mainly so that I'd be able to undo my downvote (at first I thought your question was identical to the one you linked to), but still, is there any reason to remove the link to the geeksforgeeks question and their solution for the 1D case? I thought it made the question much clearer. — m69's been on strike for years, Nov 08 '17 at 04:35
This question comes from the ongoing contest - Polish Olympiad in Informatics - [PL] http://oi.edu.pl/static/attachment/20171016/rozzad.pdf. Please wait with your answers until 14.11.2017 — Tacet, Nov 08 '17 at 07:45
@Tacet I was wondering why the asker edited the question to make it the one-liner it is now, and why he deleted all his comments. He was probably trying to stop others from finding it. — m69's been on strike for years, Nov 08 '17 at 10:44
About competition questions: https://meta.stackoverflow.com/questions/278771/how-to-deal-if-the-user-asks-for-code-in-online-programming-competition — m69's been on strike for years, Nov 08 '17 at 17:05
@m69 Yes, I know. Therefore I only comment about it. However, I prefer MSE policy https://math.meta.stackexchange.com/questions/16774/contest-problem-policy and I truly disagree with SO decisions. It really hurts true competitors. And I don't know how organizers may protect contest tasks, if they don't have support from SO moderators. — Tacet, Nov 08 '17 at 19:39

score 0 · Answer 1 · answered Nov 07 '17 at 19:47

The best solution depends a lot on the distribution of your values (whether there are many repeated values, or only a small number of possible values). It can be worthwhile to store the unique values you have already found in an existing_values[] table and check each new value against it:

For each new value in your window, check whether the value is in the existing_values[] table:

No: add the value to the existing_values[] table and increment dist_count.
Yes: do nothing.
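
A minimal sketch of this check, applied independently to every k*k window, might look like the following. The names existing_values and dist_count come from the description above; the function name, the use of a Python set for the table, and the assumption of a non-empty rectangular grid are illustrative choices.

```python
def count_distinct_bruteforce(grid, k):
    """Count distinct values in every k-by-k window by rescanning each window."""
    rows, cols = len(grid), len(grid[0])
    result = []
    for top in range(rows - k + 1):
        row_counts = []
        for left in range(cols - k + 1):
            existing_values = set()  # values already seen in this window
            dist_count = 0
            for r in range(top, top + k):
                for c in range(left, left + k):
                    value = grid[r][c]
                    if value not in existing_values:  # "No" branch
                        existing_values.add(value)
                        dist_count += 1
                    # "Yes" branch: nothing to do
            row_counts.append(dist_count)
        result.append(row_counts)
    return result
```

This rescans all k*k cells for each window, so it does not reuse any work between neighbouring windows.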

hackela

As I mentioned, this is worthwhile if you have a lot of redundancies. If you really want to reduce the work for the k*k window, you can apply this process to each row of the window and then compare the existing_values[] tables of the rows. At least you will only have unique values to compare. – hackela Nov 07 '17 at 20:24

m69's been on strike for years · Answer 2 · 2017-11-07T21:13:31.787

A general way of checking a property of all k×k squares in a two-dimensional array would be to store the property of all rectangles from the top-left corner to each cell, and then compare the properties of the cells at the four corners of the rectangle you want to check.

Consider this example array with values from 0 to 9:

1 8 5 6 8
7 2 5 4 6
3 0 2 5 8
9 5 1 4 6

An array of hashes that store the number of occurrences of each value 0-9 in the rectangles from the top-left corner to each cell would be:

0100000000  0100000010  0100010010  0100011010  0100011020
0100000100  0110000110  0110020110  0110121110  0110122120
0101000100  1111000110  1121020110  1121131110  1121132130
0101000101  1111010111  1221030111  1221241111  1221243131

If you build this from top-left to bottom-right, each hash is the sum of the hash above it and the hash to the left of it, minus the hash above-left of it, with the value of the cell added; e.g. the hash for cell (1,1) is based on the hash for cells (1,0), (0,1), (0,0) and its own value 2:

0100000010 + 0100000100 - 0100000000 + 0010000000 = 0110000110
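
The construction can be sketched like this. It is a minimal sketch: the function name build_prefix_counts and the num_values parameter are assumptions, each "hash" is represented as a plain list of counts, and the 4×5 grid is the one implied by the hash table above (recovered by differencing neighbouring hashes).

```python
def build_prefix_counts(grid, num_values=10):
    """prefix[r][c][v] = occurrences of v in the rectangle (0,0)..(r,c), inclusive."""
    rows, cols = len(grid), len(grid[0])
    prefix = [[None] * cols for _ in range(rows)]
    zero = [0] * num_values  # stands in for neighbours outside the grid
    for r in range(rows):
        for c in range(cols):
            above = prefix[r - 1][c] if r > 0 else zero
            left = prefix[r][c - 1] if c > 0 else zero
            diag = prefix[r - 1][c - 1] if r > 0 and c > 0 else zero
            # above + left - above-left, then add the cell's own value
            counts = [a + l - d for a, l, d in zip(above, left, diag)]
            counts[grid[r][c]] += 1
            prefix[r][c] = counts
    return prefix


grid = [
    [1, 8, 5, 6, 8],
    [7, 2, 5, 4, 6],
    [3, 0, 2, 5, 8],
    [9, 5, 1, 4, 6],
]
prefix = build_prefix_counts(grid)
print("".join(map(str, prefix[1][1])))  # 0110000110
```

Printing a hash as a digit string only works here because no count exceeds 9; the lists themselves have no such limit.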

Once you have the array of hashes, you can check any rectangle using the hashes at its corners, e.g. to check this rectangle:

. . . . .  
. 2 5 4 .  
. 0 2 5 .  
. . . . .

you take the hashes at these positions:

A . . B .  
. . . . .  
C . . D .  
. . . . .

and the hash for the rectangle is D - B - C + A:

1121131110 - 0100011010 - 0101000100 + 0100000000 = 1020120000

which indicates that the rectangle has one 0, two 2's, one 4 and two 5's, so it contains four distinct values (0, 2, 4 and 5), two of which (0 and 4) occur exactly once.
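
The corner arithmetic for this example can be sketched as follows, using the four hash strings from the table above. The helper names parse_hash and rectangle_counts are assumptions for illustration. Values with a non-zero count are the ones present in the rectangle; filtering on a count of exactly one gives the values that occur only once.

```python
def parse_hash(text):
    """Turn a digit string such as '0100011010' into a list of per-value counts."""
    return [int(ch) for ch in text]


def rectangle_counts(d, b, c, a):
    """Inclusion-exclusion on the four corner hashes: D - B - C + A."""
    return [dv - bv - cv + av for dv, bv, cv, av in zip(d, b, c, a)]


a = parse_hash("0100000000")  # above-left of the rectangle
b = parse_hash("0100011010")  # above its top-right corner
c = parse_hash("0101000100")  # left of its bottom-left corner
d = parse_hash("1121131110")  # its bottom-right corner

counts = rectangle_counts(d, b, c, a)
present = [v for v, n in enumerate(counts) if n > 0]
once_only = [v for v, n in enumerate(counts) if n == 1]

print(counts)     # [1, 0, 2, 0, 1, 2, 0, 0, 0, 0]
print(present)    # [0, 2, 4, 5]
print(once_only)  # [0, 4]
```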

Building the array of hashes means calculating m×n hashes (where m×n is the size of the array), each based on three other hashes, and checking every k×k square means calculating (m-k+1)×(n-k+1) hashes, each based on four hashes. Whether that means the time complexity is really O(m×n) depends on the range of values and the corresponding size of the hashes: if each hash holds r counters, every addition or subtraction of hashes costs O(r), which gives O(m×n×r) overall.
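
Putting both steps together for every k×k square could look roughly like this. It is a sketch under stated assumptions: values are integers in range(num_values), the grid is non-empty and rectangular, and the function name is hypothetical. Unlike the table above, the prefix array here carries one extra row and column of zeros so that corner lookups never fall outside it.

```python
def count_distinct_in_windows(grid, k, num_values=10):
    """Number of distinct values in every k-by-k window, via prefix count tables."""
    rows, cols = len(grid), len(grid[0])
    # prefix[r + 1][c + 1] holds the counts for the rectangle (0,0)..(r,c).
    prefix = [[[0] * num_values for _ in range(cols + 1)] for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            cell = prefix[r + 1][c + 1]
            up, left, diag = prefix[r][c + 1], prefix[r + 1][c], prefix[r][c]
            for v in range(num_values):
                cell[v] = up[v] + left[v] - diag[v]
            cell[grid[r][c]] += 1

    result = []
    for top in range(rows - k + 1):
        row_counts = []
        for lft in range(cols - k + 1):
            d = prefix[top + k][lft + k]
            b = prefix[top][lft + k]
            c_ = prefix[top + k][lft]
            a = prefix[top][lft]
            # A value is present in the window if D - B - C + A is positive.
            row_counts.append(
                sum(1 for v in range(num_values) if d[v] - b[v] - c_[v] + a[v] > 0)
            )
        result.append(row_counts)
    return result
```

Every cell and every window costs O(num_values) work here, so this approach only pays off when the range of values is small compared to k*k.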
