Here is a solution with O(Np * log(Np)) time and O(Np) memory:
Initialize a dynamic container with a {row, col} tuple as the key \
    and a list of particles as the value
Iterate over each particle
    Find the {row, col} tuple for the current particle
    Find the value-list in the container by the {row, col} key
    If there is no value-list in the container for this key
        Then initialise a new particle list
    Append the current particle to the value-list
The container may be implemented as a balanced binary tree, which is where the log(Np) multiplier in the overall time complexity comes from.
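The steps above could be sketched in Python roughly as follows. This is only an illustration under assumptions: it uses a `defaultdict` (a hash table) rather than a balanced tree, so lookups take expected O(1) time instead of O(log(Np)). The `cell_of` callback and the (x, y) particle layout are hypothetical and not part of the original description.

```python
from collections import defaultdict


def group_by_cell(particles, cell_of):
    """Group particles by the (row, col) cell they fall into.

    `cell_of` is a hypothetical callback that returns the (row, col)
    tuple for a particle; how it is computed depends on your grid.
    """
    cells = defaultdict(list)  # a missing key starts as an empty list
    for p in particles:
        cells[cell_of(p)].append(p)
    return dict(cells)


# Assumed example: particles are (x, y) points and cells are 1.0 wide,
# so row = int(y) and col = int(x).
particles = [(0.2, 0.7), (0.9, 0.1), (3.5, 2.2), (3.1, 2.9)]
groups = group_by_cell(particles, lambda p: (int(p[1]), int(p[0])))
```

Here `groups` maps each occupied cell to the list of particles inside it; cells without particles never get an entry, which keeps memory at O(Np).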
Another solution, with O(Np + N) time and O(N) memory:
Initialize a simple lookup array byRow of size N, \
    it will contain a list of particles in each cell
Iterate over each particle
    Place the particle in the corresponding cell of lookup array byRow by its ROW
Initialize another lookup array byCol of size N, \
    it will contain a list of particles in each cell as well
Iterate over each cell of lookup array byRow
    Iterate over each particle of the list in byRow[cellRow]
        Place the particle in the corresponding cell of byCol by its COL
    Iterate over each particle of the list in byRow[cellRow]
        // Now you have a list of the other particles in the same NxN cell
        // by looking at byCol[particleCol]
        If byCol[particleCol] is not empty
            Print the byCol[particleCol] list or put it into other global storage and use later
            Clear the byCol[particleCol] list
The idea is very simple. First you group the particles by row, storing them in the lists of the byRow array. Then, for the particles in each list of byRow, you do the same grouping by column, reusing the same byCol array every time. So the lookup arrays add only O(N) memory overall. Even though one loop is nested inside another, the time complexity is still O(Np + N), because no inner step is executed more than Np times in total.
Edit: Time complexity is O(Np + N) to be precise.
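The two-pass grouping described above might look like this in Python. It is a minimal sketch, assuming each particle is a (row, col, name) tuple with 0 <= row, col < N; that layout and the function name `group_two_pass` are made up for illustration.

```python
def group_two_pass(particles, n):
    """Group particles on an n x n grid using two size-n lookup arrays.

    Assumption: each particle is a (row, col, name) tuple with
    0 <= row, col < n. This layout is illustrative only.
    """
    # Pass 1: bucket every particle by its row.
    by_row = [[] for _ in range(n)]
    for p in particles:
        by_row[p[0]].append(p)

    by_col = [[] for _ in range(n)]  # reused for every row
    groups = []
    for row_list in by_row:
        # Pass 2a: regroup this row's particles by column.
        for p in row_list:
            by_col[p[1]].append(p)
        # Pass 2b: emit each non-empty column bucket once, then clear it
        # so by_col is empty again before the next row is processed.
        for p in row_list:
            bucket = by_col[p[1]]
            if bucket:
                groups.append(list(bucket))
                bucket.clear()
    return groups


particles = [(0, 0, "a"), (2, 3, "b"), (0, 0, "c"), (2, 1, "d"), (2, 3, "e")]
groups = group_two_pass(particles, 4)
```

Each particle is appended to `by_col` once and visited once in the second inner loop, and the outer loop touches each of the N rows once, which matches the O(Np + N) bound.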