0

I have two geo-referenced matrices (gridded climate datasets) with different spatial extent and resolution (i.e., spatial coverage of one pixel on the surface of the Earth) that I would like to join. Let's call them referenceMatrix and targetMatrix. The matrices can be loaded into MATLAB as geotiffs or just matrices with corresponding grids of latitudes/longitudes per pixel.

What I want is to fill NaN-pixels in the targetMatrix with the corresponding value from the referenceMatrix. I can do it using a for-loop that looks through the pixels one by one, and fills the NaNs in the targetMatrix with data from the referenceMatrix based on the nearest pixel. The method I am using to localize the nearest pixel in space is described here:
How to find the nearest points to given coordinates with MATLAB?

However I need to do this with thousands of matrices, and the for-loop is too slow. Therefore I would like to use logical indexing, e.g.

targetMatrix(isnan(targetMatrix)) = referenceMatrix(isnan(targetMatrix))

with the added ability to match pixels in the matrices based on their latitude and longitude. Can any of you point me in a direction with an example of comparing matrices with different extent based on their georeference?

An example of input data and the wanted output below

targetMatrix = [1,  NaN, 3; 
                NaN, 5,  6]; 
referenceMatrix = [10, 20, 30, 40; 
                   50, 60, 70, 80]; 
referenceLatitude = [13.3, 13.3, 13.3, 13.3; 
                     14.1, 14.1, 14.1, 14.1]; 
referenceLongitude = [3.2, 4.2, 5.2, 6.2; 
                      3.2, 4.2, 5.2, 6.2];  
targetLatitude = [13.4, 13.4, 13.4; 
                  13.9, 13.9, 13.9]; 
targetLongitude = [3.1, 3.6, 4.1; 
                   3.1, 3.6, 4.1]; 

wantedOutput = [ 1, 10, 3; 
                50,  5, 6];

The wanted output consists of the original values from the targetMatrix where the NaNs are filled with the nearest (in space) value from the referenceMatrix, i.e., 10 and 50.

Sardar Usama
  • 19,536
  • 9
  • 36
  • 58
  • Thanks, I have added examples to the original post – user3275421 Jun 12 '18 at 07:16
  • All matrices are inputs except for the "wantedOutput" – user3275421 Jun 12 '18 at 07:29
  • The NaN in the `targetMatrix(1,2)` has the lat/lon coordinates (13.4, 3.6). The value 10 in the `referenceMatrix(1,1)` has the lat/lon coordinates (13.3, 3.2) and is thus the point located closest in space to `targetMatrix(1,2)`. The value 20 in the `referenceMatrix` has the lat/lon coordinates (13.3, 4.2) and the point is therefore further away in space from the `targetMatrix(1,2)`. – user3275421 Jun 12 '18 at 09:40

1 Answers1

1

Find which entries of targetMatrix are to be replaced using isnan. Convert the lat/lon from degrees to radians and then to cartesian coordinates using sph2cart to get actual geodesic distances. Then use knnsearch to find the indices of the points from the reference coordinates which are nearest to relevant targeted coordinates. Use these indices to extract the relevant entries from referenceMatrix and replace NaNs with them.

nanent = isnan(targetMatrix);
[tarX, tarY] = sph2cart(targetLongitude*pi/180, targetLatitude*pi/180, 1);
[refX, refY] = sph2cart(referenceLongitude*pi/180, referenceLatitude*pi/180, 1);
tmptar = [tarY(nanent) tarX(nanent)];
tmpref = [refY(:) refX(:)];
ind = knnsearch(tmpref, tmptar);
wantedOutput = targetMatrix;
wantedOutput(nanent) = referenceMatrix(ind);

In this case, converting lat/lon to cartesian coordinates prior to the use of knnsearch is tested to sped up knnsearch than otherwise.

knnsearch finds euclidean distance by default. You can also do this using a combination of pdist2 and min instead of knnsearch (in line 6). i.e.

[~, ind] = min(pdist2(tmptar, tmpref), [], 2);

or you can use desearchn in line 6. i.e.

ind = dsearchn(tmpref, tmptar);

But knnsearch is tested to be faster than dsearchn in this case.


knnsearch and pdist2 require ≥ R2010a with Stats and ML Toolbox. Use dsearchn if you do not have that.
All tests are done by the OP.

Sardar Usama
  • 19,536
  • 9
  • 36
  • 58
  • Thank you very much for the input. Some of the data is close to the poles and the 180 degree meridian, so I think using `distance()` may be a better option than euclidian distance, since it takes the spherical shape into account. Another thing is that the size of the matrices I have is 2000 x 2000. Therefore using the `pdist2` is not an option since it creates an array of 2000^4. `dsearchn` is a neat function, thank you introducing it, however it takes equally long time to index the combinations for one set of matrices as it does using a for-loop. – user3275421 Jun 12 '18 at 12:14
  • @user3275421 try with `knnsearch` as suggested above – Sardar Usama Jun 12 '18 at 12:40
  • Thanks again for your input. The good news is that using the knnsearch works fast. The bad news is that it still computes distance on a flat surface (default is euclidian). This distance measure is different from the geodesic distance (i.e., the distance when travelling on the surface of a ball or ellipsoid). The `distance` function can do this but again I struggle to implement this function in a logical indexing scheme. – user3275421 Jun 28 '18 at 12:05
  • In fact, what could be done would be to use `distance` as a custom distance function in `knnsearch` according to the Matlab documentation. I am trying to figure the syntax out, but input on this would be highly appreciated. – user3275421 Jun 28 '18 at 13:10
  • @user3275421 In that case, this may be as simple as `ind = knnsearch(tmpref, tmptar,'Distance',@distance);` (I didn't test it) – Sardar Usama Jun 28 '18 at 13:49
  • Yes, thanks, I tried it out and it works, however the processing time is >60 minutes per matrix, and I have 47000 so it won't work. I will rethink the workflow, and probably try to regrid the matrices so they match geographically and in terms of spatial resolution – user3275421 Jul 02 '18 at 11:05
  • @user3275421 What is the size of your matrices? How much available memory do you have? It is quite possible that you're running out of memory – Sardar Usama Jul 02 '18 at 11:07
  • Thanks, memory is not the issue (128 gb ram, only using approx 8 gb), and 28-core Xeon workstation. The matrices are 1200 x 1200 (double). It seems the integration of `distance` in `knnsearch` is slow or that it should be integrated in another way. – user3275421 Jul 02 '18 at 11:34
  • @user3275421 I don't think there is any issue with the integration – Sardar Usama Jul 02 '18 at 11:40
  • I have solved the problem by converting the Lat/Lon to cartesian coordinates with `sph2cart` (convert degrees to radians prior to the cartesian conversion) and then use your idea with the `knnsearch`. This is a very fast solution in terms of processing speed, and then I get the actual geodesic distance. If you add the step with cartesian coordinates, I can acknowledge your answer. Thanks – user3275421 Jul 03 '18 at 12:11
  • @user3275421 You're welcome. You can edit my answer to include your insight by clicking edit button under my answer, I'll approve your edit. – Sardar Usama Jul 03 '18 at 12:26