
I need to extract the same element from every 2D slice of a 3D numpy array (actually a masked array, but it works the same way). I usually do this by iterating, but the current data is huge and the process has to be repeated on thousands of datasets, so it would take weeks (rough estimate). What is the quickest way to slice a 3D array without looping through all the 2D arrays?

In this simple example I need the [1, 0] element of each 2D array, which is 3 in every case, and I want to store these values in a result array.

NetCDF example (slicing element [500, 400])

import netCDF4

url = "http://eip.ceh.ac.uk/thredds/dodsC/public-chess/PET/aggregation/PETAggregation.ncml"
dataset = netCDF4.Dataset(url)

result = dataset.variables['pet'][:, 500, 400]

myarray example (now superseded by the NetCDF example above)

import numpy as np

myarray = np.array([
    [[1, 2], [3, 4], [5, 6]],
    [[1, 2], [3, 4], [5, 6]],
    [[1, 2], [3, 4], [5, 6]],
    [[1, 2], [3, 4], [5, 6]],
])

result = []
for i in myarray:
    result.append(i[1][0])

result: [3, 3, 3, 3]

EDIT: FirefoxMetzger suggested slicing it simply with `result = myarray[:, 1, 0]`. However, when I slice the NetCDF variable this way, I get the following error message:

RuntimeError: NetCDF: DAP server error

Curtis
  • `result = myarray[:, 1, 0]`? – FirefoxMetzger Aug 03 '20 at 18:27
  • Thanks @FirefoxMetzger! Your way is probably the quickest! But it's causing a server disconnection (not directly related to the question) - I will try to amend the question to reflect that – Curtis Aug 03 '20 at 18:37
  • I will need more context to interpret that. How did you load your data? – FirefoxMetzger Aug 03 '20 at 18:47
  • @FirefoxMetzger - I have updated my question - I will delete the simple myarray example once there is a solution – Curtis Aug 03 '20 at 19:11
  • If you have access to the server, you could check the logs to see why it is crashing. Depending on the size of your dataset it could be too much for the server to return at once. You could try a smaller slice `dataset.variables['pet'][:10, 500, 400]` and see if that still leads to your error. If not, the solution is to get the values in chunks and aggregate them locally and then process as usual, or to simply get the database locally to not deal with DB access in the first place. – FirefoxMetzger Aug 03 '20 at 19:27
  • I already managed to slice it in smaller chunks - obviously your answer `[:, 1, 0]` was key (the original question)! Thanks a lot! – Curtis Aug 03 '20 at 19:37

1 Answer

The minimal numpy example you provided can be efficiently sliced using standard slicing mechanisms:

import numpy as np

myarray = np.array([
    [[1, 2], [3, 4], [5, 6]],
    [[1, 2], [3, 4], [5, 6]],
    [[1, 2], [3, 4], [5, 6]],
    [[1, 2], [3, 4], [5, 6]],
])

result = myarray[:, 1, 0]
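
As a quick sanity check, the sketch below compares the slice with the loop from the question and repeats it on a masked array. The names `looped`, `sliced` and `masked` are only illustrative, and masking a single entry is an assumption made here to show that the mask is carried along.

```python
import numpy as np

# Same data as the example above: four identical 3x2 planes.
myarray = np.array([[[1, 2], [3, 4], [5, 6]]] * 4)

# Loop version from the question.
looped = [plane[1][0] for plane in myarray]

# Vectorised version: all planes, row 1, column 0.
sliced = myarray[:, 1, 0]

# The same indexing works on a masked array; the mask is sliced too.
masked = np.ma.masked_array(myarray)
masked[0, 1, 0] = np.ma.masked  # mask one entry for illustration
masked_slice = masked[:, 1, 0]
```

Here `sliced` holds the same values as `looped`, and `masked_slice` keeps its first entry masked.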

The NetCDF error seems to come from the resulting slice being too large to be returned from the server, causing a crash. As per your comment, the solution here is to query the server in chunks and aggregate the results locally.
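
A minimal sketch of that chunked approach might look like the following. The helper `read_point_in_chunks` and its `chunk_size` parameter are hypothetical names, not part of netCDF4, and a local numpy array stands in for `dataset.variables['pet']` so the sketch runs without a server. A suitable chunk size depends on the server and would have to be found by trial.

```python
import numpy as np

def read_point_in_chunks(variable, row, col, chunk_size=1000):
    """Read variable[:, row, col] in pieces of at most chunk_size steps.

    `variable` is anything with a `.shape` and numpy-style indexing,
    for example a netCDF4 variable or a plain numpy array.
    """
    n_steps = variable.shape[0]
    pieces = []
    for start in range(0, n_steps, chunk_size):
        stop = min(start + chunk_size, n_steps)
        # Each request covers only a small window of the first axis.
        pieces.append(variable[start:stop, row, col])
    # np.ma.concatenate keeps the masks if the pieces are masked arrays.
    return np.ma.concatenate(pieces)

# Local stand-in (assumption) for dataset.variables['pet'].
stand_in = np.arange(10 * 3 * 2).reshape(10, 3, 2)
chunked = read_point_in_chunks(stand_in, 1, 0, chunk_size=4)
```

With the real dataset the call would presumably be `read_point_in_chunks(dataset.variables['pet'], 500, 400)`, with `chunk_size` lowered until the server stops returning errors.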

FirefoxMetzger