2

I have a scene object, I would like to load all channels into a numpy array of shape (24,24,3). Where 3 is the number of channels.

scene_xybox = scn.crop(xy_bbox=box)

I have to select each channel:

channel= scene_xybox['VIS006'].values

repeat, and stack at the end. Is there a way to get the stacked numpy array with one line.

This takes 5 seconds for each box, I have many files and it will take a very very long time to do the same operation to multiple boxes in an image to multiple images.

Progman
  • 16,827
  • 6
  • 33
  • 48

1 Answers1

1

A perfect answer may require more information from you regarding what your end goal is, how many "boxes" you are cutting out, etc. But I'll see what I can clear up first. I assume you are not resampling the data with Scene.resample in your code at all.

Satpy uses dask so if possible it would be best to compute everything at once. Or at least limit how many times things are computed (.values computes the dask array). If you have a lot of boxes to cut out and your system has the available memory, you may want to calculate the slices yourself for all the xy bboxes (I think there are methods to help with this), load the entire image (see xr.concat below), and then use basic slicing techniques to get each of the box cutouts. This should save you from loading the data from disk each time you call .values, but also will really help with processing the other files you have since the slices should be the same across all times (except for special instrument cases).

You say you want the final shape to be (rows, cols, N). Is there a good reason you can't have (N, rows, cols)? The latter should be faster as the arrays are in their original contiguous form. If whatever processing you are doing after this could be done with dask at all this would "flow" really well with the tasks that would be made too.

You can use xr.concat, passing all the DataArrays at once and then call .values to get the full numpy array underneath. This should compute all the bands at the same time. Something like:

final_arr = xr.concat([scn['VIS006'], scn['band2'], scn['band3']], "bands").values
djhoese
  • 3,567
  • 1
  • 27
  • 45
  • Thanks, I have many seviri nat files(30k), and I need to extract around 50 20x20 areas from each file. I currently followed your advice and now I'm computing the whole array at the beginning and then slicing it as a numpy array to extract these areas. I'm currently doing it file by file and storing the results in a dictionary. Was wondering if there's a faster approach that may make use of stacked dask arrays or something of that sort. – Faisal Al Nasser Apr 06 '22 at 05:34
  • Are the keys of the dictionary the filenames/times or the bbox? You could make a 4D array of (bbox, bands, rows, columns) or even a 5D (file_number, bbox, bands, rows, columns). I'm not saying doing that would be faster, but it could be done with a lot of xr.concat calls and you wouldn't need to do `.values` until the end. Depending on what you're doing with these results, there are other options. – djhoese Apr 06 '22 at 16:22