2

I have approximately 75 2D raster maps (tifs) of elevation over the exact same area, each acquired at a different time. I would like to stack these using xarray. I can read in each raster (see below) but currently, there is no time coords as I need to extract the time from the title of each file (2017-02-15T06:13:38Z in file below).

da = xr.open_rasterio('tifs/DTSLOS_20170122_20190828_D79H_2017-02-15T06:13:38Z.tif')
da
<xarray.DataArray (y: 12284, x: 17633)>
[216603772 values with dtype=float64]
Coordinates:
    band     int64 1
  * y        (y) float64 59.62 59.62 59.62 59.62 59.62 ... 49.8 49.8 49.8 49.8
  * x        (x) float64 -12.17 -12.17 -12.17 -12.17 ... 1.931 1.932 1.932 1.933
Attributes:
    transform:   (0.0008, 0.0, -12.172852, 0.0, -0.0008, 59.623425)
    crs:         GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,2...
    res:         (0.0008, 0.0008)
    is_tiled:    1
    nodatavals:  (-9999.0,)

I'm assuming the way I should approach this is to add the time to each data array and then stack/concatenate them but I am new to xarray and am struggling to figure out how to do this.

2 Answers2

4

You would need to convert the respective string into a datetime timestring using datetime.strptime and set it as the dimension time along which you want to combine the datasets. You also need to expand this dimension, so when using xr.combine_by_coords you can combine the dataarrays along that dimension. One way to do this would be

import xarray as xr
from datetime import datetime
import pandas as pd

#collecting datasets when looping over your files
list_da = []

for path in ...:
    #path = "tifs/DTSLOS_20170122_20190828_D79H_2017-02-15T06:13:38Z.tif"
    da = xr.open_rasterio(path)

    time = path.split("_")[-1].split("Z")[0]
    dt = datetime.strptime(time,"%Y-%m-%dT%H:%M:%S")
    dt = pd.to_datetime(dt)

    da = da.assign_coords(time = dt)
    da = da.expand_dims(dim="time")

    list_da.append(da)

#stack dataarrays in list
ds = xr.combine_by_coords(list_da)

That's the way I approached this for my data. Not sure whether its the most elegant solution, but it worked for me

jstew
  • 53
  • 4
1

A more efficient way in terms of CPU and memory usage, based on @jstew code (thanks) and this answer, is:

def add_time_dim(xda):
    img_name = xda.encoding["source"]
    img_datetime = img_name.split("_")[-1].split("Z")[0]
    dt = datetime.strptime(img_datetime, "%Y-%m-%dT%H:%M:%S")
    dt = pd.to_datetime(dt)

    xda = xda.expand_dims(time = [dt])
    return xda

images = sorted(glob.glob("tifs/*.tif"))
datacube = xarray.open_mfdataset(images, preprocess=add_time_dim, parallel=True)

Hope that helps!