4

I'm using xarray to read single point data from an openDAP server, then I convert the xarray object to dataframe. This works fine. I would like to read multiple points in a single call, but I don't which is the best approach to do so.

This is the code I'm using for a single point:

import pandas as pd
import xarray as xr

url = 'http://nomads.ncep.noaa.gov:9090/dods/gfs_0p25/gfs20161111/gfs_0p25_00z'
lats =  [40.1,40.5,42.3]
lons =  [1.02,1.24,1.84]
vars = ['dswrfsfc', 'tmp2m', 'pressfc']

ds = xr.open_dataset(url)

data_single  = ds.sel(lon=lons[0], lat=lats[0], method='nearest')    
ts_dataframe_single = data_single[vars].to_dataframe()

For reading multiple points I do:

data  = ds.sel(lon=lons, lat=lats, method='nearest')
ts_dataframe = data[vars].to_dataframe()

And this is the output of data.coords:

data.coords
Out[10]: 
Coordinates:
  * time     (time) datetime64[ns] 2016-11-11 2016-11-11T03:00:00 ...
  * lev      (lev) float64 1e+03 975.0 950.0 925.0 900.0 850.0 800.0 750.0 ...
  * lat      (lat) float64 40.0 40.5 42.25
  * lon      (lon) float64 1.0 1.25 1.75

When I convert to a dataframe the resulting object contains a mix of time and coordinates in the timestamp. This is how it looks:

dataframe containg multiple points

My questions are:

  • Is this the best way to retrieve multiple point with xarray?
  • How do I extract the data from a single point of the resulting dataframe?

Thanks in advance!

3 Answers3

5

I think you want sel_points instead of sel. So, something like this:

data  = ds.sel_points(lon=lons, lat=lats, method='nearest')
ts_dataframe = data[vars].to_dataframe()
jhamman
  • 5,867
  • 19
  • 39
  • Thank @jhamman, this method should do the job. The thing is that when I run the command `data = ds.sel_points(lon=lons, lat=lats, method='nearest')` computer starts thinking and RAM memory usage s rises until I loose control of the computer. Is that normal? Am I doing something wrong? – Jordi Vidal de LLobatera Nov 14 '16 at 11:35
  • sel_points is deprecated. Memory issues have to do with the large (npoints^2) grid of points it is searching across. – Justina Pinch Jul 07 '20 at 21:39
4

The memory issue you are seeing is that the output is a lat x lon grid of pixel values, when you are really only interested in the grid's diagonal entries, corresponding to each lat, lon coordinate pair. This is very computationally expensive because it is searching for pixel values of n^2 points instead of n points.

'sel_points()' is deprecated because it can be accomplished using sel/isel (see below).

Instead, you can to set lat and lon as their own DataArrays:

lats = xr.DataArray(lats, dims='z') #'z' is an arbitrary name placeholder
lons = xr.DataArray(lons, dims='z')
data = ds.sel(lat = lats, lon = lons, method = 'nearest')
#VariableName = 'lev', so you can also do: 
data = ds.lev.sel(lat = lats, lon = lons, method = 'nearest')

See this related post for more.

Justina Pinch
  • 367
  • 1
  • 7
0

Another method would be slicing

data = ds.sel(lat=slice(40.1,42.3), lon=slice(1.02,1.84))

However, you get more points than you asked for. It's fast, though.

TomTom101
  • 6,581
  • 2
  • 20
  • 31