I am a Pandas user migrating to Xarray because I work with geospatial 3D data. There are some things I only know how to do with Pandas, and it often makes no sense to convert to a Pandas DataFrame and then convert back to an Xarray Dataset.

What I am trying to do is replace the current dimension of an Xarray object with two new ones, both of which are currently data variables in that object.

We start with data in an Xarray object like this:

<xarray.Dataset>
Dimensions:  (index: 9)
Coordinates:
  * index    (index) int64 0 1 2 3 4 5 6 7 8
Data variables:
    Letter   (index) object 'A' 'A' 'A' 'B' 'B' 'B' 'C' 'C' 'C'
    Number   (index) int64 1 2 3 1 2 3 1 2 3
    Value1   (index) float64 0.5453 1.184 -1.177 0.8232 ... -1.253 0.3274 -1.583
    Value2   (index) float64 -0.4184 -0.3325 0.6826 ... -0.264 0.07381 0.4357

I want to reshape and reindex the variables Value1 and Value2 so that Letter and Number become their dimensions. The way I usually do this is:

reindexed = data.to_dataframe().set_index(['Letter','Number']).to_xarray()

That returns:

<xarray.Dataset>
Dimensions:  (Letter: 3, Number: 3)
Coordinates:
  * Letter   (Letter) object 'A' 'B' 'C'
  * Number   (Number) int64 1 2 3
Data variables:
    Value1   (Letter, Number) float64 0.5453 1.184 -1.177 ... 0.3274 -1.583
    Value2   (Letter, Number) float64 -0.4184 -0.3325 0.6826 ... 0.07381 0.4357

This works well if the data is not too big, but it seems wasteful because converting to a DataFrame loads everything into memory. I would like a faster, lighter way to do the same thing using Xarray only.

To help reproduce the problem, the code below creates data similar to what I have after reading the NetCDF file.

import numpy as np
import pandas as pd


df = pd.DataFrame()
df['Letter'] = 'A A A B B B C C C'.split()
df['Number'] = [1,2,3,1,2,3,1,2,3]
df['Value1'] = np.random.randn(9)
df['Value2'] = np.random.randn(9)
data = df.to_xarray()
iury simoes-sousa
  • Personally, I use geopandas when working with spatial data. The interchange is simple – Rob Raymond Jul 25 '20 at 03:08
  • Please consider revising your question, as the language used is confusing. If I understand you correctly what you want to do is replace the current dimension of an xarray object with two new ones, and those two new ones are currently data variables in the xarray object. Is that correct? – Robert Wilson Jul 25 '20 at 08:23
  • @RobertWilson, Yes, that's what I am trying to do. I edited the question. Is it better? – iury simoes-sousa Jul 25 '20 at 13:50
  • Does this answer your question? [How do I subdivide/refine a dimension in an xarray DataSet?](https://stackoverflow.com/questions/59504320/how-do-i-subdivide-refine-a-dimension-in-an-xarray-dataset) – OriolAbril Jul 26 '20 at 17:18

1 Answer

You should be able to do this using the code below. You cannot remove dimensions in xarray, so you will have to replace the values of "index" with the values of Letter or Number first, and then rename the index dimension.

import numpy as np
import pandas as pd

df = pd.DataFrame()
df['Letter'] = 'A A A B B B C C C'.split()
df['Number'] = [1,2,3,1,2,3,1,2,3]
df['Value1'] = np.random.randn(9)
df['Value2'] = np.random.randn(9)
data = df.to_xarray()

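(
    data
    # Relabel the existing "index" dimension with the Letter values
    .assign_coords({"index": data.Letter.values})
    .assign_coords({"Number": data.Number.values})
    # drop_vars replaces the deprecated Dataset.drop
    .drop_vars("Letter")
    # Rename the dimension, then the coordinate that labels it
    .rename_dims({"index": "Letter"})
    .rename({"index": "Letter"})
)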
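
For comparison, one xarray-only route often used for this kind of reshape is to build a MultiIndex on the existing dimension with `set_index` and then expand it with `unstack`. The sketch below is not part of the original answer. It assumes every (Letter, Number) pair occurs exactly once, as in the sample data (missing pairs would come out as NaN), and whether it stays lazy on large dask-backed NetCDF data has not been checked here.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame()
df['Letter'] = 'A A A B B B C C C'.split()
df['Number'] = [1, 2, 3, 1, 2, 3, 1, 2, 3]
df['Value1'] = np.random.randn(9)
df['Value2'] = np.random.randn(9)
data = df.to_xarray()

reindexed = (
    data
    # Combine Letter and Number into a MultiIndex on the "index" dimension
    .set_index(index=['Letter', 'Number'])
    # Expand the MultiIndex into one dimension per level
    .unstack('index')
)
```

On the sample data, `reindexed` should have Value1 and Value2 on the dimensions (Letter, Number), matching the result of the pandas round trip shown in the question.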
Robert Wilson
  • Hi Robert, this piece of code only assigns Letter as a dimension of the variables. `Dimensions without coordinates: Letter Value1 (Letter) Value2 (Letter)` What I am trying to do is to assign both Letter and Number. – iury simoes-sousa Jul 25 '20 at 14:30
  • I forgot to rename the coords. I am not sure where your dimensions with coordinates are coming from – Robert Wilson Jul 25 '20 at 14:34