I am a Pandas user migrating to Xarray because I work with geospatial 3D data. Some things I only know how to do in Pandas, and it often makes no sense to convert to a Pandas DataFrame and then convert back to an Xarray Dataset.
What I am trying to do is replace the current dimension of an Xarray object with two new ones, both of which are currently data variables in that object.
The starting point is that data is an Xarray Dataset like this:
<xarray.Dataset>
Dimensions: (index: 9)
Coordinates:
* index (index) int64 0 1 2 3 4 5 6 7 8
Data variables:
Letter (index) object 'A' 'A' 'A' 'B' 'B' 'B' 'C' 'C' 'C'
Number (index) int64 1 2 3 1 2 3 1 2 3
Value1 (index) float64 0.5453 1.184 -1.177 0.8232 ... -1.253 0.3274 -1.583
Value2 (index) float64 -0.4184 -0.3325 0.6826 ... -0.264 0.07381 0.4357
I want to reshape and reindex the variables Value1 and Value2 so that Letter and Number become their dimensions.
The way I am used to doing this is:
reindexed = data.to_dataframe().set_index(['Letter','Number']).to_xarray()
That returns:
<xarray.Dataset>
Dimensions: (Letter: 3, Number: 3)
Coordinates:
* Letter (Letter) object 'A' 'B' 'C'
* Number (Number) int64 1 2 3
Data variables:
Value1 (Letter, Number) float64 0.5453 1.184 -1.177 ... 0.3274 -1.583
Value2 (Letter, Number) float64 -0.4184 -0.3325 0.6826 ... 0.07381 0.4357
This works well when the data is not too big, but it seems wasteful to me because converting to a DataFrame loads everything into memory. I would like to find a faster, lighter way to do the same thing using Xarray only.
To help reproduce the problem, the code below creates data similar to what I have after reading the NetCDF file.
import numpy as np
import pandas as pd
df = pd.DataFrame()
df['Letter'] = 'A A A B B B C C C'.split()
df['Number'] = [1,2,3,1,2,3,1,2,3]
df['Value1'] = np.random.randn(9)
df['Value2'] = np.random.randn(9)
data = df.to_xarray()
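For reference, here is a minimal sketch of one Xarray-only route that might give the same reshape. It assumes that Dataset.set_index can build a MultiIndex on the existing index dimension from the Letter and Number variables, and that Dataset.unstack then splits that dimension in two. The toy data is rebuilt so the snippet runs on its own. Whether this actually avoids loading everything into memory on a large NetCDF-backed Dataset is something I have not verified.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the same toy Dataset as in the reproduction code above.
df = pd.DataFrame({
    "Letter": "A A A B B B C C C".split(),
    "Number": [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "Value1": np.random.randn(9),
    "Value2": np.random.randn(9),
})
data = df.to_xarray()

# Assumption: set_index accepts data variables and turns them into the
# levels of a MultiIndex on the existing "index" dimension.
stacked = data.set_index(index=["Letter", "Number"])

# unstack splits the MultiIndex dimension into one dimension per level.
reindexed = stacked.unstack("index")
print(reindexed)
```

If that works as assumed, reindexed should have Letter and Number as dimensions, with Value1 and Value2 defined over both, matching the result of the Pandas round trip.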