
I am working on a wind analysis for a new development. I would like to be able to predict the air flow pattern in the development for each hour of the year as a function of that hour's wind speed and direction. Of course it would take too much time to run 8,760 wind CFD simulations. My approach is to run only 16 simulations (8 wind directions and 2 wind speeds) and interpolate the flow distribution from these results.

To give you an idea of what the data look like, I have created a simplified case.

import pandas as pd

X = pd.Series([1, 2, 3, 4, 5])
Y = pd.Series([1, 2, 3, 4, 5])
Z = pd.Series([1, 2, 3, 4, 5])
v1 = pd.Series([2, 6, 1, 7, 8])
df1 = pd.DataFrame({'X': X, 'Y': Y, 'Z': Z, 'v': v1})
df1['ws'] = 3
df1['wd'] = 180
v2 = pd.Series([3, 1, 4, 2, 2])
df2 = pd.DataFrame({'X': X, 'Y': Y, 'Z': Z, 'v': v2})
df2['ws'] = 3
df2['wd'] = 0
v3 = pd.Series([2.5, 2.3, 1.3, 7.2, 1.4])
df3 = pd.DataFrame({'X': X, 'Y': Y, 'Z': Z, 'v': v3})
df3['ws'] = 6
df3['wd'] = 180
v4 = pd.Series([2.4, 5.6, 6.1, 2.3])  # only four values: index alignment leaves v NaN at the fifth node
df4 = pd.DataFrame({'X': X, 'Y': Y, 'Z': Z, 'v': v4})
df4['ws'] = 6
df4['wd'] = 0
df = pd.concat([df1, df2, df3, df4])

Please notice that the last two columns contain the meteorological wind speed and direction for that particular simulation. The number of points (X, Y, Z) can be on the order of 100,000.

Now suppose that I want the flow distribution (X, Y, Z, v) for an intermediate value of wind speed (ws) and wind direction (wd). I would like to be able to aggregate the data and obtain a linear interpolation of the velocity field (v) at each point (X, Y, Z). To put it in a formula: v(X, Y, Z) = f(data, ws, wd)
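For the simplified four-simulation case above, one way to realize f(data, ws, wd) is bilinear interpolation over the (ws, wd) corners at every grid node. This is only a sketch: the function name `interp_velocity` is hypothetical, it assumes exactly two simulated speeds and two simulated directions bracketing the query, and it treats wd linearly between 0 and 180, ignoring wind-direction wraparound.

```python
import numpy as np
import pandas as pd


def interp_velocity(df, ws_q, wd_q):
    """Bilinearly interpolate v at each grid node for a query (ws_q, wd_q).

    Hypothetical helper: assumes df holds simulations at exactly two wind
    speeds and two wind directions that bracket the query point.
    """
    ws0, ws1 = np.sort(df['ws'].unique())[[0, -1]]
    wd0, wd1 = np.sort(df['wd'].unique())[[0, -1]]
    # Pivot to one row per grid node, one column per (ws, wd) corner.
    wide = df.pivot_table(index=['X', 'Y', 'Z'], columns=['ws', 'wd'], values='v')
    t = (ws_q - ws0) / (ws1 - ws0)  # weight along wind speed
    u = (wd_q - wd0) / (wd1 - wd0)  # weight along wind direction
    v = ((1 - t) * (1 - u) * wide[(ws0, wd0)]
         + (1 - t) * u * wide[(ws0, wd1)]
         + t * (1 - u) * wide[(ws1, wd0)]
         + t * u * wide[(ws1, wd1)])
    return v.reset_index(name='v')
```

At a simulated corner (e.g. ws=3, wd=0) the weights collapse and the original field is returned unchanged; for a query halfway in both ws and wd, each node's v is the mean of its four corner values.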

I guess I need to use the groupby function, but I couldn't figure out a way to do it with two variables.

Also, do you think a Panel would be a more adequate data structure for this kind of data?

Rojj

1 Answer


If you want to look at distributional features conditional on two variables, you can proceed as follows:

In[10]: df.groupby(['ws', 'wd']).apply(lambda x: x.mean())
Out[10]: 
        X  Y  Z     v  ws   wd
ws wd                         
3  0    3  3  3  2.40   3    0
   180  3  3  3  4.80   3  180
6  0    3  3  3  4.10   6    0
   180  3  3  3  2.94   6  180

With regard to panel data, it is generally rather a question of taste, right? Do you consider X, Y, Z dimensions that you want to generalize over, or not? I typically wouldn't, so then you're only left with time, and that gives you a time series rather than a panel.

Moreover, pandas' Panel structure used to lack many features that exist for standard DataFrames. I believe there has been some catching up lately, but I don't know much, as I don't really use it. Surely someone else can chip in here.

FooBar
  • Hi, thanks for your reply. The method you describe averages everything, even the coordinates. The coordinates (nodes of the computational grid) should remain as they are; it's only the velocity value v that should be averaged across the whole grid. I have prepared this set of sample data in Excel [link](https://docs.google.com/spreadsheets/d/1La1yIDHMNmqp_alC4bSI5MdTCFczq-IbkeTcWYm70H8/edit?usp=sharing). – Rojj Jul 13 '14 at 16:28
  • Well this is supposed to be a starting point for your own coding only. You can define whatever lambda function you like. If you want to return mean values of a specific column only, try `lambda x: x['v'].mean()`. – FooBar Jul 13 '14 at 16:33
  • I ended up using a different approach. I first append the dataframe with the unknown velocities to the dataframe with all the simulation data. I then cut this new dataframe by binning the wind directions. At that point I sort the table by wd, X, Y, Z. This leaves nice gaps for each grid node that I can simply fill by interpolation using the dataframe interpolate function. It works with a small sample data test; I will try it in these days with the full set of data. – Rojj Jul 14 '14 at 03:49
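A minimal sketch of that append/sort/interpolate trick, shown for a single wind-direction bin and a new wind speed (the function name `interp_along_ws` is hypothetical, and it relies on the query speed sitting midway between the two simulated speeds, since `DataFrame.interpolate()` is linear in row position by default):

```python
import numpy as np
import pandas as pd


def interp_along_ws(df, wd_fix, ws_q):
    """Sketch of the append/sort/interpolate approach for a fixed wd.

    Appends NaN-velocity rows for the query wind speed, sorts so each
    grid node's rows sit together ordered by ws, then fills the gaps
    with DataFrame.interpolate().
    """
    sub = df[df['wd'] == wd_fix]
    nodes = sub[['X', 'Y', 'Z']].drop_duplicates()
    # Rows for the query wind speed, with the velocity still unknown.
    query = nodes.assign(v=np.nan, ws=ws_q, wd=wd_fix)
    full = pd.concat([sub, query], ignore_index=True)
    # Per node, rows now read ws_low (known), ws_q (NaN), ws_high (known).
    full = full.sort_values(['X', 'Y', 'Z', 'ws']).reset_index(drop=True)
    full['v'] = full['v'].interpolate()  # linear fill across the gaps
    return full[full['ws'] == ws_q].reset_index(drop=True)
```

For a query speed that is not midway between the simulated speeds, interpolating on the actual ws values (e.g. setting ws as the index and using `method='index'`) would be needed instead of the positional default.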