
The dataset comes from Kaggle.

This code

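import pandas as pd

melbourne_file_path = './melb_data.csv'
melbourne_data = pd.read_csv(melbourne_file_path)
# Drop rows that contain any missing value
filtered_melbourne_data = melbourne_data.dropna(axis=0)
# One box per region, showing the distribution of Price
ax = filtered_melbourne_data.boxplot(column='Price', by='Regionname')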

gives this boxplot:

[Image: boxplot of Price grouped by Regionname]

The boxplot already contains a lot of information, such as the median. Is there a way to get those values for each group given in `by`?

I tried this code, adapted from this post:

ax, bp = filtered_melbourne_data.boxplot(column = 'Price', by = 'Regionname', return_type='both');

and got this error

ValueError: not enough values to unpack (expected 2, got 1)

I also tried this code adapted from that post.

ax = filtered_melbourne_data.boxplot(column = 'Price', by = 'Regionname', return_type='both');
print(ax.median)

and got

<bound method Series.median of Price    (AxesSubplot(0.1,0.15;0.8x0.75), {'whiskers': ...
dtype: object>

How can I get the median value for each Regionname?

  • It's not very straightforward to retrieve those values from the boxplot object. I would advise you extract them separately with `filtered_melbourne_data.groupby('Regionname').Price.agg('median')` – Nakor Oct 07 '19 at 05:03
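The groupby approach from the comment above can be sketched as follows. The small DataFrame is a hypothetical stand-in for `melb_data.csv` (its region names and prices are made up), used only so the snippet runs on its own:

```python
import pandas as pd

# Hypothetical stand-in for melb_data.csv; these values are invented for illustration.
df = pd.DataFrame({
    "Regionname": ["North", "North", "North", "South", "South", "West"],
    "Price": [500000.0, 700000.0, 900000.0, 400000.0, 600000.0, 800000.0],
})

# One median per region, returned as a Series indexed by region name.
region_medians = df.groupby("Regionname")["Price"].median()
print(region_medians)
```

This never touches the plot, so the values come back already labelled by region.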

1 Answer

It is possible, but the solution from that post needs some changes.

First, add ['Price'] to get the value out of the one-element Series:

ax, bp = filtered_melbourne_data.boxplot(column = 'Price', 
                                         by = 'Regionname', 
                                         return_type='both')['Price']

Then get the first value of each array by indexing with [0]:

medians = [median.get_ydata()[0] for median in bp["medians"]]
print (medians)
[990000.0, 670000.0, 715000.0, 590000.0, 780000.0, 1230000.0, 700000.0, 400000.0]
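
The list above carries no region labels. A minimal sketch of pairing each median with its region follows. It assumes the boxes are drawn in sorted group order, which is how pandas groups by default. The DataFrame is a hypothetical stand-in for the real data, and the `Agg` backend is selected only so the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no window is needed
import pandas as pd

# Hypothetical stand-in for filtered_melbourne_data; values are invented.
df = pd.DataFrame({
    "Regionname": ["North", "North", "North", "South", "South", "West"],
    "Price": [500000.0, 700000.0, 900000.0, 400000.0, 600000.0, 800000.0],
})

# With `by` set, boxplot returns a Series indexed by column name;
# each element unpacks into (axes, dict of matplotlib line artists).
ax, bp = df.boxplot(column="Price", by="Regionname", return_type="both")["Price"]

# Assumption: boxes follow the sorted order of the group keys.
labels = sorted(df["Regionname"].unique())
# Each median line is horizontal, so either y value is the median.
medians = [line.get_ydata()[0] for line in bp["medians"]]

medians_by_region = dict(zip(labels, medians))
print(medians_by_region)
```

Comparing the result against `df.groupby("Regionname")["Price"].median()` is a cheap way to confirm the ordering assumption on the real data.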
jezrael