3

I work with weighted probability distributions quite a bit and would like to use violin plots to visualize them. However, I cannot find a way to create these from weighted data in any of the usual suspects (matplotlib, seaborn, bokeh, etc.).

Does anyone know of an implementation or a possible workaround that allows the use of weighted data? Recreating an artificial unweighted distribution from the weighted data is not feasible because my datasets are large. R has a wvioplot package, but I would really like to stick with Python.

sllrp

1 Answer

-1

Answer posted for reference:

import numpy as np
import weighted  # provided by the `wquantiles` package on PyPI
from matplotlib.cbook import violin_stats
import statsmodels.api as sm

def vdensity_with_weights(weights):
    ''' Outer function gives the inner function access to weights. Matplotlib
    needs a function that takes in data and coords, so a closure seems to be
    the only way to 'pass' a set of weights to a custom density function '''

    def vdensity(data, coords):
        ''' Custom matplotlib weighted violin stats function '''
        # Using weights from the closure, get a KDE from statsmodels
        weighted_cost = sm.nonparametric.KDEUnivariate(data)
        weighted_cost.fit(fft=False, weights=weights)

        # Return y-values for graph of KDE by evaluating on coords
        return weighted_cost.evaluate(coords)
    return vdensity

def custom_violin_stats(data, weights):
    # Get weighted median and mean (using weighted module for median)
    median = weighted.quantile_1D(data, weights, 0.5)
    mean, sumw = np.ma.average(data, weights=list(weights), returned=True)

    # Use matplotlib violin_stats, which expects a function that takes in data and coords
    # which we get from closure above
    results = violin_stats(data, vdensity_with_weights(weights))

    # Update result dictionary with our updated info
    results[0][u"mean"] = mean
    results[0][u"median"] = median

    # No need to do this, since it should be populated from violin_stats
    # results[0][u"min"] =  np.min(data)
    # results[0][u"max"] =  np.max(data)

    return results

### Example
#vpstats1 = custom_violin_stats(np.asarray(df_column_data), np.asarray(df_column_weights))
#vplot = ax.violin(vpstats1, [pos_idx], vert=False, showmeans=True, showextrema=True, showmedians=True)
#current_color_palette = ...
#for pc in vplot['bodies']:
#    pc.set_facecolor(current_color_palette[pos_idx])
#    pc.set_edgecolor('black')
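If installing `statsmodels` and `wquantiles` is not an option, the same statistics dictionary can be built by hand. The sketch below is not the code above but a NumPy-only alternative under stated assumptions: a Gaussian kernel, a Silverman-style bandwidth computed from the weighted standard deviation and the Kish effective sample size, and a midpoint-interpolation convention for the weighted median. The names `weighted_quantile` and `weighted_violin_stats` are made up for this illustration. It loops over the grid points instead of building a full points-by-samples matrix, so memory stays proportional to the dataset size.

```python
import numpy as np

def weighted_quantile(data, weights, q):
    """Weighted quantile by interpolating the cumulative weight distribution.
    (Assumed convention: each sample sits at the midpoint of its weight.)"""
    order = np.argsort(data)
    x = np.asarray(data, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    cum = (np.cumsum(w) - 0.5 * w) / w.sum()
    return float(np.interp(q, cum, x))

def weighted_violin_stats(data, weights, points=100):
    """Return a list with one stats dict in the layout Axes.violin expects."""
    x = np.asarray(data, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalize weights to sum to 1

    mean = float(np.sum(w * x))
    std = np.sqrt(np.sum(w * (x - mean) ** 2))
    neff = 1.0 / np.sum(w ** 2)          # Kish effective sample size
    bw = 1.06 * std * neff ** (-0.2)     # assumed rule-of-thumb bandwidth (needs std > 0)

    # Evaluate the weighted Gaussian KDE one grid point at a time.
    coords = np.linspace(x.min(), x.max(), points)
    vals = np.array([np.sum(w * np.exp(-0.5 * ((c - x) / bw) ** 2)) for c in coords])
    vals /= bw * np.sqrt(2.0 * np.pi)

    return [{
        "coords": coords,
        "vals": vals,
        "mean": mean,
        "median": weighted_quantile(x, w, 0.5),
        "min": float(x.min()),
        "max": float(x.max()),
    }]
```

The result can be passed to matplotlib in the same way as in the commented example above, e.g. `ax.violin(weighted_violin_stats(data, weights), positions=[1], showmeans=True, showmedians=True)`.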

This answer is from: here

Ani Menon
  • This code does not run. `weighted` is not a module from the standard distribution. NumPy seems to be used but is not imported. – Ramon Crehuet May 10 '18 at 10:35
  • 1
    The weighted package is actually, and confusingly, the [wquantiles](https://pypi.org/project/wquantiles/) package. For unknown reasons, it is imported with `import weighted`. – JonathanU May 24 '18 at 15:29