I have the following code, which almost does what I need. I am graphing the importance of each feature for two different models on the same graph for comparison, but I can't get them to show side by side as two separate bars. I am fairly new to Python and brand new to this forum. Here is the code:

import numpy as np
import matplotlib.pyplot as plt

def plot_importances1(model1, feature_names1, label1, model2=None, feature_names2=None, label2=None):
    if model2 is None:
        importances1 = model1.feature_importances_
        indices1 = np.argsort(importances1)

        plt.figure(figsize=(8, 8))  # Set figure size

        # plot the first list of feature importances as a horizontal bar chart
        plt.barh(range(len(indices1)), importances1[indices1], color="violet", align="center", label=label1)

        # set the y-axis tick labels to be the feature names
        plt.yticks(range(len(indices1)), [feature_names1[i] for i in indices1])
    else:
        importances1 = model1.feature_importances_
        indices1 = np.argsort(importances1)

        importances2 = model2.feature_importances_
        indices2 = np.argsort(importances2)

        plt.figure(figsize=(8, 8))  # Set figure size

        # plot the first list of feature importances as a horizontal bar chart
        plt.barh(range(len(indices1)), importances1[indices1], color="violet", align="center", label=label1)

        # plot the second list of feature importances as a horizontal bar chart
        plt.barh(range(len(indices2)), importances2[indices2], color="orange", align="center", label=label2)
        # set the y-axis tick labels to be the feature names
        plt.yticks(range(len(indices1)), [feature_names1[i] for i in indices1])

        # add a title and x- and y-axis labels
        plt.title("Feature Importances")
        plt.xlabel("Relative Importance")
        plt.ylabel("Feature")

        # add a legend to the plot
        plt.legend()

        # set the tick locations and labels for the first bar graph
        plt.gca().tick_params(axis='x', which='both', length=0)
        plt.gca().xaxis.set_ticks_position('top')
        plt.gca().xaxis.set_label_position('top')

        # set the tick locations and labels for the second bar graph
        plt.twinx()
        plt.gca().tick_params(axis='x', which='both', length=0)
        plt.gca().xaxis.set_ticks_position('bottom')
        plt.gca().xaxis.set_label_position('bottom')

        plt.show()

Then I call the function:

plot_importances1(
    dTree_treat_out,
    list(X1_train),
    "Outliers present",
    dTree,
    list(X_train),
    "No outliers",
)

The two bars are both showing, but I can't get them to separate completely, and I am getting this error: [image: Output for the code]

I have run several versions of this, including one that does not return the matplotlib error. The problem with the other function definitions I have is that the bars are stacked, so I can't see both of them. Maybe I could make one less opaque, if I knew how? I am super stuck. I do not want them stacked: I need the first bar to stand on its own with the second one NEXT to it, not overlaid or stacked on top. It should look similar to the image I uploaded, but the bars need to be completely separated.

Any input to fix this issue will be greatly appreciated.

Happyg
  • By far the easiest way would be to use seaborn's `sns.barplot(..., orient='h', dodge=True)`. See e.g. [barplot from lists](https://stackoverflow.com/questions/53561766/seaborn-barplot-from-lists-instead-of-dataframes). Please note that questions without reproducible test data are harder to answer. – JohanC Jan 05 '23 at 20:09
  • I tried adding `dodge=True` to the `barh` plot I have that sort of works, and I am getting the error `AttributeError: 'Rectangle' object has no property 'dodge'`. I guess I need to rewrite it using `sns.barplot` to make it work, but the evolution of the code led me to the `barh` solution. Is there a way to use what I already have and somehow modify it to separate the bars? Or is this as good as it gets? – Happyg Jan 11 '23 at 01:20
  • You gave me an idea to add `height` instead of `dodge`. I can use this: `plt.barh(np.arange(len(indices1)) + 0.4, importances2[indices2], height=0.4, color="orange", align="center", label=label2)` to get what I want! Thank you for the kick start! It works! – Happyg Jan 11 '23 at 01:34
  • Well, `dodge` only works in seaborn. Usually, the seaborn functions work in a similar way as the underlying matplotlib functions, but with much less boilerplate. With `plt.barh` you need to manually calculate appropriate heights and y-offsets, and to make your code different depending on the number of features shown. An advantage of using `plt.barh` would be to have more detailed control (e.g. hatching some bars). – JohanC Jan 11 '23 at 06:34
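
Following up on the comments above, here is a minimal sketch of the manual-offset approach with `barh`: each model gets a thinner bar, shifted up or down around the shared tick so the bars sit next to each other instead of overlapping. The names `bar_offsets` and `plot_importances_side_by_side` are made up for this illustration, and the sketch assumes both models were trained on the same feature columns, so a single list of feature names applies to both. It sorts by the first model's importances only, so the two bars at each tick describe the same feature.

```python
def bar_offsets(n_groups, total_height=0.8):
    """Return (bar_height, offsets) so n_groups bars share one tick without overlapping.

    Hypothetical helper for this sketch, not part of matplotlib.
    """
    bar_height = total_height / n_groups
    # Centre the whole group of bars on the tick position.
    offsets = [(g - (n_groups - 1) / 2) * bar_height for g in range(n_groups)]
    return bar_height, offsets


def plot_importances_side_by_side(feature_names, importances_by_label, ax=None):
    """Draw one horizontal bar per model for every feature.

    importances_by_label maps a legend label to a sequence of importances,
    each in the same order as feature_names (an assumption of this sketch).
    """
    # Imported here so bar_offsets can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    if ax is None:
        _, ax = plt.subplots(figsize=(8, 8))

    labels = list(importances_by_label)
    # Order features by the first model's importances (ascending), so the
    # largest bar ends up at the top of the horizontal chart.
    first = importances_by_label[labels[0]]
    order = sorted(range(len(feature_names)), key=lambda i: first[i])

    bar_height, offsets = bar_offsets(len(labels))
    for label, offset in zip(labels, offsets):
        values = [importances_by_label[label][i] for i in order]
        positions = [y + offset for y in range(len(order))]
        ax.barh(positions, values, height=bar_height, align="center", label=label)

    # One tick per feature, centred between that feature's bars.
    ax.set_yticks(list(range(len(order))))
    ax.set_yticklabels([feature_names[i] for i in order])
    ax.set_title("Feature Importances")
    ax.set_xlabel("Relative Importance")
    ax.set_ylabel("Feature")
    ax.legend()
    return ax
```

With the models from the question, and still assuming both were fit on the same columns, this could be called as `plot_importances_side_by_side(list(X_train), {"Outliers present": list(dTree_treat_out.feature_importances_), "No outliers": list(dTree.feature_importances_)})` followed by `plt.show()`. With two models the helper gives a bar height of 0.4 and offsets of -0.2 and +0.2, which is the same idea as the `height=0.4` fix in the comment above, just centred on the tick.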

0 Answers