
I want to plot the counts of each feature in my dataframe against the target variable as part of a descriptive analysis of the data. What is the easiest and fastest way to do this? Also, is there a way to handle numeric and categorical data at the same time? I am currently experimenting with the Titanic dataset from Kaggle.

I tried the following code, but it did not show the counts:

sns.pairplot(data=df, diag_kws={'element': 'step', 'histtype': 'step'}, kind='hist', x_vars=['Pclass', 'Sex','Age','SibSp','Parch','Fare','Embarked'], y_vars = ['Survived'])

I also tried this, but the Age feature was a mess and I couldn't get the labels right:

# Define the target variable
target_variable = df['Survived']

# Get a list of all other variables
variable_list = ['Pclass', 'Sex','Age','SibSp','Parch','Fare','Embarked']
for i in variable_list:
    m = sns.catplot( x = i, data = df[variable_list], kind = "count", legend = True )
    # Adding Labels to the bars
    ax = m.facet_axis(0,0)
    for p in ax.patches:
        ax.text(p.get_x() - 0.01, 
            p.get_height() * 1.02, 
           '{0:.1f}K'.format((p.get_height()/1000)),   #Used to format it K representation
            color='black', 
            rotation='horizontal', 
            size='large')
Trenton McKinney
AbdullahQ

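One possible approach, offered as a minimal sketch rather than a definitive answer: build a count table per feature with pd.crosstab, binning numeric columns that have many distinct values with pd.cut first, then draw each table as a grouped bar chart. The helper names count_table and plot_counts, the max_levels threshold, and the toy rows are all hypothetical and only illustrate the idea; the column names follow the Titanic data in the question. The bar labels assume matplotlib 3.4 or later, which provides Axes.bar_label.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def count_table(df, column, target, bins=5, max_levels=10):
    """Count rows per value of `column`, split by the classes of `target`.

    Numeric columns with more than `max_levels` distinct values are cut
    into `bins` equal-width intervals first (an assumed heuristic).
    """
    values = df[column]
    if pd.api.types.is_numeric_dtype(values) and values.nunique() > max_levels:
        values = pd.cut(values, bins=bins)
    # Rows where the feature is missing are left out of the counts.
    return pd.crosstab(values, df[target])


def plot_counts(df, columns, target, bins=5, max_levels=10):
    """Draw one grouped bar chart of counts per feature, side by side."""
    fig, axes = plt.subplots(
        1, len(columns), figsize=(4 * len(columns), 3.5), squeeze=False
    )
    for ax, column in zip(axes[0], columns):
        table = count_table(df, column, target, bins=bins, max_levels=max_levels)
        table.plot(kind="bar", ax=ax, rot=45)
        for container in ax.containers:
            ax.bar_label(container)  # write the raw count above each bar
        ax.set_ylabel("count")
    fig.tight_layout()
    return fig


# Hypothetical toy rows standing in for the Titanic dataframe.
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, None, 54.0],
})
fig = plot_counts(toy, ["Sex", "Age"], "Survived", bins=3, max_levels=2)
```

With the real data, something like plot_counts(df, ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked'], 'Survived') should treat Age and Fare as binned numeric columns and the rest as categories, since only those two have more than ten distinct values. Labelling the bars with raw counts also avoids the '{0:.1f}K' formatting from the question, which rounds counts below 50 down to 0.0K.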