
My pandas DataFrame has three columns: user_fair, user_good, and rating.

I am using sns.swarmplot to plot "user_fair vs rating" and "user_good vs rating".

"user_fair vs rating" works fine, but when I try to plot "user_good vs rating" the code runs forever and never displays a plot. I am using Python 3 and Jupyter Notebook.

This is the code I am using:

import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots(figsize=(15, 15))
ax = sns.swarmplot(x="rating", y="user_good", data=data)
ax.set_xlabel("Rating", size=20, alpha=0.8)
ax.set_ylabel("Goodness of User who got Rated", size=20, alpha=0.8)
ax.set_title("Distribution of Rating vs. Goodness Score of Ratee", size=20)
  • Your DataFrame needs to be shown in order to help you; it's impossible to help without it. – Ben Pap May 02 '19 at 22:57
  • Here is the drive link to data.csv - https://drive.google.com/file/d/1hwEezT206pdFksUwalyYnMqaR82wX2Lp/view?usp=sharing – Ajinkya Rawankar May 03 '19 at 01:31

2 Answers


The issue isn't with your code, but with how swarm plots are created. A swarm plot draws every single point, and "points are adjusted (only along the categorical axis) so that they don’t overlap". When you have a lot of data and many of the points overlap, that adjustment struggles, and in your data the majority of the rating/user_good values overlap.
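To get a rough feel for how much overlap a column has before plotting it, something like the following could be used. This is only a sketch: `overlap_summary` and the `demo` frame are made-up names for illustration, and counting exact repeats is just a proxy, since points that are merely close together also have to be moved apart.

```python
import pandas as pd


def overlap_summary(df: pd.DataFrame, cat: str, value: str) -> pd.DataFrame:
    """Per category: row count, distinct values, and share of repeated values."""
    grouped = df.groupby(cat)[value]
    summary = pd.DataFrame({
        "rows": grouped.size(),
        "distinct_values": grouped.nunique(),
    })
    # Share of rows whose value already occurs in the same category.
    summary["repeat_share"] = 1 - summary["distinct_values"] / summary["rows"]
    return summary


# Tiny made-up frame standing in for the real data (hypothetical values).
demo = pd.DataFrame({
    "rating": [1, 1, 1, 1, 2, 2],
    "user_good": [0.5, 0.5, 0.5, 0.7, 0.1, 0.2],
})
print(overlap_summary(demo, "rating", "user_good"))
```

A large `rows` count combined with a high `repeat_share` for a rating suggests the swarm layout will have a lot of work to do for that category.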

I highly recommend using a violin plot instead. It conveys the same distributional information you're trying to get from the swarm plot, and it runs considerably faster.

fig, ax = plt.subplots(figsize=(15, 15))
# cut=0 stops each violin at the observed min and max of the data
ax = sns.violinplot(x="rating", y="user_good", data=data, cut=0)
ax.set_xlabel("Rating", size=20, alpha=0.8)
ax.set_ylabel("Goodness of User who got Rated", size=20, alpha=0.8)
ax.set_title("Distribution of Rating vs. Goodness Score of Ratee", size=20)
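If you still want to see individual points, one possible workaround is to draw the swarm plot on a random subset per rating rather than on the full frame. The sketch below is an assumption on my part, not something tested against the linked data: `sample_per_category`, the cap of 200 rows, and the `demo` frame are all hypothetical.

```python
import pandas as pd


def sample_per_category(df: pd.DataFrame, cat: str, n: int, seed: int = 0) -> pd.DataFrame:
    """Return at most n randomly chosen rows from each category of `cat`."""
    parts = [
        group.sample(n=min(len(group), n), random_state=seed)
        for _, group in df.groupby(cat)
    ]
    return pd.concat(parts)


# Made-up data: 1000 rows for rating 1, 5 rows for rating 2.
demo = pd.DataFrame({
    "rating": [1] * 1000 + [2] * 5,
    "user_good": [i / 1000 for i in range(1000)] + [0.1, 0.2, 0.3, 0.4, 0.5],
})
subset = sample_per_category(demo, "rating", n=200)
print(subset["rating"].value_counts())
```

The subset can then be passed to the plot in place of the full frame, e.g. `sns.swarmplot(x="rating", y="user_good", data=subset)`. Keep in mind that a sample hides rare values, so it complements the violin plot rather than replacing it.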


Ben Pap

A violin plot is the right choice most of the time, but when the data has many outliers you could also use a scatter plot.

fig, ax = plt.subplots(figsize=(15, 15))
ax = sns.scatterplot(x="rating", y="user_good", data=data)
ax.set_xlabel("Rating", size=20, alpha=0.8)
ax.set_ylabel("Goodness of User who got Rated", size=20, alpha=0.8)
ax.set_title("Distribution of Rating vs. Goodness Score of Ratee", size=20)
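To judge whether the data has "many outliers", a quick count per rating may help. The sketch below assumes the common 1.5 × IQR rule, which is my choice for illustration and not something this answer depends on; `outlier_counts` and the `demo` frame are hypothetical names.

```python
import pandas as pd


def outlier_counts(df: pd.DataFrame, cat: str, value: str, k: float = 1.5) -> pd.Series:
    """Count, per category, the values more than k * IQR outside the quartiles."""
    def count(values: pd.Series) -> int:
        q1, q3 = values.quantile(0.25), values.quantile(0.75)
        iqr = q3 - q1
        outside = (values < q1 - k * iqr) | (values > q3 + k * iqr)
        return int(outside.sum())

    return df.groupby(cat)[value].apply(count)


# Made-up data: rating 1 contains one extreme value, rating 2 contains none.
demo = pd.DataFrame({
    "rating": [1] * 9 + [2] * 4,
    "user_good": [1, 2, 3, 4, 5, 6, 7, 8, 100, 1, 2, 3, 4],
})
counts = outlier_counts(demo, "rating", "user_good")
print(counts)
```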


Ritik Dua