
I am a beginner in Python (using Python 3.7 in Spyder 3.3.2 and Anaconda Navigator 1.9.6). I have no problem creating seaborn violin plots, but the moment I try to arrange them in a FacetGrid I run into issues. I tried using catplot.

Here is my violin plot code (it works):

# Libraries
import seaborn as sns
import pandas as pd
import os # Imports `os`
from matplotlib import pyplot as plt

os.chdir(r"XXXXXX") # Changes directory 
os.listdir('.') # Lists all files and directories in current directory


## Data set
File = 'test_eventcountratios.xlsx' # Assigns Excel filename to File
df = pd.read_excel(File)

ax = sns.violinplot(x = df["Timepoint"], y = df["Macrophage Frequency"], palette = "Blues")  
ax.set_xticklabels(ax.get_xticklabels(),rotation=30)

My data is long form, so all timepoints are in the first column and "Macrophage Frequency" data are in the second column. All remaining columns represent other cell types. Here is a screenshot of my data spreadsheet

Here is my catplot code (it doesn't work):

g=sns.catplot(data=df, x="Timepoint", y=df["B cell Frequency","Neutrophil Frequency","NK cell Frequency","Macrophage Frequency"],
              palette = "Blues",
              kind = "violin", split=True)

I get "KeyError: ('B cell Frequency', 'Neutrophil Frequency', 'NK cell Frequency', 'Macrophage Frequency')"

I don't even want to call each column individually. I would like the code to run through each column (cell type), gather its data, and put each column's data into its own plot.

I stripped the catplot code to basics to see if that worked:

g=sns.catplot(x = df["Timepoint"], y = df["Macrophage Frequency"], palette = "Blues", data=df, kind="violin")

It works and produces a violin plot, but with this error: "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."

So...

I want to make a grid of multiple violin plots (Timepoint on X axis, Cell type frequency on Y axis), where each plot takes data from each column. Why am I only successful when I limit my "y" to a single column from my dataframe?

I've Googled all of my errors, but I can't seem to make the right changes to my code. If I change one thing, then I get a new error (like "TypeError: object of type 'NoneType' has no len()", "ValueError: num must be 1 <= num <= 0, not 1", etc.).

arm
  • I want to make a grid of multiple violin plots, using catplot. – arm Feb 12 '19 at 19:16
  • Ok, but seaborn works with long form dataframes; you have a wide-form dataframe (even though you call it differently). – ImportanceOfBeingErnest Feb 12 '19 at 19:28
  • How so? Based on this (https://www.theanalysisfactor.com/wide-and-long-data/), my data looks more like the long form spreadsheet example that they show. Please let me know if I am missing something. – arm Feb 12 '19 at 20:16
  • Based on your comment, I looked up other websites to figure out my misunderstanding. This website helped me understand your point (thank you!): https://medium.com/@andykashyap/how-to-convert-a-table-into-long-form-or-tidy-form-for-seaborn-visualizations-2bd8b44cdc29 – arm Feb 12 '19 at 20:29
  • That site isn't too bad. It says under *Long form*: *"In the long format, each row is one time point per subject."*. That is obviously not the case here. Your dataframe in long form would have three columns: timepoint, frequency type, value – ImportanceOfBeingErnest Feb 12 '19 at 20:29
  • Thanks I will try to convert the table to long form now. – arm Feb 12 '19 at 20:31
  • Maybe worth noting that sns.violinplot itself will work fine with wide form data, but if you want to create a catplot, you will need the different columns to denote the different levels (hue, rows, ..?) – ImportanceOfBeingErnest Feb 12 '19 at 20:38
  • Thank you so much! I converted a subset of the data (manually) into long form and it worked with this code: g=sns.catplot(x="Timepoint", y="Frequency", col="Cell Type", data=df, col_wrap = 2, palette = "Blues", kind = "violin", split=True) – arm Feb 12 '19 at 20:48
  • I think it's worth pointing out that the original error is caused by improperly selecting multiple columns in the dataframe. You need to use double square brackets like `df[["B cell Frequency","Neutrophil Frequency",...]]` instead of single brackets. This probably won't solve the rest of your issues, but that's why you got that KeyError. – m13op22 Aug 21 '19 at 14:21
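
The two points raised in the comments above (selecting several columns with double brackets, and reshaping the table from wide to long form) can be sketched with pandas alone. This is only an illustration under stated assumptions: the data is randomly generated, and the timepoint labels ("Day 0", "Day 3", "Day 7") and the new column names "Cell Type" and "Frequency" are made up to mirror the names used in the comments, not taken from the asker's spreadsheet.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the spreadsheet (assumption: the real file has a
# "Timepoint" column plus one column per cell type, i.e. wide form).
rng = np.random.default_rng(0)
timepoints = ["Day 0", "Day 3", "Day 7"]  # hypothetical labels
cell_cols = [
    "B cell Frequency",
    "Neutrophil Frequency",
    "NK cell Frequency",
    "Macrophage Frequency",
]

wide = pd.DataFrame({"Timepoint": np.repeat(timepoints, 5)})
for col in cell_cols:
    wide[col] = rng.uniform(0, 1, size=len(wide))

# Single brackets with several names ask for ONE column keyed by a tuple,
# which does not exist, hence the KeyError from the question.
try:
    wide["B cell Frequency", "Neutrophil Frequency"]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# Double brackets pass a list of names and return a DataFrame.
subset = wide[["B cell Frequency", "Neutrophil Frequency"]]

# Wide -> long: one row per (timepoint, cell type, value).
long_df = wide.melt(
    id_vars="Timepoint",
    value_vars=cell_cols,
    var_name="Cell Type",
    value_name="Frequency",
)
print(long_df.head())
```

With the table in this shape, the "Cell Type" column can be handed to catplot as the faceting variable instead of converting the spreadsheet by hand.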

1 Answer


Use this:

g = sns.catplot(x = "Timepoint", y = "Macrophage Frequency", palette = "Blues", data=df, kind="violin")

x and y are simply the names of columns in df. Pass them as strings rather than as Series (df["..."]) when you also pass data=df.
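
The same idea (column names as strings) extends to the grid asked for in the question once the data is in long form. The following is a minimal sketch, not a drop-in solution: the data is randomly generated, and the column names "Cell Type" and "Frequency" as well as the timepoint labels are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic long-form data (assumption): one row per observation, with
# columns for the timepoint, the cell type, and the measured frequency.
rng = np.random.default_rng(0)
cell_types = ["B cell", "Neutrophil", "NK cell", "Macrophage"]
timepoints = ["Day 0", "Day 3", "Day 7"]  # hypothetical labels

rows = [
    {"Timepoint": tp, "Cell Type": ct, "Frequency": rng.uniform(0, 1)}
    for ct in cell_types
    for tp in timepoints
    for _ in range(10)
]
long_df = pd.DataFrame(rows)

# x, y and col are all column NAMES; catplot looks them up in `data`.
g = sns.catplot(
    data=long_df,
    x="Timepoint",
    y="Frequency",
    col="Cell Type",   # one panel per cell type
    col_wrap=2,        # wrap the panels into a 2-column grid
    kind="violin",
    sharey=False,      # let each cell type use its own y-axis range
)
```

In a script, calling `plt.show()` (from `matplotlib.pyplot`) afterwards displays the figure; `sharey=False` is a choice made here on the assumption that the cell types have very different frequency ranges.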

user2653663
Him Singhvi