2

I have:

import numpy as np
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt

# Generate random data
set1 = np.random.randint(0, 40, 24)
set2 = np.random.randint(0, 100, 24)

# Put into dataframe and plot
df = pd.DataFrame({'set1': set1, 'set2': set2})
data = pd.melt(df)
sb.swarmplot(data=data, x='variable', y='value')

[Image: the two random distributions plotted with seaborn's swarmplot function]

I want the individual points of both distributions to be connected with a colored line, such that the first data point of set 1 in the dataframe is connected with the first data point of set 2, and so on. I realize that this would probably be relatively simple without seaborn, but I want to keep the feature that the individual data points do not overlap. Is there any way to access the individual point coordinates in seaborn's swarmplot function?

Marcus Campbell
MichlF

2 Answers

5

EDIT: Thanks to @Mead, who pointed out a bug in my post prior to 2021-08-23 (I forgot to sort the locations in the prior version).

I gave the nice answer by Paul Brodersen a try, and despite him saying that

Madness lies this way

... I actually think it's pretty straightforward and yields nice results:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Generate random data
rng = np.random.default_rng(42)
set1 = rng.integers(0, 40, 5)
set2 = rng.integers(0, 100, 5)

# Put into dataframe
df = pd.DataFrame({"set1": set1, "set2": set2})
print(df)
data = pd.melt(df)

# Plot
fig, ax = plt.subplots()
sns.swarmplot(data=data, x="variable", y="value", ax=ax)

# Now connect the dots
# Find idx0 and idx1 by inspecting the elements returned from ax.get_children()
# ... or find a way to automate it
idx0 = 0
idx1 = 1
locs1 = ax.get_children()[idx0].get_offsets()
locs2 = ax.get_children()[idx1].get_offsets()

# before plotting, we need to sort so that the data points
# correspond to each other as they did in "set1" and "set2"
sort_idxs1 = np.argsort(set1)
sort_idxs2 = np.argsort(set2)

# revert "ascending sort" through sort_idxs2.argsort(),
# and then sort into order corresponding with set1
locs2_sorted = locs2[sort_idxs2.argsort()][sort_idxs1]

for i in range(locs1.shape[0]):
    x = [locs1[i, 0], locs2_sorted[i, 0]]
    y = [locs1[i, 1], locs2_sorted[i, 1]]
    ax.plot(x, y, color="black", alpha=0.1)

It prints:

   set1  set2
0     3    85
1    30     8
2    26    69
3    17    20
4    17     9

And you can see that the data is linked correspondingly in the plot.

[Image: the resulting swarm plot with corresponding points connected by lines]
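The sorting step above relies on the offsets being stored in ascending order of value, which may not hold for every seaborn version. One way to avoid depending on the storage order is to match each offset to its data row by y coordinate, since a swarm only moves points along the categorical axis. The following is a minimal sketch under that assumption; `offsets_in_row_order` is a hypothetical helper name, not a seaborn or matplotlib function, and it also assumes that floating-point error in the y coordinates is smaller than the smallest gap between distinct data values.

```python
import numpy as np


def offsets_in_row_order(values, offsets):
    """Return offsets reordered so that row i belongs to values[i].

    Hypothetical helper, not part of seaborn or matplotlib.
    """
    values = np.asarray(values, dtype=float)
    offsets = np.asarray(offsets, dtype=float)
    # Rank the data rows by value and the plotted points by y coordinate.
    by_value = np.argsort(values, kind="stable")
    by_y = np.argsort(offsets[:, 1], kind="stable")
    # The k-th smallest value is drawn at the k-th smallest y coordinate.
    out = np.empty_like(offsets)
    out[by_value] = offsets[by_y]
    return out


# Synthetic check with made-up x positions, stored in ascending y order
values = np.array([3, 30, 26, 17, 17])
stored = np.column_stack([[0.0, -0.1, 0.1, 0.0, 0.0], np.sort(values)])
print(offsets_in_row_order(values, stored))
```

With the variables from the code above, `p1 = offsets_in_row_order(set1, locs1)` and `p2 = offsets_in_row_order(set2, locs2)` would give coordinates that can be connected row by row. For tied values, either of the tied points may be picked, which does not change which values are linked.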

S.A.
  • Actually, I didn't really test that, but it seems very straightforward. Thanks for this. You're only missing `import seaborn as sns` and can delete one of the `import numpy as np`. – MichlF Jul 31 '20 at 11:09
  • Thanks, I fixed it! Yes, I have now also used this method for a 2x2 plot (so a lot more dots) and it was equally straightforward. The only "difficult" part is finding which of the ax's "children" are the objects to use. – S.A. Jul 31 '20 at 15:40
  • Isn't this solution just joining the top-most point of `set1` with the top-most point of `set2`? I don't think it's successfully joining points that occupy the same row in the original data frame (which are random, and therefore I'd expect the lines to be jumbled). – Mead Aug 18 '21 at 16:59
  • I think you are right @Mead ... I'll look into this and see if it can be easily solved with a sorting before plotting. – S.A. Aug 23 '21 at 07:54
  • It's fixed now. Thanks a ton for checking it and making me/us aware via a comment. – S.A. Aug 23 '21 at 12:14
1

Sure, it's possible (but you really don't want to).

seaborn.swarmplot returns the Axes instance (here: ax). You can call ax.get_children() to get all plot elements. You will see that for each set of points there is an element of type PathCollection. You can determine the x, y coordinates of the points by using the PathCollection.get_offsets() method.
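As a rough illustration of the steps just described, the children can be filtered by type instead of being looked up by index. This is a sketch only: the seed and the variable names `swarms` and `offsets` are illustrative assumptions, and the `Agg` backend is selected just so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.collections import PathCollection

# Same kind of random data as in the question (the seed is an arbitrary choice)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "set1": rng.integers(0, 40, 24),
    "set2": rng.integers(0, 100, 24),
})

fig, ax = plt.subplots()
sns.swarmplot(data=pd.melt(df), x="variable", y="value", ax=ax)

# Keep only the point collections; the other children are spines, axis objects, text, ...
swarms = [c for c in ax.get_children() if isinstance(c, PathCollection)]

# One (n_points, 2) array of x, y data coordinates per swarm
offsets = [np.asarray(c.get_offsets()) for c in swarms]
for xy in offsets:
    print(xy.shape)
```

Note that the x coordinates depend on the figure size and marker size at the time the swarm was laid out, so they should be read after the figure has reached its final dimensions.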

I do not suggest you do this! Madness lies this way.

I suggest you have a look at the source code (found here), and derive your own _PairedSwarmPlotter from _SwarmPlotter and change the draw_swarmplot method to your needs.

Paul Brodersen