I know this is going to end up being a really messy plot, but I am curious to know what the most efficient way to do this is. I have some data that looks like this in a csv file:

    ROI          Band   Min         Max         Mean        Stdev
1   red_2        Band 1 0.032262    0.124425    0.078073    0.028031
2   red_2        Band 2 0.021072    0.064156    0.037923    0.012178
3   red_2        Band 3 0.013404    0.066043    0.036316    0.014787
4   red_2        Band 4 0.005162    0.055781    0.015526    0.013255
5   red_3        Band 1 0.037488    0.10783     0.057892    0.018964
6   red_3        Band 2 0.02814     0.07237     0.04534     0.014507
7   red_3        Band 3 0.01496     0.112973    0.032751    0.026575
8   red_3        Band 4 0.006566    0.029133    0.018201    0.006897
9   red_4        Band 1 0.022841    0.148666    0.065844    0.0336
10  red_4        Band 2 0.018651    0.175298    0.046383    0.042339
11  red_4        Band 3 0.012256    0.045111    0.024035    0.009711
12  red_4        Band 4 0.001493    0.033822    0.014678    0.007788
13  red_5        Band 1 0.030513    0.18098     0.090056    0.044456
37  bcs_1        Band 1 0.013059    0.076753    0.037674    0.023172
38  bcs_1        Band 2 0.035227    0.08826     0.057672    0.015005
39  bcs_1        Band 3 0.005223    0.028459    0.010836    0.006003
40  bcs_1        Band 4 0.009804    0.031457    0.018094    0.007136
41  bcs_2        Band 1 0.018134    0.083854    0.040654    0.018333
42  bcs_2        Band 2 0.016123    0.088613    0.045742    0.020168
43  bcs_2        Band 3 0.008065    0.030557    0.014596    0.007435
44  bcs_2        Band 4 0.004789    0.016514    0.009815    0.003241
45  bcs_3        Band 1 0.021092    0.077993    0.037246    0.013696
46  bcs_3        Band 2 0.011918    0.068825    0.028775    0.013758
47  bcs_3        Band 3 0.003969    0.021714    0.011336    0.004964
48  bcs_3        Band 4 0.003053    0.015763    0.006283    0.002425
49  bcs_4        Band 1 0.024466    0.079989    0.049291    0.018032
50  bcs_4        Band 2 0.009274    0.093137    0.041979    0.019347
51  bcs_4        Band 3 0.006874    0.027214    0.014386    0.005386
52  bcs_4        Band 4 0.005679    0.026662    0.014529    0.006505

I want to create a single probability density plot with 8 lines: 4 for the 4 bands of the "red" ROIs and 4 for the 4 bands of the "black" (bcs) ROIs. So far I have this for just Band 1 in both the red and black ROIs, but my code outputs two separate plots. I have tried using subplot, but that has not worked for me.

Help? I know my approach is verbose and clunky, so smarter solutions much appreciated!

Load packages

import csv 
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

files = ['example.csv']

Organize the data

for f in files:
    fn = f.split('.')[0]
    dat = pd.read_csv(f)
    df0 = dat.loc[:, ['ROI', 'Band', 'Mean']]
    # parse by soil type
    red = df0[df0['ROI'].str.contains("red")]
    black = df0[df0['ROI'].str.contains("bcs")]
    # parse by band 
    red.b1 = red[red['Band'].str.contains("Band 1")]
    red.b2 = red[red['Band'].str.contains("Band 2")]
    red.b3 = red[red['Band'].str.contains("Band 3")]
    red.b4 = red[red['Band'].str.contains("Band 4")]
    black.b1 = black[black['Band'].str.contains("Band 1")]
    black.b2 = black[black['Band'].str.contains("Band 2")]
    black.b3 = black[black['Band'].str.contains("Band 3")]
    black.b4 = black[black['Band'].str.contains("Band 4")]

Plot the figure

pd.DataFrame(black.b1).plot(kind="density")
pd.DataFrame(red.b1).plot(kind="density")
plt.show()

I'd like for the figure to have 8 lines on it.

JAG2024

2 Answers

groupby + str.split

df.groupby([df.ROI.str.split('_').str[0], 'Band']).Mean.plot.kde();

If you want a legend

df.groupby([df.ROI.str.split('_').str[0], 'Band']).Mean.plot.kde()
plt.legend();
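
For anyone who wants to run this without the original file, here is a self-contained sketch of the same idea. It assumes a handful of `Mean` values copied from the question's table (Bands 1 and 2 only), draws every group on one explicitly created axes, and builds the legend labels by hand. The `soil` name and the `Agg` backend line are additions for illustration, not part of the answer above.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# A few rows copied from the question's table (ROI, Band, Mean only).
csv_text = """ROI,Band,Mean
red_2,Band 1,0.078073
red_3,Band 1,0.057892
red_4,Band 1,0.065844
red_2,Band 2,0.037923
red_3,Band 2,0.04534
red_4,Band 2,0.046383
bcs_1,Band 1,0.037674
bcs_2,Band 1,0.040654
bcs_3,Band 1,0.037246
bcs_1,Band 2,0.057672
bcs_2,Band 2,0.045742
bcs_3,Band 2,0.028775
"""
df = pd.read_csv(io.StringIO(csv_text))

# "red_2" -> "red", "bcs_1" -> "bcs"; no new column is added to df.
soil = df.ROI.str.split('_').str[0].rename('soil')

fig, ax = plt.subplots()
# One KDE line per (soil, band) pair, all on the same axes.
# Note: pandas needs SciPy installed for kde/density plots.
for (soil_name, band), means in df.groupby([soil, 'Band'])['Mean']:
    means.plot.kde(ax=ax, label=f"{soil_name} {band}")
ax.legend()
fig.savefig("density.png")  # or plt.show() in an interactive session
```

With the full file, the same loop should give the 8 lines asked for (2 soil types × 4 bands), since every call is handed the same `ax`.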

piRSquared
  • It seems nice, and no new column is necessary. – jezrael Apr 18 '17 at 06:31
  • This is very clear. Thanks @piRSquared! Do you know how to change the color palette for the legend. For example, what if I want to use one of these: http://matplotlib.org/1.2.1/examples/pylab_examples/show_colormaps.html – JAG2024 Apr 18 '17 at 19:04
  • @JAG2024 pass the desired name into the `colormap` parameter. `df.groupby([df.ROI.str.split('_').str[0], 'Band']).Mean.plot.kde(colormap='Accent')` – piRSquared Apr 18 '17 at 19:08
  • @JAG2024 don't forget to accept the answer if this does answer your question. And if it was useful, please up-vote if you haven't already. – piRSquared Apr 18 '17 at 19:09
  • @piRSquared one more thing... all of my lines are showing up as the same color. I'd like to use the colormap "RdGy" so that the "bcs" are more black and the "red" lines are more red. Is there an easy way to do this? I can edit the question above or post a new one to stackoverflow if that would help. – JAG2024 Apr 18 '17 at 19:39
  • @JAG2024 yeah, that isn't right. It seems to be resetting the colormap for each plot. I'll have to figure that out later. Or, it could be a great new SO question. – piRSquared Apr 18 '17 at 19:39
  • Here's the question: http://stackoverflow.com/questions/43480972/colormap-lines-showing-up-as-the-same-color @piRSquared – JAG2024 Apr 18 '17 at 19:49
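
One possible workaround for the same-color problem raised in the comments, sketched under assumptions: rather than passing `colormap` to the grouped plot call, loop over the groups and pass an explicit `color` to each line. The `Reds` and `Greys` colormaps, the shade range and the helper names (`cmaps`, `shade`, `colors`) are illustrative choices, not something taken from the linked question.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.colors import to_hex

# A few rows copied from the question's table (ROI, Band, Mean only).
csv_text = """ROI,Band,Mean
red_2,Band 1,0.078073
red_3,Band 1,0.057892
red_4,Band 1,0.065844
red_2,Band 2,0.037923
red_3,Band 2,0.04534
red_4,Band 2,0.046383
bcs_1,Band 1,0.037674
bcs_2,Band 1,0.040654
bcs_3,Band 1,0.037246
bcs_1,Band 2,0.057672
bcs_2,Band 2,0.045742
bcs_3,Band 2,0.028775
"""
df = pd.read_csv(io.StringIO(csv_text))
soil = df.ROI.str.split('_').str[0].rename('soil')

# Hypothetical mapping: one sequential colormap per soil type.
cmaps = {'red': plt.get_cmap('Reds'), 'bcs': plt.get_cmap('Greys')}

# Spread the bands over the darker half of each colormap (0.5 to 1.0).
bands = sorted(df.Band.unique())
shade = {b: 0.5 + 0.5 * i / max(len(bands) - 1, 1) for i, b in enumerate(bands)}

fig, ax = plt.subplots()
colors = {}
for (soil_name, band), means in df.groupby([soil, 'Band'])['Mean']:
    # A hex string is an unambiguous single color for pandas.
    color = to_hex(cmaps[soil_name](shade[band]))
    colors[(soil_name, band)] = color
    means.plot.kde(ax=ax, color=color, label=f"{soil_name} {band}")
ax.legend()
fig.savefig("density_colored.png")
```

Choosing the color per line avoids relying on how `colormap` is applied to each group's plot, which is the behavior the comments suspect of resetting.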

Something to help lead you in the right direction:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame()
for i in range(8):
    mean = 5-10*np.random.rand()
    std = 6*np.random.rand()
    df['score_{0}'.format(i)] = np.random.normal(mean, std, 60)

fig, ax = plt.subplots(1, 1)
for s in df.columns:
    # pass ax explicitly so every density line lands on the same axes
    df[s].plot(kind='density', ax=ax)
plt.show()

Basically just looping through the columns, and plotting as you go. Having more control over the figure is very helpful.

Robbie