
I'm having trouble with plotnine: I can't make a plot that shows 3 classes separated by color.

import pandas as pd
import numpy as np

from plotnine import *

path = '/home/punkproger/workspace/MyWorkPython/TestWork/galaxy_identificator/data/train.csv'

df = pd.read_csv(path)

my_plot = ggplot(data=df[:30000], mapping=aes(x='ra', fill='class', color='class')) + geom_density( alpha=0.7)
print(my_plot)

The 'class' value (0-2) changes every 10k rows, so the first 30k rows contain all three classes.

Result will be:

this graph

But if I reduce the number of rows to 10k (so only 1 class is present):

import pandas as pd
import numpy as np

from plotnine import *

path = '/home/punkproger/workspace/MyWorkPython/TestWork/galaxy_identificator/data/train.csv'

df = pd.read_csv(path)

my_plot = ggplot(data=df[:10000], mapping=aes(x='ra', fill='class', color='class')) + geom_density( alpha=0.7)
print(my_plot)

Result is:

this graph

Now this one has a title for the class and a color. I want to draw 3 graphs on one set of axes, like:

this one

I am a newbie at plotnine and don't see what is wrong. I have spent a lot of time searching and trying to solve this problem.

You can download the data here: https://drive.google.com/file/d/1IMK1YtXG8Zl1lY8JJ12RtzDpHn65vQKi/view

Z.Lin
  • Can't really replicate without some data but I think it has something to do with the type of your class column. In R class would have to be a factor (categorical), there is a category type in pandas, so maybe `df['class'] = df['class'].astype('category')` might help. – josemz Mar 14 '19 at 00:10
  • This doesn't help. – Vladislav Gusak Mar 14 '19 at 05:45
  • @josemz I have updated post, there is link to data in the end of text. – Vladislav Gusak Mar 14 '19 at 05:47
  • Are you sure? I just replicated the problem with simulated data and changing the column to category fixed it. – josemz Mar 14 '19 at 15:36
  • @josemz could you please share your results (with a screenshot) and code in an answer? I copy-pasted the line you posted but nothing changed. Maybe there is a problem with library versions or something else. – Vladislav Gusak Mar 14 '19 at 15:49

1 Answer

Sorry, I can't download your data, but here's a solution with simulated data.

import numpy as np
import pandas as pd
from plotnine import *

np.random.seed(0)

df = pd.DataFrame({'x': np.hstack((
                        np.random.normal(size=1000), 
                        np.random.normal(10, 2, size=1000), 
                        np.random.normal(-10, 2, size=1000))), 
                   'c': [0]*1000 + [1]*1000 + [2]*1000})

(ggplot(df, aes('x', color='c', fill='c')) + geom_density(alpha=0.7))

Yields this:

Taking the first 1,000 rows (corresponding to c == 0):

(ggplot(df[:1000], aes('x', color='c', fill='c')) + geom_density(alpha=0.7))
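In both plots `c` is stored as integers, and plotnine maps numeric columns to a continuous scale instead of splitting the data into groups. A quick way to confirm how the column is stored is to inspect its dtype. This is only a sketch on a smaller simulated frame (the sizes and the use of `default_rng` are my own choices, not part of the example above):

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

rng = np.random.default_rng(0)

# Smaller stand-in for the simulated frame: three blocks of 100 rows each.
df = pd.DataFrame({
    'x': np.concatenate([
        rng.normal(0, 1, size=100),
        rng.normal(10, 2, size=100),
        rng.normal(-10, 2, size=100),
    ]),
    'c': [0] * 100 + [1] * 100 + [2] * 100,
})

# Integer labels count as numeric, so they are read as a continuous variable.
c_is_numeric = is_numeric_dtype(df['c'])

# The categorical version is no longer numeric and carries explicit levels.
cat = df['c'].astype('category')
cat_is_numeric = is_numeric_dtype(cat)

print(df['c'].dtype, c_is_numeric)
print(cat.dtype, cat_is_numeric, list(cat.cat.categories))
```

If the first check reports a numeric dtype, the column needs converting before it is mapped to `color` or `fill`.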

Now creating a categorical variable, so that plotnine draws one density per category:

df['cat'] = df['c'].astype('category')
(ggplot(df, aes('x', color='cat', fill='cat')) + geom_density(alpha=0.7))
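Applied to a frame shaped like the one in the question, the same idea might look like the sketch below. The frame here is made up (random `ra` values, 50 rows per class instead of 10,000) and I have not run this against the real train.csv. The points to note are that `astype` returns a new Series, so the result has to be assigned back, and that the conversion should happen before the frame is sliced and handed to `ggplot`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 50  # rows per class in this toy frame (hypothetical; the question describes 10k per class)

# Hypothetical stand-in for train.csv with the two columns used in the question.
train = pd.DataFrame({
    'ra': rng.uniform(0, 360, size=3 * n),
    'class': np.repeat([0, 1, 2], n),
})

# Convert in place: astype() does not modify the column unless reassigned.
train['class'] = train['class'].astype('category')

# Slice after converting, as in the question's df[:30000].
subset = train[:3 * n]

print(subset['class'].dtype)
print(subset['class'].value_counts().sort_index().to_dict())

# With plotnine installed, the original call can then be reused unchanged:
# ggplot(subset, aes(x='ra', fill='class', color='class')) + geom_density(alpha=0.7)
```

If the printed dtype is still an integer type at this point, the conversion was not applied to the frame that is being plotted.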

josemz