
I have a pandas DataFrame. I am making a scatter plot and want to categorize the data using the colorbar. I did this for a monthly classification and a quarterly classification, as shown in the example code below.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

a = np.random.rand(366)
b = np.random.rand(366)*0.4
index = (pd.date_range(pd.to_datetime('01-01-2000'), periods=366))
df = pd.DataFrame({'a':a,'b':b},index = index)
plt.scatter(df['a'],df['b'],c = df.index.month)
plt.colorbar()


And also for the quarterly classification:

plt.scatter(df['a'],df['b'],c = df.index.quarter)
plt.colorbar()


My question: is there a way to categorize by half year, for example months 1-6 and 7-12, and also by a shifted split such as months 10-3 and 4-9? Thank you; any help or suggestions will be highly appreciated.

Serenity
bikuser

2 Answers


Write a custom function and pass its result to the c (color) argument of scatter. Here is an example for a half-yearly division. You can use it as a template for your own split function:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# if month is 1 to 6 then the first halfyear else the second halfyear 
def halfyear(m):
    return 0 if (m <= 6) else 1
# vectorize function to use with Series
hy = np.vectorize(halfyear)

a = np.random.rand(366)
b = np.random.rand(366)*0.4
index = (pd.date_range(pd.to_datetime('01-01-2000'), periods=366))
df = pd.DataFrame({'a':a,'b':b},index = index)

# apply custom function 'hy' for 'c' argument
plt.scatter(df['a'],df['b'], c = hy(df.index.month))
plt.colorbar()

plt.show()


Another way is to use a lambda function:

plt.scatter(df['a'], df['b'],
            c=df.index.map(lambda m: 0 if m.month <= 6 else 1))
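
For the other split asked about (months 10-3 versus 4-9), the same template should work with a different condition. The following is a minimal sketch, not a tested drop-in: the function name winter_summer is made up for illustration, and it uses np.where instead of np.vectorize so the whole month array is handled in one step.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def winter_summer(months):
    """Return 0 for October-March and 1 for April-September.

    'winter_summer' is a hypothetical helper name, not part of any library.
    """
    months = np.asarray(months)
    return np.where((months >= 4) & (months <= 9), 1, 0)

if __name__ == "__main__":
    # same random example data as in the question
    index = pd.date_range("2000-01-01", periods=366)
    df = pd.DataFrame({"a": np.random.rand(366),
                       "b": np.random.rand(366) * 0.4}, index=index)

    # color each point by its half-year group (0 or 1)
    plt.scatter(df["a"], df["b"], c=winter_summer(df.index.month))
    plt.colorbar()
    plt.show()
```

Changing the two bounds in the condition gives any other contiguous six-month block.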
Serenity

I would opt for a solution that does not discard the monthly information completely. Using colors that are similar but still distinguishable within each half-year lets you visually classify the points by half-year as well as by month.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors

a = np.random.rand(366)
b = np.random.rand(366)*0.4
index = (pd.date_range(pd.to_datetime('01-01-2000'), periods=366))
df = pd.DataFrame({'a':a,'b':b},index = index)

colors=["crimson", "orange", "darkblue", "skyblue"]
cdic = list(zip([0,.499,.5,1],colors))
cmap = matplotlib.colors.LinearSegmentedColormap.from_list("name", cdic,12 )
norm = matplotlib.colors.BoundaryNorm(np.arange(13)+.5,12)

plt.scatter(df['a'],df['b'],c = df.index.month, cmap=cmap, norm=norm)
plt.colorbar(ticks=np.arange(1,13))

plt.show()
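
The same colormap can probably be reused for the 10-3 / 4-9 split by coloring each point by its position in a year that starts in October instead of by the raw month number. This is only a sketch under that assumption: shifted_position and the start parameter are hypothetical names introduced here, and the tick labels rely on the standard calendar module.

```python
import calendar
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors

def shifted_position(months, start=10):
    """Position 1..12 of each month in a year beginning at month `start`.

    Hypothetical helper: with start=10, Oct -> 1, ..., Mar -> 6, Apr -> 7, ..., Sep -> 12.
    """
    return (np.asarray(months) - start) % 12 + 1

# same two-family colormap as above: 6 reddish and 6 bluish colors
colors = ["crimson", "orange", "darkblue", "skyblue"]
cdic = list(zip([0, .499, .5, 1], colors))
cmap = matplotlib.colors.LinearSegmentedColormap.from_list("halfyear", cdic, 12)
norm = matplotlib.colors.BoundaryNorm(np.arange(13) + .5, 12)

if __name__ == "__main__":
    index = pd.date_range("2000-01-01", periods=366)
    df = pd.DataFrame({"a": np.random.rand(366),
                       "b": np.random.rand(366) * 0.4}, index=index)

    start = 10  # first month of the shifted year (October)
    sc = plt.scatter(df["a"], df["b"], c=shifted_position(df.index.month, start),
                     cmap=cmap, norm=norm)
    cbar = plt.colorbar(sc, ticks=np.arange(1, 13))
    # label each colorbar position with the month it stands for
    cbar.set_ticklabels([calendar.month_abbr[(p + start - 2) % 12 + 1]
                         for p in range(1, 13)])
    plt.show()
```

Positions 1-6 then fall in the first color family (October to March) and positions 7-12 in the second (April to September).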


ImportanceOfBeingErnest