
I'm loading a spreadsheet from Excel into a dataframe. In this table I am only interested in two columns. The first column is the date and time in the format %Y-%m-%d %H-%M-%S. The second column is a categorical variable, namely the type of violation (for example, being late). There are only a few types of violations, about 6-7 in total. Running df.info() confirms that the date and time column has the datetime64[ns] dtype and the column with the types of violations has the category dtype. I would like to use the hexbin plot with marginal distributions from the seaborn library (https://seaborn.pydata.org/examples/hexbin_marginals.html). However, the simple code at that link does not work directly for a categorical variable and a datetime variable.

import seaborn as sns
sns.set_theme(style="ticks")

sns.jointplot(x=df['incident'], y=df['date-time'], kind="hex", color="#4CB391")

Python raises TypeError: The x variable is categorical, but one of ['numeric', 'datetime'] is required

I understand that the x-axis needs either a numeric or a datetime variable. Converting the column does not solve the problem.

This error can be reproduced using

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

from datetime import datetime


ndf = pd.DataFrame({'date-time': ['2021-11-15 00:10:00','2021-11-15 00:20:00'], 'incident': ['a','b']})
print(ndf)

sns.set_theme(style="ticks")

sns.jointplot(data=ndf, x='incident', y='date-time', color="#4CB391", hue=ndf['incident'] )

plt.show()

Question: how can I get a plot that looks like the seaborn example for this data?

Valentin
  • This kind of plot doesn't make much sense for purely categorical data. If there is some "natural" mapping from categories to numbers, you could try something like `df['incident']=df['incident'].map({'a': 1, 'b': 3, 'c':2,...})`. – JohanC Nov 20 '21 at 13:10

1 Answer


Following the example cited in the question as the desired graph style, put the date/time data on the x-axis: convert it to matplotlib's numeric date format so the hexbin can handle it, then format the x-axis ticks back into date/time strings. Because the date and time labels overlap, the text is rotated. Also, please consider the point @JohanC raised in the comments and judge for yourself whether this kind of plot suits categorical data.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns

sns.set_theme(style="ticks")

# Random y values, as in the seaborn example; x is one timestamp per minute
rs = np.random.RandomState(11)
x = rs.gamma(2, size=1000)
y = -.5 * x + rs.normal(size=1000)
date_rng = pd.date_range('2021-11-15', '2021-11-16', freq='1min')

# hexbin needs a numeric x, so convert the timestamps to matplotlib date numbers
g = sns.jointplot(x=mdates.date2num(date_rng[:1000]), y=y, kind="hex", color="#4CB391")

# Render the numeric ticks as date/time strings again
g.ax_joint.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d %H:%M:%S'))

# Rotate the labels so they do not overlap
for tick in g.ax_joint.get_xticklabels():
    tick.set_rotation(45)


r-beginners