Plotting classification results in different dates?

Question

I have a data frame (my_data) as follows:

       0      2017-01  2017-02  2017-03  2017-04
 0     S1        2        3        2        2
 1     S2        2        0        2        0
 2     S3        1        0        2        2
 3     S4        3        2        2        2
 4     …         …        …        …        …
 5     …         …        …        …        …
 6     S10       2        2        3        2

This data frame holds the results of a classification problem on different dates for each sample (S1, ..., S10). To simplify plotting, I encoded the confusion-matrix outcomes as numbers: 0 means 'TP', 1 means 'FP', 2 means 'TN' and 3 means 'FN'. Now I want to plot this data frame like the image below.

I should mention that I already asked this question, but nobody could help me. So I have tried to make the question easier to understand in the hope of getting help.

You may try to "import matplotlib" which is strong in visualizing data and charting — Aqueous Carlos, Feb 11 '19 at 08:49

Freya W · Accepted Answer · 2019-02-11T10:55:13.393

Unfortunately, I don't know of a way to give the points of a single plot call different markers, so you will have to loop over your data and plot each point separately.

You can use matplotlib to plot your data. I'm not sure what your data looks like, but for a file with these contents:

2017-01,2017-02,2017-03,2017-04
2,3,2,2
2,0,2,0
1,0,2,2
3,2,2,2
2,2,3,2
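If the data already lives in a data frame like the one in the question, a file in this layout could be produced roughly as follows. This is only a sketch: I am assuming the sample names are in a column that I have called "sample" (a made-up name, the question's frame labels it 0), and I only include the rows that are visible in the question.

```python
import pandas as pd

# Rows shown in the question; "sample" is a hypothetical column name.
my_data = pd.DataFrame({
    "sample": ["S1", "S2", "S3", "S4", "S10"],
    "2017-01": [2, 2, 1, 3, 2],
    "2017-02": [3, 0, 0, 2, 2],
    "2017-03": [2, 2, 2, 2, 3],
    "2017-04": [2, 0, 2, 2, 2],
})

# Keep only the date columns and write them without the row index,
# so the file has one header line followed by one line per sample.
my_data.drop(columns="sample").to_csv("dataframe.txt", index=False)
```

The sample names are dropped here because the plotting code numbers the rows itself.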

You can use the following code to get the plot you want:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D  # needed for the custom legend handles

fig, ax = plt.subplots()

df = pd.read_csv('dataframe.txt', parse_dates = True)
dates = list(df.columns.values) #get dates
number_of_dates = len(dates)
markers = ["o", "d", "^", "s"] #set marker shape
colors = ["g", "r", "m", "y"] #set marker color

# loop over the data in your dataframe
for i in range(df.shape[0]):
    # get a row of 1s, 2s, ... as you want your
    # data S1, S2, in one line on top of each other
    dataY = (i+1)*np.ones(number_of_dates)

    # get the data that will specify which marker to use
    data = df.loc[i]

    # plot dashed line first, setting it underneath markers with zorder
    plt.plot(dates, dataY, c="k", linewidth=1, dashes=[6, 2], zorder=1)

    # loop over each data point x is the date, y a constant number,
    # and data specifies which marker to use
    for _x, _y, _data in zip(dates, dataY, data):
        plt.scatter(_x, _y, marker=markers[_data], c=colors[_data], s=100, edgecolors="k", linewidths=0.5, zorder=2)

# label your ticks S1, S2, ...
ticklist = list(range(1,df.shape[0]+1))
l2 = [("S%s" % x) for x in ticklist]
ax.set_yticks(ticklist)
ax.set_yticklabels(l2)

# label order must follow the codes in the question: 0=TP, 1=FP, 2=TN, 3=FN
labels = ["TP","FP","TN","FN"]
legend_elements = []
for l,c, m in zip(labels, colors, markers):
    legend_elements.append(Line2D([0], [0], marker=m, color="w", label=l, markerfacecolor=c, markeredgecolor = "k", markersize=10))

ax.legend(handles=legend_elements, loc='upper right')

plt.show()

Plotting idea from this answer.

This results in a plot looking like this:

EDIT Added dashed line and outline for markers to look more like example in question.

EDIT2 Added legend.
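As an alternative to scattering every point individually, here is a minimal sketch that makes one scatter call per class by masking the data. It assumes the mapping from the question (0 = TP, 1 = FP, 2 = TN, 3 = FN) and a data frame whose sample names sit in a column I have called "sample". That column name, the STYLES dictionary and the plot_classification function are all made up for this illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# code -> (label, marker, color), following the mapping in the question
STYLES = {
    0: ("TP", "o", "g"),
    1: ("FP", "d", "r"),
    2: ("TN", "^", "m"),
    3: ("FN", "s", "y"),
}

def plot_classification(df, sample_col):
    """Draw one row of markers per sample, using one scatter call per class."""
    dates = [c for c in df.columns if c != sample_col]
    codes = df[dates].to_numpy()
    # ys holds the row (sample) index and xs the column (date) index of each cell
    ys, xs = np.indices(codes.shape)

    fig, ax = plt.subplots()
    # dashed guide line behind each row of markers
    ax.hlines(np.arange(1, len(df) + 1), 0, len(dates) - 1,
              colors="k", linestyles="dashed", linewidth=1, zorder=1)
    for code, (label, marker, color) in STYLES.items():
        mask = codes == code
        if not mask.any():
            continue  # skip classes that do not occur in the data
        ax.scatter(xs[mask], ys[mask] + 1, marker=marker, c=color, s=100,
                   edgecolors="k", linewidths=0.5, zorder=2, label=label)

    ax.set_xticks(range(len(dates)))
    ax.set_xticklabels(dates)
    ax.set_yticks(range(1, len(df) + 1))
    ax.set_yticklabels(df[sample_col].tolist())
    ax.legend(loc="upper right")
    return fig, ax

# rows shown in the question; "sample" is a hypothetical column name
df = pd.DataFrame({
    "sample": ["S1", "S2", "S3", "S4", "S10"],
    "2017-01": [2, 2, 1, 3, 2],
    "2017-02": [3, 0, 0, 2, 2],
    "2017-03": [2, 2, 2, 2, 3],
    "2017-04": [2, 0, 2, 2, 2],
})
fig, ax = plot_classification(df, "sample")
```

Call plt.show() afterwards to display the figure. Because each class is drawn with its own labelled scatter call, the legend can be built from the plot itself and no hand-made Line2D handles are needed.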
