
I'm developing a set of graphs to plot some pandas DataFrame values. For that I'm using pandas, NumPy and Matplotlib with the following code:

    import pandas as pd
    import numpy as np
    from matplotlib import pyplot as plt
    import matplotlib.ticker as ticker
    
    data = {'Name': ['immoControlCmd', 'BrkTerrMde', 'GlblClkYr', 'HsaStat', 'TesterPhysicalResGWM', 'FapLc','FirstRowBuckleDriver', 'GlblClkDay'],
            'Value': [0, 5, 0, 4, 0, 1, 1, 1],
            'Id_Par': [0, 0, 3, 3, 3, 3, 0, 0]
            }
    
    signals_df = pd.DataFrame(data)
    
    
    def plot_signals(signals_df):
        # Count signals by par
        signals_df['Count'] = signals_df.groupby('Id_Par').cumcount().add(1).mask(signals_df['Id_Par'].eq(0), 0)
        # Subtract Par values from the index column
        signals_df['Sub'] = signals_df.index - signals_df['Count']
        id_par_prev = signals_df['Id_Par'].unique()
        id_par = np.delete(id_par_prev, 0)
        signals_df['Prev'] = [1 if x in id_par else 0 for x in signals_df['Id_Par']]
        signals_df['Final'] = signals_df['Prev'] + signals_df['Sub']
        # signals_df['Finall'] = signals_df['Final'].unique()
        # print(signals_df['Finall'])
        # Convert and set Subtract to index
        signals_df.set_index('Final', inplace=True)
        # pos_x = len(signals_df.index.unique()) - 1
        # print(pos_x)
    
        # Get individual names and variables for the chart
        names_list = [name for name in signals_df['Name'].unique()]
        num_names_list = len(names_list)
        num_axis_x = len(signals_df["Name"])
    
        # Creation Graphics
        fig, ax = plt.subplots(nrows=num_names_list, figsize=(10, 10), sharex=True)
        plt.xticks(np.arange(0, num_axis_x), color='SteelBlue', fontweight='bold')
        for pos, (a_, name) in enumerate(zip(ax, names_list)):
            # Get data
            data = signals_df[signals_df["Name"] == name]["Value"]
            # Get values axis-x and axis-y
            x_ = np.hstack([-1, data.index.values, len(signals_df) - 1])
            # print(data.index.values)
            y_ = np.hstack([0, data.values, data.iloc[-1]])
            # Plotting the data by position
            ax[pos].plot(x_, y_, drawstyle='steps-post', marker='*', markersize=8, color='k', linewidth=2)
            ax[pos].set_ylabel(name, fontsize=8, fontweight='bold', color='SteelBlue', rotation=30, labelpad=35)
            ax[pos].yaxis.set_major_formatter(ticker.FormatStrFormatter('%0.1f'))
            ax[pos].yaxis.set_tick_params(labelsize=6)
            ax[pos].grid(alpha=0.4, color='SteelBlue')
        plt.show()
    
    
    plot_signals(signals_df)

I want to remove the x-axis positions where nothing is plotted (no marker appears on the graph), while keeping the values and names as in the image at the end. In pandas terms, this is the "Final" column, which I set as the index before plotting the subplots and in which some values are repeated. That is, I want to remove the values enclosed in the red box from the graph:

                            Name  Value  Id_Par  Count  Sub  Prev
     Final                                                       
     0            immoControlCmd      0       0      0    0     0
     1                BrkTerrMde      5       0      0    1     0
     2                 GlblClkYr      0       3      1    1     1
     2                   HsaStat      4       3      2    1     1
     2      TesterPhysicalResGWM      0       3      3    1     1
     2                     FapLc      1       3      4    1     1
     6      FirstRowBuckleDriver      1       0      0    6     0
     7                GlblClkDay      1       0      0    7     0

actual chart

I've tried to get the unique values of the last column, which are the values the x-axis should show, but since the result has a different length than the DataFrame, I get an error: ValueError: Length of values (5) does not match length of index (8). I would then also have to resize my chart, and I don't understand how to do that in this case:

        signals_df['Final'] = signals_df['Prev'] + signals_df['Sub']
        signals_df['Finall'] = signals_df['Final'].unique()
        print(signals_df['Finall'])
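
A minimal sketch of why that assignment fails, using only the 'Final' values from the table above (the names `df`, `unique_final` and `xtick_values` are made up for illustration): a column must have one entry per row, so the 5 unique values cannot be stored in an 8-row DataFrame, but they can be kept in a separate variable.

```python
import pandas as pd

# Same 'Final' values as in the table above
df = pd.DataFrame({"Final": [0, 1, 2, 2, 2, 2, 6, 7]})

unique_final = df["Final"].unique()   # NumPy array with 5 entries
print(len(unique_final), len(df))     # 5 8

try:
    df["Finall"] = unique_final       # 5 values cannot fill 8 rows
except ValueError as err:
    print(err)

# Keep the unique values in their own variable instead of a column
xtick_values = sorted(unique_final.tolist())
print(xtick_values)                   # [0, 1, 2, 6, 7]
```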

I've also tried to take the number of unique index values (after setting the index) and subtract it from data.index.values in the variable x_, but that doesn't give me what I want, because it subtracts the same amount from all the values in bulk instead of adjusting each one separately:

    signals_df.set_index('Final', inplace=True)
    pos_x = len(signals_df.index.unique()) - 1
    ...
    ..
    .
            x_ = np.hstack([-1, data.index.values - pos_x, len(signals_df) - 1])
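
For illustration only, and not something the original code does: subtracting one number shifts every value by the same amount, whereas closing the gaps needs a per-value mapping. One possible way to get such a mapping is `np.unique` with `return_inverse=True`; the variable names below are hypothetical.

```python
import numpy as np

# 'Final' index values from the question's table
final_index = np.array([0, 1, 2, 2, 2, 2, 6, 7])

# unique_vals holds the distinct x values; dense_pos gives, for every row,
# the position of its value within unique_vals (so the gaps disappear)
unique_vals, dense_pos = np.unique(final_index, return_inverse=True)

print(unique_vals.tolist())  # [0, 1, 2, 6, 7]
print(dense_pos.tolist())    # [0, 1, 2, 2, 2, 2, 3, 4]
```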

Is there a pandas and/or Matplotlib function that lets me do this? Or could someone give me a suggestion that helps me better understand how to do it? What I expect to achieve is the plot below:

expected chart

I really appreciate your help; any comments are welcome. I'm using Python 3.6.5, pandas 1.1.5 and Matplotlib 3.3.2.

MayEncoding
  • if you made your `x_` values into strings, you could plot them as categorical data, as they do in this example: https://matplotlib.org/stable/gallery/lines_bars_and_markers/categorical_variables.html#sphx-glr-gallery-lines-bars-and-markers-categorical-variables-py – tmdavison Aug 24 '21 at 21:10
  • Thanks @tmdavison, but in the example they separate the subplots into 2 lines, and what I want is a single step line, minus the values that are empty, using pandas and/or, if possible, matplotlib. The x-axis values that have no marks on the plot need to disappear. – MayEncoding Aug 24 '21 at 22:14
  • You can do that with categorical plotting, see my answer below – tmdavison Aug 25 '21 at 08:49

1 Answer


One possible way to do this is to convert your x-axis values to strings, which means that matplotlib will make a "categorical" plot. The matplotlib gallery example on categorical variables linked in the comments above shows this kind of plot.

For your case, because you have subplots which would have different values, and they are not always in the right order, we need to do a bit of trickery first to make sure the ticks appear in the correct order. For that, we can use the approach from this answer, where they plot something that uses all of the x values in the correct order, and then remove it.

To gather all the xtick values together, you can do something like this, where you create a list of the values, reduce it to the unique values using a set, then sort those values, and convert to strings using a list comprehension and str():

    # First make a list of all the xticks we want
    xvals = [-1,]
    for name in names_list:
        xvals.append(signals_df[signals_df["Name"] == name]["Value"].index.values[0])
    xvals.append(len(signals_df)-1)

    # Reduce to only unique values, sorted, and then convert to strings
    xvals = [str(i) for i in sorted(set(xvals))]
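
As a rough check of what this list contains for the sample data, here is a self-contained sketch that rebuilds only the 'Final' index from the question's table and collects the same values via `index.unique()` (a shortcut I am assuming is equivalent here because every name occurs once; `inner` is a made-up name). It also shows why the values are sorted as integers before being converted to strings.

```python
import pandas as pd

names = ['immoControlCmd', 'BrkTerrMde', 'GlblClkYr', 'HsaStat',
         'TesterPhysicalResGWM', 'FapLc', 'FirstRowBuckleDriver', 'GlblClkDay']
signals_df = pd.DataFrame(
    {'Name': names, 'Value': [0, 5, 0, 4, 0, 1, 1, 1]},
    index=pd.Index([0, 1, 2, 2, 2, 2, 6, 7], name='Final'),
)

# Distinct index values, plus the two padding points used for x_
inner = signals_df.index.unique().tolist()
xvals = [str(i) for i in sorted({-1, *inner, len(signals_df) - 1})]
print(xvals)  # ['-1', '0', '1', '2', '6', '7']

# Sorting after converting to strings would be lexicographic instead
print(sorted(['10', '2', '-1']))  # ['-1', '10', '2']
```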

Once you have those, you can make a dummy plot, and then remove it, like so (this is to fix the tick positions in the correct order). NOTE that this needs to be inside your plotting loop for matplotlib versions 3.3.4 and earlier:

    # To get the ticks in the right order on all subplots, we need to make
    # a dummy plot here and then remove it
    dummy, = ax[0].plot(xvals, np.zeros_like(xvals))
    dummy.remove()
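
The effect of the dummy plot can be seen in isolation with a single Axes. This is only a sketch under a few assumptions: the Agg backend is selected so it runs without a display, numeric zeros from `np.zeros(len(xvals))` are used for the dummy y values, and `Axis.convert_units` is used just to inspect where each string label lands.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import numpy as np
from matplotlib import pyplot as plt

xvals = ['-1', '0', '1', '2', '6', '7']

fig, ax = plt.subplots()

# The dummy line registers every category in the desired order;
# removing the line afterwards keeps that category order on the axis
dummy, = ax.plot(xvals, np.zeros(len(xvals)))
dummy.remove()

# A later plot that uses only some of the categories
ax.plot(['-1', '2', '7'], [0, 4, 4], drawstyle='steps-post')

# Inspect the numeric position assigned to each label
positions = ax.xaxis.convert_units(['-1', '2', '7'])
print(positions)
plt.close(fig)
```

Without the dummy plot, the three labels would take the first three positions in the order they were first seen; with it, '2' and '7' should keep the positions they were given in `xvals`.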

Finally, when you actually plot the real data inside the loop, you just need to convert x_ to strings as you plot them:

    ax[pos].plot(x_.astype('str'), y_, drawstyle='steps-post', marker='*', markersize=8, color='k', linewidth=2)
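
To make the conversion concrete, here is a small standalone sketch of what `x_` and its string version look like for one signal (the values are taken from the sample data for 'HsaStat'; `index_values`, `n_rows` and `x_labels` are illustrative names):

```python
import numpy as np

index_values = np.array([2])   # 'Final' index of a single signal
n_rows = 8                     # len(signals_df) in the question

# Pad with -1 on the left and the last position on the right
x_ = np.hstack([-1, index_values, n_rows - 1])
print(x_.tolist())             # [-1, 2, 7]

# String labels make matplotlib treat the x values as categories
x_labels = x_.astype('str')
print(x_labels.tolist())       # ['-1', '2', '7']
```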

Note that the only other change I made was to not explicitly set the xtick positions (which you did, with plt.xticks), but you can still use that command to set the font colour and weight:

    plt.xticks(color='SteelBlue', fontweight='bold')

And this is the output:

output chart

For completeness, here I have put it all together in your script:

    import pandas as pd
    import numpy as np
    from matplotlib import pyplot as plt
    import matplotlib.ticker as ticker

    import matplotlib
    print(matplotlib.__version__)

    data = {'Name': ['immoControlCmd', 'BrkTerrMde', 'GlblClkYr', 'HsaStat', 'TesterPhysicalResGWM', 'FapLc',
                     'FirstRowBuckleDriver', 'GlblClkDay'],
            'Value': [0, 5, 0, 4, 0, 1, 1, 1],
            'Id_Par': [0, 0, 3, 3, 3, 3, 0, 0]
            }

    signals_df = pd.DataFrame(data)


    def plot_signals(signals_df):
        # Count signals by par
        signals_df['Count'] = signals_df.groupby('Id_Par').cumcount().add(1).mask(signals_df['Id_Par'].eq(0), 0)
        # Subtract Par values from the index column
        signals_df['Sub'] = signals_df.index - signals_df['Count']
        id_par_prev = signals_df['Id_Par'].unique()
        id_par = np.delete(id_par_prev, 0)
        signals_df['Prev'] = [1 if x in id_par else 0 for x in signals_df['Id_Par']]
        signals_df['Final'] = signals_df['Prev'] + signals_df['Sub']
        # signals_df['Finall'] = signals_df['Final'].unique()
        # print(signals_df['Finall'])
        # Convert and set Subtract to index
        signals_df.set_index('Final', inplace=True)
        # pos_x = len(signals_df.index.unique()) - 1
        # print(pos_x)

        # Get individual names and variables for the chart
        names_list = [name for name in signals_df['Name'].unique()]
        num_names_list = len(names_list)
        num_axis_x = len(signals_df["Name"])

        # Creation Graphics
        fig, ax = plt.subplots(nrows=num_names_list, figsize=(10, 10), sharex=True)

        # No longer any need to define where the ticks go, but still set the colour and weight here
        plt.xticks(color='SteelBlue', fontweight='bold')

        # First make a list of all the xticks we want
        xvals = [-1, ]
        for name in names_list:
            xvals.append(signals_df[signals_df["Name"] == name]["Value"].index.values[0])
        xvals.append(len(signals_df) - 1)

        # Reduce to only unique values, sorted, and then convert to strings
        xvals = [str(i) for i in sorted(set(xvals))]

        for pos, (a_, name) in enumerate(zip(ax, names_list)):

            # To get the ticks in the right order on all subplots,
            # we need to make a dummy plot here and then remove it
            dummy, = ax[pos].plot(xvals, np.zeros_like(xvals))
            dummy.remove()
            # Get data
            data = signals_df[signals_df["Name"] == name]["Value"]
            # Get values axis-x and axis-y
            x_ = np.hstack([-1, data.index.values, len(signals_df) - 1])
            y_ = np.hstack([0, data.values, data.iloc[-1]])
            # Plotting the data by position
            # NOTE: here we convert x_ to strings as we plot, to make sure they are plotted as categorical values
            ax[pos].plot(x_.astype('str'), y_, drawstyle='steps-post', marker='*', markersize=8, color='k', linewidth=2)
            ax[pos].set_ylabel(name, fontsize=8, fontweight='bold', color='SteelBlue', rotation=30, labelpad=35)
            ax[pos].yaxis.set_major_formatter(ticker.FormatStrFormatter('%0.1f'))
            ax[pos].yaxis.set_tick_params(labelsize=6)
            ax[pos].grid(alpha=0.4, color='SteelBlue')

        plt.show()


    plot_signals(signals_df)
tmdavison
  • Done, thanks @tmdavison. Just one question: what does `.index.values[0]` do in the line inside the first for loop? – MayEncoding Aug 26 '21 at 16:57
  • `signals_df[signals_df["Name"] == name]["Value"].index.values` is basically what you use to make your `x_` array inside your plotting loop. So I used the same logic to make the xvals. The `[0]` is needed because without it, you get a 1-item numpy array, but we just want the value inside the array to make the xvals list. – tmdavison Aug 26 '21 at 17:05
  • Ok, ok... and one last question, why `np.zeros_like(xvals)` in the line: `dummy, = ax[pos].plot(xvals, np.zeros_like(xvals))`? – MayEncoding Aug 26 '21 at 18:01
  • Could have used anything there really, as long as it was the same length as `xvals`. I just wanted to use zeros so the y-axis scale wasn't affected – tmdavison Aug 26 '21 at 19:22
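
The point about `[0]` in the comments above can be checked with a tiny made-up DataFrame (three of the sample rows, with the 'Final' index set by hand; `selected`, `as_array` and `as_scalar` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['BrkTerrMde', 'GlblClkYr', 'GlblClkDay'], 'Value': [5, 0, 1]},
    index=pd.Index([1, 2, 7], name='Final'),
)

selected = df[df['Name'] == 'GlblClkDay']['Value']

as_array = selected.index.values       # 1-item NumPy array
as_scalar = selected.index.values[0]   # the value inside that array

print(as_array.tolist())  # [7]
print(int(as_scalar))     # 7
```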