0

I'm trying to plot several subplots t analyse average call duration, per date, per agent. I read that info from a SQL table and load into a Panda Dataframe. Not all the agents share the same nr of days, or even the same dates, so shared x=True doesn't make sense.

I've come up with this:

    import pandas as pd
    from pandas import DataFrame
    import matplotlib.pyplot as plt 

    df2= df.groupby(['agent_id', 'call_date'])['duration_minutes'].mean()
    #Figure out number of rows needed for 2 column grid plot
    #Also accounts for odd number of plots
    group_len = len(df2.groupby('agent_id'))
    #nrows = int(math.ceil(group_len/2.))

    #Setup Subplots
    fig, axs = plt.subplots(group_len,1,sharex=False, sharey=True)

    for i,var in enumerate(df2.groupby('agent_id')):
        agent_id = var[0]
        #print(df2[agent_id])
        df2[agent_id].plot(x ='call_date', y='duration_minutes',
                           kind = 'line',legend=False, ax=axs[i],marker='*')
        axs[i].tick_params(axis='both', which='both', labelsize=7)
        axs[i].legend(['Agent Id: ' + str(agent_id)])
        #axs[i].set_title('Agent Id: ' + str(i),fontsize=8)
        #axs[i].yaxis.set_ticks_position('none')
        axs[i].set_xlabel('Day')
        #axs[i].set_ylabel('Agent Id: ' + str(i),fontsize=8)

    #plt.xticks(rotation=90)
    plt.suptitle('Avg call duration per day, per agent', verticalalignment='bottom', fontsize=12) 
    plt.tight_layout()

    #df1.plot(x ='agent_id', y='duration_minutes', kind = 'bar', title='Avg Call duration per agent')
    plt.show()  

    df2= df.groupby(['agent_id', 'call_date'])['duration_minutes'].mean()
    #Figure out number of rows needed for 2 column grid plot
    #Also accounts for odd number of plots
    group_len = len(df2.groupby('agent_id'))
    #nrows = int(math.ceil(group_len/2.))

    #Setup Subplots
    fig, axs = plt.subplots(group_len,1,sharex=False, sharey=True)

    for i,var in enumerate(df2.groupby('agent_id')):
        agent_id = var[0]
        #print(df2[agent_id])
        df2[agent_id].plot(x ='call_date', y='duration_minutes',
                           kind = 'line',legend=False, ax=axs[i],marker='*')
        axs[i].tick_params(axis='both', which='both', labelsize=7)
        axs[i].legend(['Agent Id: ' + str(agent_id)])
        #axs[i].set_title('Agent Id: ' + str(i),fontsize=8)
        #axs[i].yaxis.set_ticks_position('none')
        axs[i].set_xlabel('Day')
        #axs[i].set_ylabel('Agent Id: ' + str(i),fontsize=8)

    #plt.xticks(rotation=90)
    plt.suptitle('Avg call duration per day, per agent', verticalalignment='bottom', fontsize=12) 
    plt.tight_layout()

    #df1.plot(x ='agent_id', y='duration_minutes', kind = 'bar', title='Avg Call duration per agent')
    plt.show()  

Which gives something like this: subplots_output

I would like to improve this output, but I tried a lot of things, sometimes without luck. I want to be able to use Panda's dataframe, so I narrowed my research to Cufflinks, which I'm using right now. I came up with this solution, but I would like to have legend=agent_id and one color, per graph, if possible.

import pandas as pd
from pandas import DataFrame
import matplotlib.pyplot as plt
#import seaborn as sns
#Cufflinks is a 3rd wrapper library around Plotly, inspired by the Pandas .plot() API.
import cufflinks as cf
from plotly.offline import iplot

 df = pd.DataFrame(SQL_Query,columns=['id','agent_id','duration_minutes','call_date','inbound'])
 # 2) Get the avg of duration per agent, per day
 df2= df.groupby(['agent_id', 'call_date'])['duration_minutes'].mean()

                fig_array = []
    for i,var in df2.groupby('agent_id'):
        #print(var)
        agent_id= var[0]
        #print('--------------------------------------------------')
        fig = var.reset_index().iplot(theme='pearl',asFigure=True
                ,x ='call_date', y='duration_minutes',
                kind = 'line',
                xTitle='', yTitle='Duration (min)',
                title=str(agent_id),
                world_readable=True)
        fig.update_layout(showlegend=False)
        fig.update_traces(texttemplate='%{y:.2f}',
                hovertemplate='<b>Day: </b>%{x} <br><b>Avg duration(min): </b>%{y}')
        fig_array.append(fig)     

    fig = cf.subplots(fig_array,shape=(group_len,1))
    #iplot(fig)
    plot(fig, filename='avg_duration_per_day_per_agent.html')

The CSV-file (I read from sql table, but it's the same) is something like this: id,agentid,duration,date,inbound

1,3,10.52,2019/05/01,true
2,1,12.93,2019/04/06,false
3,2,10.32,2019/06/14,true
4,3,8.84,2019/06/13,false
5,3,13.43,2019/05/06,false
6,3,4.78,2019/05/04,false
7,1,9.21,2019/06/21,true
8,5,9,2019/05/26,true
9,5,12.49,2019/06/04,true
10,3,3.68,2019/05/05,false
11,2,6.06,2019/06/22,false
12,4,7.66,2019/06/20,false
13,2,6.17,2019/06/15,true
14,4,13.6,2019/06/26,true
...

avg_in_cufflinks

I would like to show a more intuitive/pretty graphs, but I'm stuck with customizing these multichart, since I'm unable to hide legend, titles, etc for subplots. I manage to it perfectly for just one graph though. How can I put legend= agent_id and one single title, and titles for each x-axis and y-axis? Is not working.

neverMind
  • 1,757
  • 4
  • 29
  • 41
  • Do you have a specific question? This is pretty open ended... – Michael Delgado May 18 '20 at 00:09
  • Yes, sorry. I was trying to use Cufflinks, because the idea is to use Panda dataframe and so with that library I think I can. The problem is when I try: df1= df.groupby('agent_id')['duration_minutes'].mean() figure= df1.iplot(asFigure=True,x ='agent_id', y='duration_minutes', kind = 'bar') It says: "None of ['agent_id'] are in the columns" It was supposed to work more or less the same with the same df,right? – neverMind May 18 '20 at 08:39
  • Oh ok - this is a totally different question with nothing to do with plotting. Can you rewrite your question to include [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example)? Thanks! – Michael Delgado May 18 '20 at 15:35
  • Just looking quickly at your code I'm guessing that you need to call `reset_index(drop=False)` after your groupby.mean() – Michael Delgado May 18 '20 at 15:37

0 Answers0