Populate subplots with histograms within for loop

Question

I have a set of 7 *.xls files (so far) in a folder, each holding the statistics for one month. With pandas I read all of them, select the column called "Gap", and apply some filtering before continuing. I tried to plot a histogram of each file's "Gap" in its own subplot (one month per subplot). The code I wrote is:

import glob

import matplotlib.pyplot as plt
import pandas as pd

pato = r'D:\Inves\Pdoc\Cata_2021'
#os.chdir(pato)

file_list = glob.glob(pato + r"\*.xls")
print('file_list {}'.format(file_list))

fig, axs = plt.subplots(4,3, figsize=(15, 6), sharex=True, sharey=True)

a = 4 
b = 3
# this for loop reads each file with pandas and filters it
for proce in file_list:
    df_cat = pd.read_excel(proce)
    #print(df_cat.head())
    df_cat_sha = df_cat[(df_cat.Prof.between(0, 75))]
    print(df_cat_sha)
    m = 0
    # Here I tried to create the subplots and populate them
    for i in range(a):
        for j in range(b):
            df_cat_sha.Gap.hist(bins=18, ax=axs[i, j], 
                            color='green', alpha=0.75)
            m +=1
plt.show()

I got the following plot (with the help of "Plotting two histograms from a pandas DataFrame in one subplot using matplotlib"):

However, each subplot contains more than one histogram, which is not what I wanted.

The desired output is like this example:

Do you have any tips for plotting just one "Gap" histogram from my DataFrame in each subplot? The files are attached here in xls format too (https://drive.google.com/drive/folders/1MbuTMQpuc79nRYFGL-UAdmvo9r2AzhxA?usp=sharing).

Thanks in advance

Tonino

score 1 · Answer 1 · answered Sep 11 '21 at 02:58

Because you specified bins=18, each data frame is divided into 18 sub-ranges based on the min and max of its Gap column. When the min and max don't match between two data frames, you get misaligned bins.
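The misalignment can be seen without plotting anything. The following is a minimal sketch using two made-up samples (the names jan and feb and their values are illustrative assumptions, not the asker's data). It uses np.histogram_bin_edges to show the edges NumPy computes when it is only given a bin count:

```python
import numpy as np

# Two made-up samples with different ranges (illustrative values only).
jan = np.array([10.0, 20.0, 30.0, 40.0])
feb = np.array([15.0, 60.0, 90.0, 120.0])

# With an integer bin count, the edges span each sample's own min..max,
# so 18 bins means 19 edges that differ from sample to sample.
jan_edges = np.histogram_bin_edges(jan, bins=18)
feb_edges = np.histogram_bin_edges(feb, bins=18)

print(jan_edges[[0, -1]])  # first and last edge for jan
print(feb_edges[[0, -1]])  # first and last edge for feb
print(np.allclose(jan_edges, feb_edges))  # do the two sets of edges coincide?
```

Because the two sets of edges differ, bars drawn from both samples on the same axes do not line up.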

Instead of specifying bins=18, you can explicitly define the edges of those 18 bins based on the global min and max of the Gap column in all data frames:

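from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

pato = Path(r'D:\Inves\Pdoc\Cata_2021')
# the files are Excel workbooks, so read them with read_excel
df_list = [pd.read_excel(file) for file in pato.glob('*.xls')]

tmp = pd.concat(df_list)
gap = tmp.loc[tmp['Prof'].between(0, 75), 'Gap']
bins = np.linspace(gap.min(), gap.max(), 19) # 19 bin edges define 18 bins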

a, b = 4, 3
fig, axs = plt.subplots(a, b, figsize=(15, 6), sharex=True, sharey=True)

for df in df_list:
    for i in range(a):
        for j in range(b):
            cond = df['Prof'].between(0, 75)
            df.loc[cond, 'Gap'].hist(bins=bins, ax=axs[i,j], color='green', alpha=0.75)
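The nested loops above still draw every file into every subplot. If the goal is one file per subplot, as the asker clarifies in the comments, one possible approach is to pair each DataFrame with a single Axes. The sketch below is only an illustration under assumptions: the helper name plot_gap_per_file is hypothetical, and the DataFrames are synthetic stand-ins with the same Prof and Gap columns rather than the real files.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_gap_per_file(df_list, bins, nrows=4, ncols=3):
    """Draw one 'Gap' histogram per DataFrame, each in its own subplot."""
    fig, axs = plt.subplots(nrows, ncols, figsize=(15, 6),
                            sharex=True, sharey=True, squeeze=False)
    # axs.flat walks the grid row by row; zip stops at the shorter sequence,
    # so 7 files fill the first 7 axes and the remaining axes stay empty.
    for ax, df in zip(axs.flat, df_list):
        cond = df['Prof'].between(0, 75)
        df.loc[cond, 'Gap'].hist(bins=bins, ax=ax, color='green', alpha=0.75)
    return fig, axs


# Synthetic stand-ins for the monthly files (made-up data, not the asker's).
rng = np.random.default_rng(0)
df_list = [
    pd.DataFrame({'Prof': rng.uniform(0, 150, 200),
                  'Gap': rng.uniform(30, 300, 200)})
    for _ in range(7)
]

# Shared bin edges from the global min and max: 19 edges define 18 bins.
tmp = pd.concat(df_list)
gap = tmp.loc[tmp['Prof'].between(0, 75), 'Gap']
bins = np.linspace(gap.min(), gap.max(), 19)

fig, axs = plot_gap_per_file(df_list, bins)
```

With the real data, df_list would be the list read from the *.xls files, and plt.show() after the call displays the figure. Iterating over axs.flat avoids tracking the row and column indices by hand.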

answered Sep 11 '21 at 02:58

Code Different

Thanks for the explanation about bins, it is very helpful. However, I was not clear in my question. Is it possible to plot the Gap histogram of 01.xls at ax[0,0,0], the data from 02.xls at ax[0,1,0], the data from 03.xls at ax[0,0,1], and so on, somewhat like https://matplotlib.org/stable/gallery/scales/log_demo.html – tonino Sep 11 '21 at 13:34
I'm not sure I understand what you are asking. Can you clarify it a little? – Code Different Sep 11 '21 at 14:18
Code Different, sorry for not being clear. I am trying to have 12 subplots in one figure, one subplot per month; I have 12 *.xls files (one per month). I want to plot the histogram of the Gap column: for example, ax[0,0,0].hist(df.loc[cond, 'Gap']...) would represent the first file, the second file would be plotted at ax[0,1,0].hist(df.loc[cond, 'Gap']...), the third file at ax[0,0,1].hist(df.loc[cond, 'Gap']...), and the fourth file at ax[1,0,0].hist(df.loc[cond, 'Gap']...). Thank you in advance – tonino Sep 11 '21 at 15:00
I added the desired output that I am trying to get to the main question @Code Different – tonino Sep 11 '21 at 15:47
There are 2 series in each chart and there are 12 charts. How can 24 series come from 12 files? – Code Different Sep 11 '21 at 15:53
I thought that with 'df_cat_sha = df_cat[(df_cat.Prof.between(0, 75))]' I filter only depths from 0 to 75, and then with 'df_cat_sha.Gap.hist(bins=18, ax=axs[i, j], color='green', alpha=0.75)' I take only the Gap column from the DataFrame and plot that column as a histogram – tonino Sep 11 '21 at 18:58