I have 4 sets (i.e. bbs) of 3 .csv files (i.e. replicas) with 2 columns each: time (x-axis) and interaction frequency (first y-axis). I also need to plot error bars on a second y-axis, which I have already achieved. Since the files have similar paths, I build each filename with %s placeholders for the set and the replica.

Right now I can plot them all one after the other but what I would like to achieve would be:

  1. Create subplots so that all the replicas of each set go into one figure and are saved together in one .jpg, instead of having 12 separate .jpg figures.

  2. Use a cycler of three colours, one per replica, so that they are distinguishable in the final figure, plus a common legend for each figure.

  3. Ultimately, have a common y-axis range so that the results are visibly comparable, since I'm comparing frequencies in a histogram.

Here is my code so far:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

bbss = [60,80,100,120]
replicas = ["1", "2", "3"]

colors = ['tab:blue', 'tab:orange', 'tab:green']

for bbs in bbss:
    for i, replica in enumerate(replicas):
        filename = "elliottIV_HA_IHSS/box4x4x4/bbs%s/bbpm2/NA/pH7.0/r%s/analysis/mdf/MIN_1:13.dat" % (bbs, replica)
        data = pd.read_csv(filename, engine='python', delimiter="\s+|:", names=["time", "distance", "mol", "atom"], usecols=[0,1,3,4])
        fig, ax1 = plt.subplots()
        data["mol"] -= 1
        data["mol"].plot.hist(figsize=(15, 5), rwidth=0.8, bins=range(1,int(bbs/2+2)), align="left", ax=ax1)
        plt.suptitle('Replica-{}, {} building blocks:'.format(replica, bbs), fontsize=14)
        plt.xticks(range(1,int(bbs/2+1)))
        plt.xlabel("Molecule Nr.")
        #plt.savefig('{}bbs.jpg'.format(bbs), bbox_inches='tight')
        #plt.clf()
        
        ax2 = ax1.twinx()
        m = data.groupby("mol").mean()["distance"].values
        s = data.groupby("mol").std()["distance"].values
        i = data.groupby("mol").mean().index.values
        ax2.errorbar(i, m, yerr=s, marker="o", fmt="none", color="tab:red")
        ax2.set_ylabel("Distance [nm]")
    
        fig.tight_layout()
        plt.show()

The dataframe looks something like this:

             time  distance  mol  atom
0            0.0  0.368683   16     3
1            1.0  0.364314   16     1
2            2.0  0.358840   16     3
3            3.0  0.321033   16     3
4            4.0  0.361127   16     3
...          ...       ...  ...   ...
249995  249995.0  0.536088   13     3
249996  249996.0  0.508320   13     3
249997  249997.0  0.475273   13     3
249998  249998.0  0.559773   13     3
249999  249999.0  0.515042    7    11

The frequency on the first y-axis shows how often each "mol" appears (there are 30 of them). The time column covers 250k time steps. The second y-axis shows the average of the distance column for each "mol", with the standard deviation as the error bar.

Hope that clarifies it a bit - thanks!

  • Can you provide example dataframes for an easy reproduction of your code? Did you find the following tutorial about subfigures? There, three subfigures with two curves are plotted: https://matplotlib.org/stable/gallery/lines_bars_and_markers/errorbar_subsample.html#sphx-glr-gallery-lines-bars-and-markers-errorbar-subsample-py – BenHeid May 30 '21 at 11:08
  • @BenHeid - thank you for your reply, I tried to provide some more clarification. – cryptococcus May 30 '21 at 15:45
  • Does this mean that you would like to have four plots, where each plot contains one histogram with the three replicas, and each replica should have a different color? – BenHeid May 30 '21 at 18:00
  • Almost - I want 4 figures with three plots each, every plot representing one replica. The three colours would repeat for each figure, so fig 1: rep 1 blue, rep 2 orange, rep 3 green; fig 2: the same, and so on - thank you! – cryptococcus May 31 '21 at 09:18

1 Answer

For simplicity, I created the following data set:

import pandas as pd
import matplotlib.pyplot as plt

data = pd.DataFrame([[0.0, 0.368683, 16, 3],
                     [1.0, 0.364314, 15, 1],
                     [2.0, 0.358840, 16, 3],
                     [3.0, 0.321033, 17, 3],
                     [4.0, 0.361127, 17, 3]],
                    columns=["time", "distance", "mol", "atom"])

Afterwards, I created the desired three plots per figure with the following lines of code:

bbss = [60, 80, 100, 120]
replicas = ["1", "2", "3"]
colors = ['blue', 'orange', 'green']
for bbs in bbss:
    # One figure per set, with one row per replica
    fig, axes = plt.subplots(3, 1, figsize=(15, 5))
    for idx, replica in enumerate(replicas):
        filename = "elliottIV_HA_IHSS/box4x4x4/bbs%s/bbpm2/NA/pH7.0/r%s/analysis/mdf/MIN_1:13.dat" % (bbs, replica)
        # data = pd.read_csv(filename, engine='python', delimiter=r"\s+|:", names=["time", "distance", "mol", "atom"], usecols=[0,1,3,4])
        # Work on a copy so the example data is not shifted again on every pass
        df = data.copy()
        df["mol"] -= 1
        ax = axes[idx]
        df["mol"].plot.hist(rwidth=0.8, bins=range(1, int(bbs/2+2)), color=colors[idx], align="left", ax=ax)
        ax.set_xticks(range(1, int(bbs/2+1)))
        ax.set_title('Replica-{}'.format(replica))
        # Second y-axis: mean distance per molecule with the standard deviation as error bar
        ax2 = ax.twinx()
        grouped = df.groupby("mol")["distance"]
        m = grouped.mean().values
        s = grouped.std().values
        mols = grouped.mean().index.values
        ax2.errorbar(mols, m, yerr=s, marker="o", fmt="none", color="tab:red")
        ax2.set_ylabel("Distance [nm]")
    # Label only the bottom plot and title the figure by its set
    axes[-1].set_xlabel("Molecule Nr.")
    plt.suptitle('{} building blocks:'.format(bbs), fontsize=14)
    fig.tight_layout()
    plt.show()

Note: for your own data, you should uncomment the line that reads the file. The resulting plot looks like this:

[Image: resulting plot]
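The common y-axis range and the common legend from the question could be handled roughly as in the sketch below. It is only a minimal illustration under some assumptions: make_fake_replica and plot_set are hypothetical helper names, the data is randomly generated rather than read from the real files, and n_mols=30 is taken from the description of the data. The shared frequency axis comes from sharey=True in plt.subplots, and the legend is built from one coloured patch per replica and attached to the figure with fig.legend.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.patches import Patch


def make_fake_replica(seed, n_rows=500, n_mols=30):
    """Synthetic stand-in for one replica file (hypothetical data)."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "time": np.arange(n_rows, dtype=float),
        "distance": rng.uniform(0.3, 0.6, n_rows),
        "mol": rng.integers(1, n_mols + 1, n_rows),
    })


def plot_set(frames, colors, n_mols=30):
    """One row per replica, a shared frequency axis, and a single figure legend."""
    fig, axes = plt.subplots(len(frames), 1, sharex=True, sharey=True,
                             figsize=(15, 8))
    axes = np.atleast_1d(axes)
    bins = range(1, n_mols + 2)
    for idx, (ax, df) in enumerate(zip(axes, frames)):
        ax.hist(df["mol"], bins=bins, rwidth=0.8, align="left",
                color=colors[idx])
        ax.set_ylabel("Frequency")
        # Mean distance per molecule, standard deviation as the error bar
        stats = df.groupby("mol")["distance"].agg(["mean", "std"])
        ax2 = ax.twinx()
        ax2.errorbar(stats.index, stats["mean"], yerr=stats["std"],
                     fmt="o", color="tab:red")
        ax2.set_ylabel("Distance [nm]")
    axes[-1].set_xlabel("Molecule Nr.")
    # One legend entry per replica, placed on the figure rather than on an axis
    handles = [Patch(color=c, label="Replica {}".format(i + 1))
               for i, c in enumerate(colors)]
    fig.legend(handles=handles, loc="upper right")
    return fig, axes


frames = [make_fake_replica(seed) for seed in range(3)]
fig, axes = plot_set(frames, ["tab:blue", "tab:orange", "tab:green"])
```

With the real data, frames would be the three dataframes read for one set, and the figure could then be written out once per set, for example with fig.savefig('{}bbs.jpg'.format(bbs), bbox_inches='tight'). Note that sharey=True only makes the range common within one figure; to compare across the four sets, the same limits would have to be applied to every figure, for instance with ax.set_ylim.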

I hope this is the solution you are looking for.

BenHeid