2

First and foremost, I'd like to point out that I am no expert in Python and am still learning how to use pandas. I dug through older posts but did not find a suitable answer.

I've been trying to code a data analysis of 92 contracts. For each of them, I'd like to plot a specific analysis (taking some columns of the same dataframe each time) and save each analysis in a different folder (Analysis 1, Analysis 2, ...).

So far, I am facing many difficulties. Thus, before focusing on WHAT to plot, I'd like to understand how to save each plot to a different .png file. The code I've tried does not seem to save anything: when I go to the folder, it's empty.

Thanks to waykiki's help, I've been able to update my code. Now I know how to create as many folders as the analyses I produce. Yet, I still do not understand how to code the plotting of 92 graphs per analysis. My code now looks like this:

import pandas as pd
import matplotlib.pyplot as plt
import os

# Folder in which I want the analyses to be saved
URL5 = r"C:\Users\A\AppData\Local\Programs\Python\Python39"
# 1 graph per ID_Contrat (thus, 92 graphs)
groups = outer_merged_df.groupby("ID_Contrat") #where outer_merged_df is my dataframe
# How to name each plot.
List_ID_Contrat = outer_merged_df["ID_Contrat"].tolist()

def create_plot(file_name, x, y):
    # Create your plot. It is my understanding that here I should just give the x and the y I want to plot.
    fig = plt.figure()
    plt.plot(x, y, color = "red", kind = "line", legend = "true", linewidth = 2)
    plt.savefig(file_name)
    plt.show()

def main():
    # must be full-path. 
    parent_folder = URL5
    # move to parent directory
    os.chdir(parent_folder)
    # I want file_name to be different for each graph
    extension = ".png"
    # 5 = how many analyses I want to do
    for i in range(5):
        for name in List_ID_Contrat :
            file_name = "Analyse" + str[i+1] "{}" + extension.format(name) # I want file_name to be different for each graph and looking like "Analyse i Contrat XX"
        # Create a new folder
        folder_name = 'Analysis ' + str(i+1)
        os.mkdir(folder_name)
        full_file_name = folder_name + '/' + file_name
        x = np.linspace(1,100,100)
        y = np.random.random(100)
        create_plot(full_file_name, x, y)
        print("plot "+ savefile +" finished".format(name))
        
if __name__ == "__main__":
    main()

Yet, when I run my code, it neither plots 92 graphs nor creates the folders anymore (though it did using waykiki's method). The for loop breaks during the first iteration (I only get the folder "Analysis 1") and I get the error message:

AttributeError: 'Line2D' object has no property 'kind'

Could you please explain to me how I can save the graphs? I am getting lost...

Thanks

Aurore
  • Hello! I do not see where you are plotting the graph. You should have something like plt.plot(x, y) if using matplotlib.pyplot as plt, or df.plot(x=..., y=...) if plotting directly from a pandas DataFrame. – coyote Aug 19 '21 at 08:41
  • Hello, you are right ! I forgot some part of my code ... I edited my post consequently ! :) – Aurore Aug 19 '21 at 08:54
  • You didn't provide a reproducible example. I could give you a simple example of how to do it right now, or you can provide a minimal reproducible example. – Karina Aug 19 '21 at 10:21
  • Ok, let me update my post so we have common grounds. Thanks a lot ! – Aurore Aug 19 '21 at 10:56

3 Answers

2

I think your approach is right, in the sense that you've divided your problem into 2 steps:

1.) Get the technical details done (create, organise and navigate through the folders and data).

2.) Do the actual creation/drawing of plots.

Here is a simple prototype script. This script creates N number of subfolders located in the main directory '/home/user/my_analysis/'. All subfolders are named "AnalysisX", where X is the number of the folder.

Every folder contains a different plot.

Note: my folder paths are for a Linux machine, so just keep in mind that '/home/user/some_folder/' isn't a valid path on Windows! (I see you've already got that part right, but it might be useful for other users).

import os
import numpy as np
import matplotlib.pyplot as plt


def create_plot(file_name, x, y):
    # Create your plot
    fig = plt.figure()
    plt.plot(x, y, color='red', linewidth=2)
    plt.savefig(file_name)
    plt.show()


def main():
    # must be full-path
    parent_folder = '/home/user/my_analysis/'

    # move to parent directory
    os.chdir(parent_folder)

    file_name = 'plot'
    extension = '.png'
    for i in range(5):
        # Create a new folder
        folder_name = 'Analysis' + str(i+1)
        os.mkdir(folder_name)

        full_file_name = folder_name + '/' + file_name + extension
        x = np.linspace(1, 100, 100)
        y = np.random.random(100)
        create_plot(full_file_name, x, y)


if __name__ == '__main__':
    main()
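
The script above builds the file path with a hard-coded '/'. A possible way to keep it independent of the operating system is to let os.path.join pick the separator. This is only a small sketch; build_plot_path is a hypothetical helper name, not something from the script above:

```python
import os


def build_plot_path(folder_name, file_name, extension=".png"):
    # os.path.join inserts the separator appropriate for the current OS
    return os.path.join(folder_name, file_name + extension)


path = build_plot_path("Analysis1", "plot")
print(path)
```

The returned string can be passed to create_plot() in place of full_file_name.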

For clarity, this is what the folder-structure looks like. I've only censored my real username:

enter image description here

waykiki
  • Thanks a lot ! It works just fine ! Now, as I want to plot 92 different graphs (1 per contract) I tried to edit the code but it seems there is an error .. Let me update my post – Aurore Aug 19 '21 at 11:18
  • Indeed, I tried to modify your code so the program understands that now, for each analysis, I want: 1) as many graphs as I have contracts, 2) each analysis to be different. I keep trying alternatives to my code, but if you have an idea, could you please let me know? – Aurore Aug 19 '21 at 11:47
  • ``/`` is not allowed in windows. Maybe that's why? – Karina Aug 19 '21 at 12:19
  • Hey Karina, my trouble is: my code does not understand that I want 92 graphs whose data should come from some columns of a dataframe, and that I want each of them to be named after an element of a list (a list that I created out of the dataframe) – Aurore Aug 19 '21 at 12:24
  • Aurore no prob! I've taken a look at the updated code you provided us with, and the error you copied. It seems that @Karina has answered this already - currently the error is that you're mixing two plotting styles. plt.plot(...) is a matplotlib function. This function does not accept an argument named "kind". Remove it, and try running the code without it. – waykiki Aug 19 '21 at 16:10
2

You still haven't provided the DataFrame as an example, and I have no access to your local folder. I assume you have a pandas DataFrame anyway, so I wrote the code for random data. Before giving you the code, I'll try to clear up some misunderstandings:

1. Quoting your comment:

# Create your plot. It is my understanding that here I should just give the x and the y I want to plot.

Yes, this is correct. However, you mixed up pandas plotting and matplotlib:

plt.plot(x, y, color = "red", kind = "line", legend = "true", linewidth = 2)

Stick to one. kind='line' and legend='true' belong to pandas plotting (DataFrame.plot), while plt.plot() is matplotlib plotting. Mixing them won't work ;)
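
To make the difference concrete, here is a minimal sketch of the same line drawn once in each style. The toy DataFrame and its column names are made up for illustration, and the Agg backend is an assumption so that no window opens:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, nothing pops up
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data, only for illustration
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2.0, 1.5, 3.0, 2.5]})

# Style 1: matplotlib. Line properties such as color and linewidth are accepted,
# but there is no "kind" argument.
fig1, ax1 = plt.subplots()
ax1.plot(df["x"], df["y"], color="red", linewidth=2, label="y")
ax1.legend()

# Style 2: pandas. kind and legend are arguments of DataFrame.plot,
# and legend takes the boolean True rather than the string "true".
ax2 = df.plot(x="x", y="y", kind="line", legend=True, color="red", linewidth=2)

plt.close("all")
```

Either style ends up drawing on a matplotlib Axes, so savefig() works the same way afterwards.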

2. extension = '.png' is not necessary (at least in this case)

When the file name has no extension, plt.savefig() falls back to its default format, which is .png unless you have configured it otherwise. Adding .png to the file name yourself does no harm; it is just redundant here.
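
A quick way to check how savefig() treats the extension is to save once without and once with .png, then list the folder. This is a small sketch; the names plot_a and plot_b and the temporary folder are made up for the check:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, nothing pops up
import matplotlib.pyplot as plt

out_dir = tempfile.mkdtemp()  # throwaway folder for the demonstration

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# No extension: savefig falls back to rcParams["savefig.format"] (png by default)
fig.savefig(os.path.join(out_dir, "plot_a"))
# Explicit extension: the format is inferred from it
fig.savefig(os.path.join(out_dir, "plot_b.png"))
plt.close(fig)

saved = sorted(os.listdir(out_dir))
print(saved)
```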

So this is my code:

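import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def create_plot(file_name, x, y):
    # One new figure per plot, saved and then closed
    fig, ax = plt.subplots()
    ax.plot(x, y, 'r', linewidth = 2)
    plt.savefig(file_name)
    plt.close()

def createalotofdata(n, df):
    # Add n columns of random data to df (in place)
    for i in range(n):
        df[f'data number{i}'] = np.random.rand(10)
#     print(df)

x = np.arange(10)
df = pd.DataFrame({'x0': x})

createalotofdata(5, df)

# One plot per data column (every column except 'x0')
for i in range(len(df.columns) - 1):
    create_plot(f'Plot number {i}', df['x0'], df[f'data number{i}'])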

Nothing is displayed when this runs; the plots are only saved to disk:

enter image description here

Hope you understand and can adapt according to your need. Do ask again if something is still unclear.

Karina
  • Hello Karina, first, many thanks for the explanations. You're right, I didn't provide a dataframe... Slipped my mind. I keep mixing up what belongs to matplotlib and what belongs to pandas: which plotting should I rather use? It seems that you code using matplotlib, is it because it's more intuitive? Easier? Is there a specific reason? The extension did not create any extra error message, but thanks for the advice! :) – Aurore Aug 20 '21 at 08:04
  • I spent time yesterday on coding what I wanted and I finally managed to do so! Your explanations and code greatly helped me. I'll reply to my own post so other users can find a way out :). Plus, could you give me your thoughts about it? I'm still a beginner, so I assume there are things I could have done more simply... – Aurore Aug 20 '21 at 08:07
  • You're welcome! As for your question: ``matplotlib.pyplot`` is just my personal preference, and I find it easier to understand and modify. Moreover, it can do much more than pandas in terms of plotting; there is a reason why it is meant to be a plotting library and the other one is more for dataframes. – Karina Aug 20 '21 at 08:33
  • Thanks for your answer ! You're right, obviously this matplotlib library was not created for nothing ! I'll keep going : I now need to tell my program to plot different analyses ! :) – Aurore Aug 20 '21 at 09:08
1

So yesterday I posted this question : how can I plot n graphs, for different analyses, and save them in different .png files ? Thanks to Karina and Waykiki (and somehow, myself) I made it ! Below is the code I now have - that actually works - with an example.

I created a simple example with a simple dataframe :

import os
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'ID':['A','B','B','A','C','C'], 'X': [5,3,4,2,5,3], 'Y':[1,2,6,4,5,2]}) #simple dataframe

def create_plot(file_name, x, y, label):
    # Create your plot
    # As I was advised, I stopped using "group.plot", which is pandas plotting: stick to one library!
    plt.plot(x, y, color='red', linewidth=2, label=label)
    plt.savefig(file_name)
    plt.show()

def main():
    # must be full-path
    parent_folder = r"C:\Users\A\AppData\Local\Programs\Python\Python39\Test"

    # move to parent directory
    os.chdir(parent_folder)

    extension = '.png'
    for i in range(5):
        # Create a new folder
        folder_name = 'Analysis' + str(i+1)
        os.mkdir(folder_name)
        for ID in df.ID.unique():
            # keep only the rows of the current ID
            df1 = df[df.ID == ID]
            file_name = "Analysis " + str(i+1) + " - {}".format(ID)
            print(file_name)
            full_file_name = folder_name + '/' + file_name + extension
            x = df1.X
            y = df1.Y
            # pass ID explicitly: it is local to main() and not visible inside create_plot()
            create_plot(full_file_name, x, y, ID)

if __name__ == '__main__':
    main()

This code works. I can now :

  1. Plot figures using the create_plot() function
  2. Create 1 folder per analysis (here 5 analyses)
  3. Save each plot to a .png file whose name is as defined in "file_name" (namely Analysis 1 - C in folder Analysis1, Analysis 2 - A in folder Analysis2, ...)
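
A possible variant of create_plot() opens a fresh figure for every call and closes it after saving, so lines from earlier plots cannot end up on later ones and figures do not accumulate in memory. This is a sketch under assumptions: the Agg backend is chosen because the plots only need to be saved, a temporary folder stands in for the real parent folder, and os.makedirs(..., exist_ok=True) is swapped in for os.mkdir so that a second run does not fail on existing folders:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: plots only need to be saved, not displayed
import matplotlib.pyplot as plt


def create_plot(file_name, x, y, label):
    # A fresh figure per call, so lines from earlier plots are not drawn again
    fig, ax = plt.subplots()
    ax.plot(x, y, color="red", linewidth=2, label=label)
    ax.legend()
    fig.savefig(file_name)
    plt.close(fig)  # release the figure once it is on disk


parent_folder = tempfile.mkdtemp()  # stand-in for the real parent folder
folder = os.path.join(parent_folder, "Analysis1")
os.makedirs(folder, exist_ok=True)  # creates the folder
os.makedirs(folder, exist_ok=True)  # calling it again does not raise

plot_path = os.path.join(folder, "Analysis 1 - A.png")
create_plot(plot_path, [2, 5], [4, 1], "A")
open_figures = plt.get_fignums()  # figure numbers still open after the call
```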

Now what I need to code is :

  1. How to tell my code that for analysis 1 I want some columns of my df, for analysis 2 some other columns, and so on
  2. Change the x-axis labels so they show the dates that I have defined.
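
For the first point, one possible approach is a dictionary that maps each analysis number to the pair of columns it should plot, combined with groupby to get one group per ID. This is only a sketch: the extra column Z, the analyses mapping, and the temporary parent folder are assumptions made up for the example:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: plots only need to be saved, not displayed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "ID": ["A", "B", "B", "A", "C", "C"],
    "X": [5, 3, 4, 2, 5, 3],
    "Y": [1, 2, 6, 4, 5, 2],
    "Z": [9, 8, 7, 6, 5, 4],  # hypothetical extra column
})

# Hypothetical mapping: analysis number -> (x column, y column)
analyses = {1: ("X", "Y"), 2: ("X", "Z")}

parent_folder = tempfile.mkdtemp()  # stand-in for the real parent folder
saved = []
for number, (x_col, y_col) in analyses.items():
    folder = os.path.join(parent_folder, f"Analysis{number}")
    os.makedirs(folder, exist_ok=True)
    # groupby yields one (ID, sub-dataframe) pair per distinct ID
    for contract_id, group in df.groupby("ID"):
        fig, ax = plt.subplots()
        ax.plot(group[x_col], group[y_col], color="red", linewidth=2, label=contract_id)
        ax.set_xlabel(x_col)
        ax.set_ylabel(y_col)
        ax.legend()
        path = os.path.join(folder, f"Analysis {number} - {contract_id}.png")
        fig.savefig(path)
        plt.close(fig)
        saved.append(path)
```

Adding an analysis then only means adding one entry to the dictionary, without touching the loops.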

Hope this will help the community !

Aurore
  • You need to close those plots though... Imagine having 92 plots all opened (unless you purposely want it to have it so). You will surely trigger the matplotlib warning for opening more than 20 figures at once. Btw, just a tip (maybe not necessarily crucial), if you have your ``file_name`` with just ``str(i+1)`` to differentiate every file, there is a good chance that your older generated plots will be replaced by the new one when you run your code again to generate new plots. – Karina Aug 20 '21 at 08:40
  • I added the following at the beginning so it does not show 92*4 = 368 graphs: %matplotlib agg – Aurore Aug 20 '21 at 09:04
  • My file name looks as follows: Analysis 3 - Name_of_the_contract. With my code, if I want to run it again, I first need to delete all the folders. I believe it's okay for now: I can just save them elsewhere and launch my program all over again – Aurore Aug 20 '21 at 09:07
  • I just somehow have the feeling that the plots will be on top of each other instead of one for each plot. I mean you add your second plot to the first plot, not a new plot. Did you check your .png files? is it one for each? If yes, then sorry, my fault. I have to admit I don't know what ``%matplotlib agg`` does. – Karina Aug 20 '21 at 09:41
  • Each plot is different :) I've read that it prevents the figures from showing up – Aurore Aug 20 '21 at 09:47