8

I am currently using MS Access to produce reports but am somewhat limited with some of the calculations that I need to do.

I was looking into perhaps using Python to run the reports i.e. one report per row of data which takes the column fields and puts them in text boxes going down, which looks like:

Report template

How would this be possible with Python?

Nicholas
  • 3,517
  • 13
  • 47
  • 86
  • 1
    I don't understand what you are asking for, could you give some examples of what you are trying to do? – Tadhg McDonald-Jensen May 16 '17 at 12:36
  • I am sorry, so I want to read a CSV with lots of data in (many rows per reference number), do a load of manipulating to get it into a 'one row per reference number' format...... then I want to export it to Word or PDF in a report format i.e. to go along the row and put it in text boxes like the picture above, and have one report per reference number – Nicholas May 16 '17 at 12:39
  • 1
    Currently you are describing a whole process with lots of individual steps (opening the file, doing the calculations, exporting) each of which you will be able to find references on elsewhere. Is there something specific you are having trouble with? As written I think the answer to your question is "start trying and see where you get." :) – Tadhg McDonald-Jensen May 16 '17 at 12:56
  • Sorry, I can do all the first bits i.e. reading it and manipulating it (getting new columns etc), what I want to do is take the new CSV that I created in Pandas and put it into a report like structure like my picture rather than in a csv? :) – Nicholas May 16 '17 at 13:20
  • I could also recommend you to check reportlab for Python it is useful for high performance tables, images, graphs & paragraphs based PDF generation. here is the link https://www.reportlab.com/opensource/installation/ – Ronald Saunfe May 15 '18 at 16:11

3 Answers3

20

This goes a little beyond Pandas, but you can create a PDF report from each row of your Pandas DataFrame (tested with version 1.2.5) with the help of the following Python libraries:

  • jinja2: template engine, tested with version 3.0.1
  • xhtml2pdf: converts HTML into PDF, tested with version 0.2.5

First, define the structure and the looks of the report in report_template.html:

<html>
    <head>
        <style type="text/css">
            html, body {
                width: 500px;
                font-size: 12px;
                background: #fff;
                padding: 0px;
            }
            #my-custom-table {
                width: 500px;
                border: 0;
                margin-top: 20px;
            }
            #my-custom-table td {
                padding: 5px 0px 1px 5px;
                text-align: left;
            }
        </style>
    </head>
    <body>
        <table cellspacing="0" border="0" style="width:500px; border:0; font-size: 14px;">
            <tr>
                <td style="text-align:left;">
                    <b><span>Title of the PDF report - Row {{ row_ix + 1 }}</span></b>
                </td>
                <td style="text-align:right;">
                    <b><span>{{ date }}</span></b>
                </td>
            </tr>
        </table>
        <table cellspacing="0" border="0" id="my-custom-table">
            {% for variable_name, variable_value in row.iteritems() %}
            {% if loop.index0 == 0 %}
            <tr style="border-top: 1px solid black;
                       border-bottom: 1px solid black;
                       font-weight: bold;">
                <td>Variable name</td>
                <td>Variable value</td>
            </tr>
            {% else %}
            <tr>
                <td>{{ variable_name }}</td>
                <td>{{ variable_value }}</td>
            </tr>
            {% endif %}
            {% endfor %}
        </table>
    </body>
</html>

Then, run this Python 3 code, which converts each row of DataFrame into HTML string via jinja2 and then converts the HTML to PDF via xhtml2pdf:

from datetime import date

import jinja2
import pandas as pd
from xhtml2pdf import pisa

df = pd.DataFrame(
    data={
        "Average Introducer Score": [9, 9.1, 9.2],
        "Reviewer Scores": ["Academic: 6, 6, 6", "Something", "Content"],
        "Average Academic Score": [5.7, 5.8, 5.9],
        "Average User Score": [1.2, 1.3, 1.4],
        "Applied for (RC)": [9.2, 9.3, 9.4],
        "Applied for (FEC)": [5.5, 5.6, 5.7],
        "Duration (Months)": [36, 37, 38],
    }
)

for row_ix, row in df.iterrows():

    # Pandas DataFrame to HTML
    html = (
        jinja2.Environment(loader=jinja2.FileSystemLoader(searchpath=""))
        .get_template(name="report_template.html")
        .render(
            date=date.today().strftime("%d, %b %Y"),
            row_ix=row_ix,
            row=row,
        )
    )

    # Convert HTML to PDF
    with open("report_row_%s.pdf" % (row_ix + 1), "w+b") as out_pdf_file_handle:
        pisa.CreatePDF(
            # HTML to convert
            src=html,
            # File handle to receive the result
            dest=out_pdf_file_handle,
        )

For the DataFrame specified in the Python code, 3 PDFs will be outputted. The first PDF looks like this (converted to PNG to be able to show it here): One row of Pandas DataFrame converted to PDF via HTML by using Jinja2 and xhtml2pdf

tuomastik
  • 4,559
  • 5
  • 36
  • 48
  • 2
    Tuomastik, thank you SO much for taking the time to help me with this. I was working on this on on Monday and was trying to get my head around latex and was asking some people at work (to no avail) who might have had experience with it whilst doing their PhDs...... your response is so much simpler and I completely understand what it is doing. If I could buy you a beer, I would! :).... thank you! I will mark your answer as the correct one – Nicholas May 17 '17 at 16:57
  • Hey, thank you again. I got it all working. Perfect. Quick question before I start to get messy, but what would I change in the code to make all the reports go into the same PDF, but one under another? (sorry if this is a little cheeky asking) – Nicholas May 18 '17 at 09:36
  • 1
    @ScoutEU remove the for loop from the Python code and integrate it around the for loop in the HTML code. – tuomastik May 18 '17 at 09:56
  • 1
    Practical Business Python also has a [blog post](http://pbpython.com/pdf-reports.html) about this – Maarten Fabré May 18 '17 at 12:41
  • Thank you, I was struggling to get the html for loop in, ill check out that blog post – Nicholas May 18 '17 at 13:41
  • Hey Tuomastik, I have tried and failed to get it all to run into one PDF (with each report starting on a fresh page), I have added a 50 bounty reward if you wanted it :).... the link to the bounty is: http://stackoverflow.com/questions/44054938/python-jinja2-putting-all-html-from-python-into-one-pdf-rather-than-multipl – Nicholas May 20 '17 at 21:55
2

Reading in CSV files with Pandas: yes, definitely possible. See: http://pandas.pydata.org/pandas-docs/stable/io.html#io-read-csv-table

Producing reports with Pandas: depends on what exactly you're looking for. Pandas has many different output writing functions but their focus is on producing tables, not on producing entire documents. The closest to a 'document'-style output that you can get directly from Pandas is probably the HTML table output: http://pandas.pydata.org/pandas-docs/stable/io.html#io-html

Xukrao
  • 8,003
  • 5
  • 26
  • 52
  • Hey, yes, sorry, my question wasnt clear, I am comfortable in using Python Pandas, it was more the putting it into a report that I was stuck on :).... I think the 'latex' option described by P.Tillman is probably my best bet, but will certainly check out your link. Thank you very much – Nicholas May 16 '17 at 13:28
1

That surely is possible but i don't think that pandas provides such a functionality. You might want to take a look at latex where you 'programm' and compile documents (which by itself has nothing to do with python). You can create a latex template and dynamically fill it with content in python and then compile a pdf document but it probably will take some effort to find your way into latex.

P.Tillmann
  • 2,090
  • 10
  • 17