
I have been trying to find a way to add a DataFrame to a PDF file using PdfPages, but I have not found a good solution yet. My DataFrame has around 5k rows and 10 columns. How do I append it to the PDF?

The code I have written produces a very small, blurry table: even after zooming in, the text is unreadable. Is there a better way to add a DataFrame to a PDF using PdfPages?

My code:

    df1 = data[data['olddate'].dt.date == data['newdate'].dt.date]
    table = pd.DataFrame(df1)
    fig = plt.figure(figsize=(12, 12))
    ax = fig.add_subplot(111)
    cell_text = []
    for row in range(len(table)):
        cell_text.append(table.iloc[row])
    ax.table(cellText=cell_text, colLabels=table.columns, loc='center')
    ax.axis('off')
    export_pdf.savefig(fig)
    plt.close()
slous
  • You could try converting the `df` to a NumPy array. Keep in mind that if some cells (including the header) are very long, the font of all cells will be resized. In that case you should change the font size of the individual cell with the long string. – Joe Sep 05 '19 at 07:39
  • Okay. But can you tell me what is the issue with my code? – slous Sep 05 '19 at 10:36
  • Say the minimal readable font size you can use is 4 pt. You need some spacing, so the cell height would be 6 pt. With 5000 rows, the table would need to span 5000*6 = 30000 pt. At 72 points per inch, this requires a page height of 30000/72 ≈ 417 inches. Since `figsize` is given as (width, height), your figure size would need to be about `figsize=(12, 417)`. (This is just a rough calculation, not taking any margins etc. into account.) – ImportanceOfBeingErnest Sep 05 '19 at 11:59
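The rough calculation in the last comment can be written out as a few lines of Python. This is only a sketch of that arithmetic: the function name `required_height_inches` and its parameters are made up for illustration, and the 4 pt font and 2 pt spacing are the comment's assumptions, not matplotlib defaults.

```python
def required_height_inches(n_rows, font_pt=4.0, spacing_pt=2.0, ppi=72.0):
    """Approximate page height (in inches) needed to show n_rows table rows.

    Hypothetical helper: ignores margins and the header row, as the
    comment's estimate does.
    """
    row_height_pt = font_pt + spacing_pt  # 4 pt text + 2 pt spacing = 6 pt
    return n_rows * row_height_pt / ppi   # points -> inches


height = required_height_inches(5000)
print(f"A single page would need to be about {height:.0f} inches tall")
```

A 12 x 12 inch page offers only 12 * 72 = 864 pt of height, which is why 5000 rows on one page end up unreadably small.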

1 Answer


Here is an example; I hope it helps:

import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
import numpy as np
import pandas as pd  # needed for pd.DataFrame below

with PdfPages('page_pdf.pdf') as pdf:
    table = pd.DataFrame({'a':[1,2,3,4], 'b':[3,4,5,6]})
    header = table.columns
    table = np.asarray(table)  # cell values as a 2-D array
    fig = plt.figure(figsize=(12, 12))
    # Axes covering the whole figure, with the axis lines hidden
    ax = plt.Axes(fig, [0., 0., 1., 1.])
    ax.set_axis_off()
    fig.add_axes(ax)
    tab = ax.table(cellText=table, colWidths=[0.15, 0.25], colLabels=header, cellLoc='center', loc='center')
    # Fix the font size instead of letting matplotlib shrink it to fit
    tab.auto_set_font_size(False)
    tab.set_fontsize(30)
    tab.scale(0.7, 2.5)  # stretch cells: 0.7x width, 2.5x height
    pdf.savefig(fig)
    plt.close(fig)

You can change the font size with `tab.set_fontsize`.
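For a DataFrame with thousands of rows, one possible approach (not something stated in the question or comments) is to split the rows across several pages, so that each page holds only as many rows as stay readable. The sketch below assumes a hypothetical helper `dataframe_to_pdf`; the values for `rows_per_page`, `fontsize` and the `scale` factors are guesses that would need tuning for real data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display needed
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
import numpy as np
import pandas as pd


def dataframe_to_pdf(df, path, rows_per_page=40, figsize=(12, 12), fontsize=8):
    """Write df to a multi-page PDF and return the number of pages written.

    Hypothetical helper: each page shows at most rows_per_page rows,
    with the column labels repeated on every page.
    """
    n_pages = 0
    with PdfPages(path) as pdf:
        for start in range(0, len(df), rows_per_page):
            chunk = df.iloc[start:start + rows_per_page]
            fig, ax = plt.subplots(figsize=figsize)
            ax.axis("off")  # show only the table, no axis lines
            tab = ax.table(
                cellText=chunk.astype(str).values.tolist(),
                colLabels=list(chunk.columns),
                cellLoc="center",
                loc="center",
            )
            tab.auto_set_font_size(False)  # keep the chosen font size
            tab.set_fontsize(fontsize)
            tab.scale(1, 1.2)  # slightly taller rows
            pdf.savefig(fig)
            plt.close(fig)  # free the figure before the next page
            n_pages += 1
    return n_pages
```

With the question's data this could be called as `dataframe_to_pdf(df1, 'page_pdf.pdf')`; at 40 rows per page, 5000 rows would give 125 pages.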

Joe