0

I have an Excel in xlsx format. It has two sheets, TransactionDetails and Pivot. Pivot sheet contains pivot tables generated from data in TransactionDetails sheet. How can I convert the pivot sheet into HTML?

I have tried using Apache POI and POI is unable to read cells from pivot tables (the pivot table was generated using Apache POI).

Mark Rotteveel
  • 100,966
  • 191
  • 140
  • 197
  • 2
    "The pivot table was generated using Apache Poi": Apache POI is not able creating rendered pivot tables. It only creates pivot table definitions which Excel then renders into the sheet while opening the file. So before rendering in Excel there is nothing in the sheet that could be converted. – Axel Richter May 11 '23 at 03:37
  • Can I use the definitions to create my html? As this will be part of an automated process, we cannot open the excel file. – abhijit.bhatta May 11 '23 at 11:49
  • 2
    I doubt that. But `*.xlsx` files are simply ZIP archives. So create your `*.xlsx` having a pivot table using Apache POI. Then open that `*.xlsx` file as ZIP archive. 7-Zip can do this by simply right-clicking on the `*.xlsx` file and choosing `7-Zip -> Open` from the context menu. Then have a look at `/xl/worksheets/sheet*.xml` to see that there is nothing of the pivot data in the sheet. Then have a look at `/xl/pivotTables/pivotTable*.xml` to see what you have there. – Axel Richter May 11 '23 at 12:24
  • I figured out how to do this. I created a batfile with vbsctipts to open and save the file ( in my case, i saved it as a .xls) and then, used pandas to convert the pivot to html. I will put the complete solution here once I am on my system. Since I was automating a process, no manual intervention could be involved. – abhijit.bhatta May 13 '23 at 16:09

1 Answers1

0

I was able to find a solution to the above problem. Since the pivot tables were generated by Apache POI, the excel needed to be opened first and saved for the excel to load the pivot tables from the definitions. I my case, I save also saved the excel as an xls file (Pandas had issues with reading pivot table filters generated by POI).

    @echo off
set "wpsPath=C:\Users\CONCERTO_L221\AppData\Local\Kingsoft\WPS Office\ksolaunch.exe"  
set "filePath=D:\Automation_Workspace\ACS_Data_Sync\src\test\testdata\New_ACS MONITORING_5 PM - 6 PM_Pivot.xlsx" 
set "macroName=SaveAsXLS"  
"%wpsPath%" "%filePath%"
timeout /t 5  
echo.Set objExcel = CreateObject("Excel.Application") > SaveAsXLS.vbs
echo.Set objWorkbook = objExcel.Workbooks.Open("%filePath%") >> SaveAsXLS.vbs
echo.objWorkbook.SaveAs "%~dp0file.xls", -4143 >> SaveAsXLS.vbs
echo.objWorkbook.Close False >> SaveAsXLS.vbs
echo.objExcel.Quit >> SaveAsXLS.vbs
wscript.exe SaveAsXLS.vbs
timeout /t 2  
del SaveAsXLS.vbs
taskkill /f /im wps.exe  
exit

Thereafter , I used xlrd to convert my Pivot table to html.

import os
import sys
import xlrd
import argparse

def excel_to_html(input_file, sheet_name, output_file):
    # Load the workbook and select the sheet
    wb = xlrd.open_workbook(input_file)
    sheet = wb.sheet_by_name(sheet_name)

    # Define cell border style
    cell_style = "border: 1px solid black;"

    def create_table(col_start, col_end):
        table_rows = []
        for row_num in range(sheet.nrows):
            html_row = ["<tr>"]
            for col_num in range(col_start, col_end + 1):
                cell_value = sheet.cell_value(row_num, col_num)
                if isinstance(cell_value, (int, float)) and cell_value.is_integer():
                    cell_value = int(cell_value)
                cell_value = "" if cell_value is None else str(cell_value)

                # Apply border conditionally
                style = ""
                if not (col_num in (5, 6) or row_num in (0, 1)):
                    if not (col_num in (7, 8) and row_num not in (2, 3, 4)):
                        style = cell_style

                # Modify borders for rows 3-5 and columns 1-5
                if row_num in (2, 3, 4) and col_num < 4:
                    style = "border-top: 1px solid black; border-bottom: 1px solid black;"

                # Apply background color conditionally
                if row_num in (2, 3, 4) and col_num < 5 or row_num == sheet.nrows - 1 and col_num < 5:
                    style += " background-color: lightSkyBlue;"

                html_row.append(f'<td style="{style}">{cell_value}</td>')
            html_row.append("</tr>")
            table_rows.append("".join(html_row))
        return table_rows

    # Create the two tables
    table1 = create_table(7, 8)
    table2 = create_table(0, 4)

    # Combine the tables with two blank rows
    combined_rows = table1 + ["<tr><td></td></tr>"] * 2 + table2

    # Wrap the combined rows in a table tag
    html = f'<table style="border-collapse: collapse; border-spacing: 0;">{"".join(combined_rows)}</table>'

    # Write to output file
    with open(output_file, "w") as f:
        f.write(html)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Convert Excel sheet to HTML")
    parser.add_argument("input_file", help="Path to the input Excel file")
    args = parser.parse_args()

    input_file = args.input_file
    sheet_name = "pivot"
    output_folder = os.path.dirname(input_file)
    output_file = os.path.join(output_folder, "pivot.html")

    excel_to_html(input_file, sheet_name, output_file)