I'm building an AlteryxPythonSDK tool that processes PDFs and extracts tables from them. The plugin consists of an XML configuration, an HTML user interface, and a Python script. Although I have implemented the required methods, ii_init and ii_push_record are never called when the plugin runs.

XML Configuration (PDFExtractToolConfig.xml):

<?xml version="1.0"?>
<AlteryxJavaScriptPlugin>
  <EngineSettings EngineDll="Python" EngineDllEntryPoint="PDFExtractToolEngine.py" SDKVersion="10.1" />
  <GuiSettings Html="PDFExtractToolGUI.html" Icon="PDFExtractTool.png" Help="https://your-help-link-here.com" SDKVersion="10.1">
    <InputConnections>
      <Connection Name="Input" AllowMultiple="False" Optional="False" Type="Connection" Label="PDF Input"/>
    </InputConnections>
    <OutputConnections>
      <Connection Name="Output" AllowMultiple="False" Optional="False" Type="Connection" Label="Table Output"/>
      <Connection Name="ErrorOutput" AllowMultiple="False" Optional="False" Type="Connection" Label="Error Output"/>
    </OutputConnections>
  </GuiSettings>
  <Properties>
    <MetaInfo>
      <Name>PDF to Table</Name>
      <Description>Reads PDFs and extracts tables</Description>
      <CategoryName>Data Parsing</CategoryName>
      <SearchTags>pdf, table, parsing</SearchTags>
      <ToolVersion></ToolVersion>
      <Author></Author>
      <Company></Company>
      <Copyright></Copyright>
    </MetaInfo>
  </Properties>
</AlteryxJavaScriptPlugin>

HTML User Interface (PDFExtractToolGUI.html):

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>PDF to Table</title>
  <script type="text/javascript">
    document.write('<link rel="import" href="' + window.Alteryx.LibDir + '2/lib/includes.html">');
  </script>
</head>
<body>
  <h1>PDF to Table</h1>
  <form>
    <fieldset>
      <legend>XMSG("Select Options")</legend>
      <section>
        <label>XMSG("Select a PDF file:")</label>
        <ayx aria-label="data-source-metainfo-filebrowse" data-ui-props='{type:"FileBrowse", widgetId:"dataSourceFilePath", fileTypeFilters: "PDF Data Source|*.pdf|All Files|*.*", placeholder:"XMSG("Select .pdf file...")"}' data-item-props='{dataName: "PdfField", dataType:"SimpleString"}'></ayx>
      </section>
    </fieldset>
  </form>

  <style>
    body {
      font-size: 10pt;
      font-family: Arial, sans-serif;
      margin: 20px;
    }

    legend {
      border: none;
    }

    fieldset {
      border: 2px solid #EA7C7C;
      border-radius: 5px;
    }

    section, label, select, checkbox, input {
      padding: 10px 0;
    }
  </style>
</body>
</html>

Python Script (PDFExtractToolEngine.py):

import AlteryxPythonSDK as Sdk
import tabula

class AyxPlugin:
    def __init__(self, n_tool_id: int, alteryx_engine: object, output_anchor_mgr: object):
        self.n_tool_id = n_tool_id
        self.alteryx_engine = alteryx_engine
        self.output_anchor_mgr = output_anchor_mgr
        self.input = None

    def pi_init(self, str_xml: str):
        self.input = None

    def pi_add_incoming_connection(self, str_type: str, str_name: str) -> object:
        self.input = self
        return self

    def pi_add_outgoing_connection(self, str_name: str) -> bool:
        return True

    def pi_push_all_records(self, n_record_limit: int) -> bool:
        self.alteryx_engine.output_message(self.n_tool_id, Sdk.EngineMessageType.info, "Processing PDFs...")
        return False

    def pi_close(self, b_has_errors: bool):
        self.alteryx_engine.output_message(self.n_tool_id, Sdk.EngineMessageType.info, "pi_close...")

    def ii_init(self, record_info_in: Sdk.RecordInfo) -> bool:
        self.pdf_field = record_info_in.get_field_by_name('PdfField')
        return True

    def ii_push_record(self, in_record: Sdk.RecordRef) -> bool:
        pdf_path = self.pdf_field.get_as_string(in_record)

        if pdf_path:
            self.alteryx_engine.output_message(self.n_tool_id, Sdk.EngineMessageType.info, f"Processing PDF: {pdf_path}")
            try:
                tables = tabula.read_pdf(pdf_path, pages='all')
                for table_num, table in enumerate(tables):
                    self.alteryx_engine.output_message(self.n_tool_id, Sdk.EngineMessageType.info, f"Extracted Table {table_num + 1}:")
                    self.alteryx_engine.output_message(self.n_tool_id, Sdk.EngineMessageType.info, str(table))
                    # Process and push the table data downstream here
            except Exception as e:
                self.alteryx_engine.output_message(self.n_tool_id, Sdk.EngineMessageType.error, f"Error processing PDF: {str(e)}")
                return False
        else:
            self.alteryx_engine.output_message(self.n_tool_id, Sdk.EngineMessageType.warning, "No PDF path found in input record.")

        return True

    def ii_update_progress(self, d_percent: float):
        pass

    def ii_close(self):
        pass

I've tried debugging this by emitting messages with alteryx_engine.output_message. The messages from pi_push_all_records and pi_close are the only ones I would expect to see besides those in ii_push_record, and nothing from ii_init or ii_push_record ever appears in the results, which suggests that these methods are never called.
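To make the tracing independent of individual output_message calls, every lifecycle method could be wrapped so that each call is recorded. The following is only a minimal sketch: trace_calls and DemoPlugin are hypothetical names, not part of AlteryxPythonSDK, and the call sequence at the end is simulated by hand, since the real order is decided by the Alteryx engine.

```python
import functools


def trace_calls(sink):
    """Return a class decorator that reports every pi_*/ii_* call to `sink`."""
    def decorate(cls):
        for name, attr in list(vars(cls).items()):
            if callable(attr) and name.startswith(("pi_", "ii_")):
                setattr(cls, name, _traced(name, attr, sink))
        return cls
    return decorate


def _traced(name, func, sink):
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        sink(name)  # record the call before running the real method
        return func(self, *args, **kwargs)
    return wrapper


calls = []  # inside Designer, the sink could instead append to a log file


@trace_calls(calls.append)
class DemoPlugin:
    """Hypothetical stand-in that mirrors the method names of AyxPlugin."""

    def pi_init(self, str_xml):
        pass

    def pi_add_incoming_connection(self, str_type, str_name):
        return self

    def ii_init(self, record_info_in):
        return True


# Simulated call sequence (an assumption; the engine drives this in practice).
plugin = DemoPlugin()
plugin.pi_init("<Configuration />")
interface = plugin.pi_add_incoming_connection("Input", "#1")
interface.ii_init(None)
print(calls)
```

Applied to the real AyxPlugin class, the recorded names would show which pi_* and ii_* methods the engine actually reaches, even if a message never shows up in the results window.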

I would appreciate any insights or suggestions on why the ii_init and ii_push_record methods might not be called and how I can troubleshoot this issue.

Thank you in advance for your assistance!

Xplosio