0

Let's say I have the table:

            M1      M2      M3  
            S1  S2  S1  S2  S1  S2
P1  D1  T1  0   1   2   3   4   5
P2  D2  T2  6   7   8   9   10  11
P3  D3  T3  12  13  14  15  16  17

IN

The XML Input:

<?xml version="1.0" encoding="UTF-8"?>
<transformation>
  <info>
    <name>test</name>
    <description/>
    <extended_description/>
    <trans_version/>
    <trans_type>Normal</trans_type>
    <trans_status>0</trans_status>
    <directory>/</directory>
    <parameters>
    </parameters>
    <log>
      <trans-log-table>
        <connection/>
        <schema/>
        <table/>
        <size_limit_lines/>
        <interval/>
        <timeout_days/>
        <field>
          <id>ID_BATCH</id>
          <enabled>Y</enabled>
          <name>ID_BATCH</name>
        </field>
        <field>
          <id>CHANNEL_ID</id>
          <enabled>Y</enabled>
          <name>CHANNEL_ID</name>
        </field>
        <field>
          <id>TRANSNAME</id>
          <enabled>Y</enabled>
          <name>TRANSNAME</name>
        </field>
        <field>
          <id>STATUS</id>
          <enabled>Y</enabled>
          <name>STATUS</name>
        </field>
        <field>
          <id>LINES_READ</id>
          <enabled>Y</enabled>
          <name>LINES_READ</name>
          <subject/>
        </field>
        <field>
          <id>LINES_WRITTEN</id>
          <enabled>Y</enabled>
          <name>LINES_WRITTEN</name>
          <subject/>
        </field>
        <field>
          <id>LINES_UPDATED</id>
          <enabled>Y</enabled>
          <name>LINES_UPDATED</name>
          <subject/>
        </field>
        <field>
          <id>LINES_INPUT</id>
          <enabled>Y</enabled>
          <name>LINES_INPUT</name>
          <subject/>
        </field>
        <field>
          <id>LINES_OUTPUT</id>
          <enabled>Y</enabled>
          <name>LINES_OUTPUT</name>
          <subject/>
        </field>
        <field>
          <id>LINES_REJECTED</id>
          <enabled>Y</enabled>
          <name>LINES_REJECTED</name>
          <subject/>
        </field>
        <field>
          <id>ERRORS</id>
          <enabled>Y</enabled>
          <name>ERRORS</name>
        </field>
        <field>
          <id>STARTDATE</id>
          <enabled>Y</enabled>
          <name>STARTDATE</name>
        </field>
        <field>
          <id>ENDDATE</id>
          <enabled>Y</enabled>
          <name>ENDDATE</name>
        </field>
        <field>
          <id>LOGDATE</id>
          <enabled>Y</enabled>
          <name>LOGDATE</name>
        </field>
        <field>
          <id>DEPDATE</id>
          <enabled>Y</enabled>
          <name>DEPDATE</name>
        </field>
        <field>
          <id>REPLAYDATE</id>
          <enabled>Y</enabled>
          <name>REPLAYDATE</name>
        </field>
        <field>
          <id>LOG_FIELD</id>
          <enabled>Y</enabled>
          <name>LOG_FIELD</name>
        </field>
        <field>
          <id>EXECUTING_SERVER</id>
          <enabled>N</enabled>
          <name>EXECUTING_SERVER</name>
        </field>
        <field>
          <id>EXECUTING_USER</id>
          <enabled>N</enabled>
          <name>EXECUTING_USER</name>
        </field>
        <field>
          <id>CLIENT</id>
          <enabled>N</enabled>
          <name>CLIENT</name>
        </field>
      </trans-log-table>
      <perf-log-table>
        <connection/>
        <schema/>
        <table/>
        <interval/>
        <timeout_days/>
        <field>
          <id>ID_BATCH</id>
          <enabled>Y</enabled>
          <name>ID_BATCH</name>
        </field>
        <field>
          <id>SEQ_NR</id>
          <enabled>Y</enabled>
          <name>SEQ_NR</name>
        </field>
        <field>
          <id>LOGDATE</id>
          <enabled>Y</enabled>
          <name>LOGDATE</name>
        </field>
        <field>
          <id>TRANSNAME</id>
          <enabled>Y</enabled>
          <name>TRANSNAME</name>
        </field>
        <field>
          <id>STEPNAME</id>
          <enabled>Y</enabled>
          <name>STEPNAME</name>
        </field>
        <field>
          <id>STEP_COPY</id>
          <enabled>Y</enabled>
          <name>STEP_COPY</name>
        </field>
        <field>
          <id>LINES_READ</id>
          <enabled>Y</enabled>
          <name>LINES_READ</name>
        </field>
        <field>
          <id>LINES_WRITTEN</id>
          <enabled>Y</enabled>
          <name>LINES_WRITTEN</name>
        </field>
        <field>
          <id>LINES_UPDATED</id>
          <enabled>Y</enabled>
          <name>LINES_UPDATED</name>
        </field>
        <field>
          <id>LINES_INPUT</id>
          <enabled>Y</enabled>
          <name>LINES_INPUT</name>
        </field>
        <field>
          <id>LINES_OUTPUT</id>
          <enabled>Y</enabled>
          <name>LINES_OUTPUT</name>
        </field>
        <field>
          <id>LINES_REJECTED</id>
          <enabled>Y</enabled>
          <name>LINES_REJECTED</name>
        </field>
        <field>
          <id>ERRORS</id>
          <enabled>Y</enabled>
          <name>ERRORS</name>
        </field>
        <field>
          <id>INPUT_BUFFER_ROWS</id>
          <enabled>Y</enabled>
          <name>INPUT_BUFFER_ROWS</name>
        </field>
        <field>
          <id>OUTPUT_BUFFER_ROWS</id>
          <enabled>Y</enabled>
          <name>OUTPUT_BUFFER_ROWS</name>
        </field>
      </perf-log-table>
      <channel-log-table>
        <connection/>
        <schema/>
        <table/>
        <timeout_days/>
        <field>
          <id>ID_BATCH</id>
          <enabled>Y</enabled>
          <name>ID_BATCH</name>
        </field>
        <field>
          <id>CHANNEL_ID</id>
          <enabled>Y</enabled>
          <name>CHANNEL_ID</name>
        </field>
        <field>
          <id>LOG_DATE</id>
          <enabled>Y</enabled>
          <name>LOG_DATE</name>
        </field>
        <field>
          <id>LOGGING_OBJECT_TYPE</id>
          <enabled>Y</enabled>
          <name>LOGGING_OBJECT_TYPE</name>
        </field>
        <field>
          <id>OBJECT_NAME</id>
          <enabled>Y</enabled>
          <name>OBJECT_NAME</name>
        </field>
        <field>
          <id>OBJECT_COPY</id>
          <enabled>Y</enabled>
          <name>OBJECT_COPY</name>
        </field>
        <field>
          <id>REPOSITORY_DIRECTORY</id>
          <enabled>Y</enabled>
          <name>REPOSITORY_DIRECTORY</name>
        </field>
        <field>
          <id>FILENAME</id>
          <enabled>Y</enabled>
          <name>FILENAME</name>
        </field>
        <field>
          <id>OBJECT_ID</id>
          <enabled>Y</enabled>
          <name>OBJECT_ID</name>
        </field>
        <field>
          <id>OBJECT_REVISION</id>
          <enabled>Y</enabled>
          <name>OBJECT_REVISION</name>
        </field>
        <field>
          <id>PARENT_CHANNEL_ID</id>
          <enabled>Y</enabled>
          <name>PARENT_CHANNEL_ID</name>
        </field>
        <field>
          <id>ROOT_CHANNEL_ID</id>
          <enabled>Y</enabled>
          <name>ROOT_CHANNEL_ID</name>
        </field>
      </channel-log-table>
      <step-log-table>
        <connection/>
        <schema/>
        <table/>
        <timeout_days/>
        <field>
          <id>ID_BATCH</id>
          <enabled>Y</enabled>
          <name>ID_BATCH</name>
        </field>
        <field>
          <id>CHANNEL_ID</id>
          <enabled>Y</enabled>
          <name>CHANNEL_ID</name>
        </field>
        <field>
          <id>LOG_DATE</id>
          <enabled>Y</enabled>
          <name>LOG_DATE</name>
        </field>
        <field>
          <id>TRANSNAME</id>
          <enabled>Y</enabled>
          <name>TRANSNAME</name>
        </field>
        <field>
          <id>STEPNAME</id>
          <enabled>Y</enabled>
          <name>STEPNAME</name>
        </field>
        <field>
          <id>STEP_COPY</id>
          <enabled>Y</enabled>
          <name>STEP_COPY</name>
        </field>
        <field>
          <id>LINES_READ</id>
          <enabled>Y</enabled>
          <name>LINES_READ</name>
        </field>
        <field>
          <id>LINES_WRITTEN</id>
          <enabled>Y</enabled>
          <name>LINES_WRITTEN</name>
        </field>
        <field>
          <id>LINES_UPDATED</id>
          <enabled>Y</enabled>
          <name>LINES_UPDATED</name>
        </field>
        <field>
          <id>LINES_INPUT</id>
          <enabled>Y</enabled>
          <name>LINES_INPUT</name>
        </field>
        <field>
          <id>LINES_OUTPUT</id>
          <enabled>Y</enabled>
          <name>LINES_OUTPUT</name>
        </field>
        <field>
          <id>LINES_REJECTED</id>
          <enabled>Y</enabled>
          <name>LINES_REJECTED</name>
        </field>
        <field>
          <id>ERRORS</id>
          <enabled>Y</enabled>
          <name>ERRORS</name>
        </field>
        <field>
          <id>LOG_FIELD</id>
          <enabled>N</enabled>
          <name>LOG_FIELD</name>
        </field>
      </step-log-table>
      <metrics-log-table>
        <connection/>
        <schema/>
        <table/>
        <timeout_days/>
        <field>
          <id>ID_BATCH</id>
          <enabled>Y</enabled>
          <name>ID_BATCH</name>
        </field>
        <field>
          <id>CHANNEL_ID</id>
          <enabled>Y</enabled>
          <name>CHANNEL_ID</name>
        </field>
        <field>
          <id>LOG_DATE</id>
          <enabled>Y</enabled>
          <name>LOG_DATE</name>
        </field>
        <field>
          <id>METRICS_DATE</id>
          <enabled>Y</enabled>
          <name>METRICS_DATE</name>
        </field>
        <field>
          <id>METRICS_CODE</id>
          <enabled>Y</enabled>
          <name>METRICS_CODE</name>
        </field>
        <field>
          <id>METRICS_DESCRIPTION</id>
          <enabled>Y</enabled>
          <name>METRICS_DESCRIPTION</name>
        </field>
        <field>
          <id>METRICS_SUBJECT</id>
          <enabled>Y</enabled>
          <name>METRICS_SUBJECT</name>
        </field>
        <field>
          <id>METRICS_TYPE</id>
          <enabled>Y</enabled>
          <name>METRICS_TYPE</name>
        </field>
        <field>
          <id>METRICS_VALUE</id>
          <enabled>Y</enabled>
          <name>METRICS_VALUE</name>
        </field>
      </metrics-log-table>
    </log>
    <maxdate>
      <connection/>
      <table/>
      <field/>
      <offset>0.0</offset>
      <maxdiff>0.0</maxdiff>
    </maxdate>
    <size_rowset>10000</size_rowset>
    <sleep_time_empty>50</sleep_time_empty>
    <sleep_time_full>50</sleep_time_full>
    <unique_connections>N</unique_connections>
    <feedback_shown>Y</feedback_shown>
    <feedback_size>50000</feedback_size>
    <using_thread_priorities>Y</using_thread_priorities>
    <shared_objects_file/>
    <capture_step_performance>N</capture_step_performance>
    <step_performance_capturing_delay>1000</step_performance_capturing_delay>
    <step_performance_capturing_size_limit>100</step_performance_capturing_size_limit>
    <dependencies>
    </dependencies>
    <partitionschemas>
    </partitionschemas>
    <slaveservers>
    </slaveservers>
    <clusterschemas>
    </clusterschemas>
    <created_user>-</created_user>
    <created_date>2018/03/12 14:50:53.529</created_date>
    <modified_user>-</modified_user>
    <modified_date>2018/03/12 14:50:53.529</modified_date>
    <key_for_session_key/>
    <is_key_private>N</is_key_private>
  </info>
  <notepads>
  </notepads>
  <order>
  </order>
  <step>
    <name>Data Grid</name>
    <type>DataGrid</type>
    <description/>
    <distribute>Y</distribute>
    <custom_distribution/>
    <copies>1</copies>
    <partitioning>
      <method>none</method>
      <schema_name/>
    </partitioning>
    <fields>
      <field>
        <name>Product</name>
        <type>String</type>
        <format/>
        <currency/>
        <decimal/>
        <group/>
        <length>-1</length>
        <precision>-1</precision>
        <set_empty_string>N</set_empty_string>
      </field>
      <field>
        <name>Description</name>
        <type>String</type>
        <format/>
        <currency/>
        <decimal/>
        <group/>
        <length>-1</length>
        <precision>-1</precision>
        <set_empty_string>N</set_empty_string>
      </field>
      <field>
        <name>Type</name>
        <type>String</type>
        <format/>
        <currency/>
        <decimal/>
        <group/>
        <length>-1</length>
        <precision>-1</precision>
        <set_empty_string>N</set_empty_string>
      </field>
      <field>
        <name>M1_S1</name>
        <type/>
        <format/>
        <currency/>
        <decimal/>
        <group/>
        <length>-1</length>
        <precision>-1</precision>
        <set_empty_string>N</set_empty_string>
      </field>
      <field>
        <name>M1_S2</name>
        <type/>
        <format/>
        <currency/>
        <decimal/>
        <group/>
        <length>-1</length>
        <precision>-1</precision>
        <set_empty_string>N</set_empty_string>
      </field>
      <field>
        <name>M2_S1</name>
        <type/>
        <format/>
        <currency/>
        <decimal/>
        <group/>
        <length>-1</length>
        <precision>-1</precision>
        <set_empty_string>N</set_empty_string>
      </field>
      <field>
        <name>M2_S2</name>
        <type/>
        <format/>
        <currency/>
        <decimal/>
        <group/>
        <length>-1</length>
        <precision>-1</precision>
        <set_empty_string>N</set_empty_string>
      </field>
      <field>
        <name>M3_S1</name>
        <type/>
        <format/>
        <currency/>
        <decimal/>
        <group/>
        <length>-1</length>
        <precision>-1</precision>
        <set_empty_string>N</set_empty_string>
      </field>
      <field>
        <name>M3_S2</name>
        <type/>
        <format/>
        <currency/>
        <decimal/>
        <group/>
        <length>-1</length>
        <precision>-1</precision>
        <set_empty_string>N</set_empty_string>
      </field>
    </fields>
    <data>
      <line>
        <item/>
        <item/>
        <item/>
        <item>M1</item>
        <item/>
        <item>M2</item>
        <item/>
        <item>M3</item>
        <item/>
      </line>
      <line>
        <item/>
        <item/>
        <item/>
        <item>S1</item>
        <item>S2</item>
        <item>S1</item>
        <item>S2</item>
        <item>S1</item>
        <item>S2</item>
      </line>
      <line>
        <item>P1</item>
        <item>D1</item>
        <item>T1</item>
        <item>0</item>
        <item>1</item>
        <item>2</item>
        <item>3</item>
        <item>4</item>
        <item>5</item>
      </line>
      <line>
        <item>P2</item>
        <item>D2</item>
        <item>T2</item>
        <item>6</item>
        <item>7</item>
        <item>8</item>
        <item>9</item>
        <item>10</item>
        <item>11</item>
      </line>
      <line>
        <item>P3</item>
        <item>D3</item>
        <item>T3</item>
        <item>12</item>
        <item>13</item>
        <item>14</item>
        <item>15</item>
        <item>16</item>
        <item>17</item>
      </line>
    </data>
    <cluster_schema/>
    <remotesteps>
      <input>
      </input>
      <output>
      </output>
    </remotesteps>
    <GUI>
      <xloc>32</xloc>
      <yloc>32</yloc>
      <draw>Y</draw>
    </GUI>
  </step>
  <step_error_handling>
  </step_error_handling>
  <slave-step-copy-partition-distribution>
  </slave-step-copy-partition-distribution>
  <slave_transformation>N</slave_transformation>
</transformation>

How can I make it so that the table becomes partially normalized, like:

                M1  M2  M3
P1  D1  T1  S1  0   2   4
P1  D1  T1  S2  1   3   5
P2  D2  T2  S1  6   8   10
P2  D2  T2  S2  7   9   11
P3  D3  T3  S1  12  14  16
P3  D3  T3  S2  13  15  17

OUT

Do I have to first merge the two headers, and then normalize + split values?


Further explanation:

becomes partially normalized, like means that the table isn't completely normalized, like:

                    Data
P1  D1  T1  M1  S1  0
P1  D1  T1  M1  S2  1
P1  D1  T1  M2  S1  2
P1  D1  T1  M2  S2  3
P1  D1  T1  M3  S1  4
P1  D1  T1  M3  S2  5
P2  D2  T2  M1  S1  6
P2  D2  T2  M1  S2  7
P2  D2  T2  M2  S1  8
P2  D2  T2  M2  S2  9
P2  D2  T2  M3  S1  10
P2  D2  T2  M3  S2  11
P3  D3  T3  M1  S1  12
P3  D3  T3  M1  S2  13
P3  D3  T3  M2  S1  14
P3  D3  T3  M2  S2  15
P3  D3  T3  M3  S1  16
P3  D3  T3  M3  S2  17

merging the headers would be having a single header cell with the value of M2_S1 for example, after merging the headers it's rather trivial to normalize. However one would need to split the value of those header cells, as in M2_S1 to M2 and S1.

I am expecting an answer in the form of steps, and an explanation of what is being done and why it is being done in each step.

Nae
  • 14,209
  • 7
  • 52
  • 79
  • Hi. What do "becomes partially normalized" & "like" & "merge the headers" & "split value" mean? What do they have to do with normalization? Please use enough words to explain yourself instead of using a word or two to summarize a bunch of stuff you didn't say. (That's also the way to start to find a solution.) – philipxy Mar 09 '18 at 09:21
  • 1
    Please [use text, not images/links, for text (including code, tables & ERDs)](https://meta.stackoverflow.com/q/285551/3404097). Use an image only for convenience to supplement text and/or for what cannot be given in text. – philipxy Mar 09 '18 at 09:24
  • What exact form is the input & output & transformation to be in? SQL? Give as much as possible of a [mcve]. (Cut & pasteable.) Including any parts you can do. – philipxy Mar 09 '18 at 10:01
  • @philipxy Output form doesn't matter, I should only be able have 7 fields with values starting from 2nd row. I don't know how I could provide copy/pasteable [pentaho-spoon] schema, XML? – Nae Mar 09 '18 at 10:15
  • I am just asking how you intend/imagine/expect someone to answer your question when you ask "how to"? What form do you actually *have* the input in *now*? Do you want GUI actions, SQL, script? PS Do not clarify in comments, edit your question to be the best presentation possible. – philipxy Mar 09 '18 at 10:28

0 Answers0