0

I am importing error log flat files into my SQL server, and need to parse a tab-delimited column into multiple columns. Borrowing heavily from this question (SQL Split Tab Delimited Column), specifically the answer from @Lobo, I want to accomplish a couple of things:

  • capture the maximum amount of columns created using STRING_SPLIT() on [Column 0], so that I can create dynamic PIVOTs, in the eventuality that there are more columns than what I am currently aware of
  • properly combine the parsed [Column 0] columns with the rest of the
    record

The first goal (dynamic column count) I can live without for the time being, but it's the second goal I am having issues with.

DECLARE @SAMPLE_TABLE table(
    [Column 0] nvarchar(4000),
    [Filename] nvarchar(260),
    FileExtention varchar(255),
    DateTimeStamp datetime,
    CustomerNumber varchar(255),
    FileType varchar(255),
    ImportSetNumber varchar(255)
)

Some sample data for the table:

[Column 0]                                                                                                                                | [Filename]                                                       | FileExtention | DateTimeStamp           | CustomerNumber | FileType | ImportSetNumber
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1<tab>Import Set No (A): 03300001: Contact ID (G): Invalid contact ID for this customer and company....Taker (I): Invalid taker.<tab><tab>| E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001
1<tab>Import Set No (A): 03300001: General Error: This Record and its related Records failed validation.<tab>0<tab>218                    | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001
1<tab>Import Set No (A): 04040186: General Error: This Record and its related Records failed validation.<tab>0<tab>17                     | E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err           | 2023-04-04 08:49:26.000 | 18120          | OHF      | 04040186

Let the games begin

;WITH
CTE_Columns AS(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) 'MyRowID',
            [Filename],
            FileExtention,
            DateTimeStamp,
            CustomerNumber,
            FileType,
            ImportSetNumber,
            A.ColID 'ColumnNumber',
            A.Cols 'ColumnValue'
    FROM @SAMPLE_TABLE
    CROSS APPLY (
        SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS ColID,
                value [Cols]
                FROM STRING_SPLIT([Column 0], CHAR(9))  -- split by tab character
    )A
)

SELECT MyRowID,
        [Filename],
        FileExtention,
        DateTimeStamp,
        CustomerNumber,
        FileType,
        ImportSetNumber,
        NULLIF(TRIM([1]), '') 'FirstColumn',
        NULLIF(TRIM([2]), '') 'SecondColumn',
        NULLIF(TRIM([3]), '') 'ThirdColumn',
        NULLIF(TRIM([4]), '') 'FourthColumn'
FROM (
        SELECT MyRowID,
                [Filename],
                FileExtention,
                DateTimeStamp,
                CustomerNumber,
                FileType,
                ImportSetNumber,
                ColumnNumber,
                ColumnValue
        FROM CTE_Columns
)Q
PIVOT(MAX(Q.ColumnValue) FOR ColumnNumber IN([1], [2], [3], [4])) PIV
ORDER BY CustomerNumber,
            ImportSetNumber

This query produces the following resultset:

MyRowID | Filename                                                         | FileExtention | DateTimeStamp           | CustomerNumber | FileType | ImportSetNumber | FirstColumn | SecondColumn                                                                                                               | ThirdColumn | FourthColumn
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1       | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | 1           | NULL                                                                                                                       | NULL        | NULL
2       | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | NULL        | Import Set No (A): 03300001: Contact ID (G): Invalid contact ID for this customer and company....Taker (I): Invalid taker. | NULL        | NULL
3       | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | NULL        | NULL                                                                                                                       | NULL        | NULL
4       | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | NULL        | NULL                                                                                                                       | NULL        | NULL
5       | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | 1           | NULL                                                                                                                       | NULL        | NULL
6       | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | NULL        | Import Set No (A): 03300001: General Error: This Record and its related Records failed validation.                         | NULL        | NULL
7       | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | NULL        | NULL                                                                                                                       | 0           | NULL
8       | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | NULL        | NULL                                                                                                                       | NULL        | 218
9       | E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err           | 2023-04-04 08:49:26.000 | 18120          | OHF      | 04040186        | 1           | NULL                                                                                                                       | NULL        | NULL
10      | E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err           | 2023-04-04 08:49:26.000 | 18120          | OHF      | 04040186        | NULL        | Import Set No (A): 04040186: General Error: This Record and its related Records failed validation.                         | NULL        | NULL
11      | E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err           | 2023-04-04 08:49:26.000 | 18120          | OHF      | 04040186        | NULL        | NULL                                                                                                                       | 0           | NULL
12      | E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err           | 2023-04-04 08:49:26.000 | 18120          | OHF      | 04040186        | NULL        | NULL                                                                                                                       | NULL        | 17

Based on the resultset above, rows 1- 4 should be one record, rows 5 - 8 should be a second record, and rows 9 - 12 should be a third record, giving me my desired end-state of:

Filename                                                         | FileExtention | DateTimeStamp           | CustomerNumber | FileType | ImportSetNumber | FirstColumn | SecondColumn                                                                                                               | ThirdColumn | FourthColumn
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | 1           | Import Set No (A): 03300001: Contact ID (G): Invalid contact ID for this customer and company....Taker (I): Invalid taker. | NULL        | NULL
E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err           | 2023-03-30 11:36:36.000 | 10047          | OHF      | 03300001        | 1           | Import Set No (A): 03300001: General Error: This Record and its related Records failed validation.                         | 0           | 218
E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err           | 2023-04-04 08:49:26.000 | 18120          | OHF      | 04040186        | 1           | Import Set No (A): 04040186: General Error: This Record and its related Records failed validation.                         | 0           | 17

I believe this would just be a simple matter of correct grouping, but I'm not sure what to group on, or where to properly put the grouping (whether it would be a partitioning or a standard GROUP BY clause somewhere)

Kulstad
  • 79
  • 8
  • Why dont't you just use a proper bulk import system, such as `bcp.exe` or `BULK INSERT` or `SqlBulkCopy` rather than rolling your own? – Charlieface May 19 '23 at 17:07
  • check SQL Server LEAD() function. it might help to read next offecet lines, than will select first rows. – Power Mouse May 22 '23 at 18:29
  • in addition, and this link would be valuable for you: https://stackoverflow.com/questions/14783680/sql-pivot-and-string-concatenation-aggregate – Power Mouse May 22 '23 at 19:11

1 Answers1

0

You need to regroup the rows in a correct way and combine the parsed columns by adding a new column which can act as a grouping identifier :

;WITH CTE_Columns AS (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS MyRowID,
        [Filename],
        FileExtention,
        DateTimeStamp,
        CustomerNumber,
        FileType,
        ImportSetNumber,
        A.ColID AS ColumnNumber,
        A.Cols AS ColumnValue
    FROM @SAMPLE_TABLE
    CROSS APPLY (
        SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS ColID,
            value AS Cols
            FROM STRING_SPLIT([Column 0], CHAR(9)) -- split by tab character
    ) A
), CTE_Grouping AS (
    SELECT MyRowID,
        [Filename],
        FileExtention,
        DateTimeStamp,
        CustomerNumber,
        FileType,
        ImportSetNumber,
        MAX(CASE WHEN ColumnNumber = 1 THEN ColumnValue END) AS FirstColumn,
        MAX(CASE WHEN ColumnNumber = 2 THEN ColumnValue END) AS SecondColumn,
        MAX(CASE WHEN ColumnNumber = 3 THEN ColumnValue END) AS ThirdColumn,
        MAX(CASE WHEN ColumnNumber = 4 THEN ColumnValue END) AS FourthColumn
    FROM CTE_Columns
    GROUP BY MyRowID, [Filename], FileExtention, DateTimeStamp, CustomerNumber, FileType, ImportSetNumber
)
SELECT [Filename],
    FileExtention,
    DateTimeStamp,
    CustomerNumber,
    FileType,
    ImportSetNumber,
    FirstColumn,
    SecondColumn,
    ThirdColumn,
    FourthColumn
FROM CTE_Grouping
ORDER BY CustomerNumber, ImportSetNumber

Inside the CTE_Grouping, the MAX function is used with conditional statement to combine the parsed columns ColumnValue based on their corresponding column numbers ColumnNumber. This will ensure that each grouping has only one row with the combined values.

The outer query selects the desired columns from the CTE_Grouping and orders the result by CustomerNumber and ImportSetNumber.

Amira Bedhiafi
  • 8,088
  • 6
  • 24
  • 60