I am importing error log flat files into my SQL server, and need to parse a tab-delimited column into multiple columns. Borrowing heavily from this question (SQL Split Tab Delimited Column), specifically the answer from @Lobo, I want to accomplish a couple of things:
- capture the maximum amount of columns created using STRING_SPLIT() on [Column 0], so that I can create dynamic PIVOTs, in the eventuality that there are more columns than what I am currently aware of
- properly combine the parsed [Column 0] columns with the rest of the
record
The first goal (dynamic column count) I can live without for the time being, but it's the second goal I am having issues with.
DECLARE @SAMPLE_TABLE table(
[Column 0] nvarchar(4000),
[Filename] nvarchar(260),
FileExtention varchar(255),
DateTimeStamp datetime,
CustomerNumber varchar(255),
FileType varchar(255),
ImportSetNumber varchar(255)
)
Some sample data for the table:
[Column 0] | [Filename] | FileExtention | DateTimeStamp | CustomerNumber | FileType | ImportSetNumber
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1<tab>Import Set No (A): 03300001: Contact ID (G): Invalid contact ID for this customer and company....Taker (I): Invalid taker.<tab><tab>| E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err | 2023-03-30 11:36:36.000 | 10047 | OHF | 03300001
1<tab>Import Set No (A): 03300001: General Error: This Record and its related Records failed validation.<tab>0<tab>218 | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err | 2023-03-30 11:36:36.000 | 10047 | OHF | 03300001
1<tab>Import Set No (A): 04040186: General Error: This Record and its related Records failed validation.<tab>0<tab>17 | E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err | 2023-04-04 08:49:26.000 | 18120 | OHF | 04040186
Let the games begin
;WITH
CTE_Columns AS(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) 'MyRowID',
[Filename],
FileExtention,
DateTimeStamp,
CustomerNumber,
FileType,
ImportSetNumber,
A.ColID 'ColumnNumber',
A.Cols 'ColumnValue'
FROM @SAMPLE_TABLE
CROSS APPLY (
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS ColID,
value [Cols]
FROM STRING_SPLIT([Column 0], CHAR(9)) -- split by tab character
)A
)
SELECT MyRowID,
[Filename],
FileExtention,
DateTimeStamp,
CustomerNumber,
FileType,
ImportSetNumber,
NULLIF(TRIM([1]), '') 'FirstColumn',
NULLIF(TRIM([2]), '') 'SecondColumn',
NULLIF(TRIM([3]), '') 'ThirdColumn',
NULLIF(TRIM([4]), '') 'FourthColumn'
FROM (
SELECT MyRowID,
[Filename],
FileExtention,
DateTimeStamp,
CustomerNumber,
FileType,
ImportSetNumber,
ColumnNumber,
ColumnValue
FROM CTE_Columns
)Q
PIVOT(MAX(Q.ColumnValue) FOR ColumnNumber IN([1], [2], [3], [4])) PIV
ORDER BY CustomerNumber,
ImportSetNumber
This query produces the following resultset:
MyRowID | Filename | FileExtention | DateTimeStamp | CustomerNumber | FileType | ImportSetNumber | FirstColumn | SecondColumn | ThirdColumn | FourthColumn
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err | 2023-03-30 11:36:36.000 | 10047 | OHF | 03300001 | 1 | NULL | NULL | NULL
2 | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err | 2023-03-30 11:36:36.000 | 10047 | OHF | 03300001 | NULL | Import Set No (A): 03300001: Contact ID (G): Invalid contact ID for this customer and company....Taker (I): Invalid taker. | NULL | NULL
3 | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err | 2023-03-30 11:36:36.000 | 10047 | OHF | 03300001 | NULL | NULL | NULL | NULL
4 | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err | 2023-03-30 11:36:36.000 | 10047 | OHF | 03300001 | NULL | NULL | NULL | NULL
5 | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err | 2023-03-30 11:36:36.000 | 10047 | OHF | 03300001 | 1 | NULL | NULL | NULL
6 | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err | 2023-03-30 11:36:36.000 | 10047 | OHF | 03300001 | NULL | Import Set No (A): 03300001: General Error: This Record and its related Records failed validation. | NULL | NULL
7 | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err | 2023-03-30 11:36:36.000 | 10047 | OHF | 03300001 | NULL | NULL | 0 | NULL
8 | E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err | 2023-03-30 11:36:36.000 | 10047 | OHF | 03300001 | NULL | NULL | NULL | 218
9 | E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err | 2023-04-04 08:49:26.000 | 18120 | OHF | 04040186 | 1 | NULL | NULL | NULL
10 | E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err | 2023-04-04 08:49:26.000 | 18120 | OHF | 04040186 | NULL | Import Set No (A): 04040186: General Error: This Record and its related Records failed validation. | NULL | NULL
11 | E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err | 2023-04-04 08:49:26.000 | 18120 | OHF | 04040186 | NULL | NULL | 0 | NULL
12 | E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err | 2023-04-04 08:49:26.000 | 18120 | OHF | 04040186 | NULL | NULL | NULL | 17
Based on the resultset above, rows 1- 4 should be one record, rows 5 - 8 should be a second record, and rows 9 - 12 should be a third record, giving me my desired end-state of:
Filename | FileExtention | DateTimeStamp | CustomerNumber | FileType | ImportSetNumber | FirstColumn | SecondColumn | ThirdColumn | FourthColumn
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err | 2023-03-30 11:36:36.000 | 10047 | OHF | 03300001 | 1 | Import Set No (A): 03300001: Contact ID (G): Invalid contact ID for this customer and company....Taker (I): Invalid taker. | NULL | NULL
E:\path\to\files\Errors\SO_OHF_10047_20230330113636_03300001.err | err | 2023-03-30 11:36:36.000 | 10047 | OHF | 03300001 | 1 | Import Set No (A): 03300001: General Error: This Record and its related Records failed validation. | 0 | 218
E:\path\to\files\Errors\SO_OHF_18120_20230404084926_04040186.err | err | 2023-04-04 08:49:26.000 | 18120 | OHF | 04040186 | 1 | Import Set No (A): 04040186: General Error: This Record and its related Records failed validation. | 0 | 17
I believe this would just be a simple matter of correct grouping, but I'm not sure what to group on, or where to properly put the grouping (whether it would be a partitioning or a standard GROUP BY clause somewhere)