I am attempting to use Sylvan.Data.Csv.dll in PowerShell to load multi-GB CSV files into SQL Server and have run into an issue around using the SQL Server destination table for the column schema source. Hoping someone can see what I'm missing and point me down the right path.
The goal is to use the column schema from SQL Server destination table to cut down on data typing issues with SQLBULKCOPY and because there are some challengs with the CSV files.
The samples below demo where I've tried 2 different approaches. In both instances, the challenge is getting CsvDataReaderOptions to accept the schema data ($readerOptions.Schema = [Sylvan.Data.Csv.CsvSchema]::new($tableSchema)
). In both, the schema entry is created but it is empty. By that, the CsvDataReader.Schema value changes from $null to "[CsvSchema]" but [CsvSchema] itself is $null.
For background, my hands are tied in that the solution has to use PowerShell 7 and the CSV file content is generated externally so there is limited control over it's content.
The CSV files large (multi-GB) files that are slightly unique in that the first row in the file does not contain column headers. Instead, it contains information about the file (source, function, create date, etc.). This row needs to be skipped. The 2nd row on is variable length, comma delimited data. An overly simplified snip is included below.
General challenges with the files are embedded in a quoted text field, date/datetimestamp formatting issues, and missing trailing delimiters (commas) on some records.
"A","1A","SOURCE-LOCALE",20020213,20020214,181501,,,,,
"01","20230101",1,"Some user name","Locale Info
with inline CRLF",2001-12-13,45821.41,,,,23,,,,
"02","20230102",1,"Don Draper","I'm mad",2023-12-13,45821.41,,,,,,,,
"03","20230103",1,"Shaggy","Zoinks",2023-12-13,45821.41,,,,,,,,
"04","20230104",1,"Woody","Forrest",2023-12-13,45821.41
"05","20230105",1,"Acey","Deucy",2023-12-13,45821.41,,,,,,"Well"
"05","20230106",1,"ORANGE COUNTY\
ACCOUNTS\ESSEX\CITY","Woof",2023-12-13,45821.41,,,,,,"Well"
Thanks in advance for any assistance.
APPROACH 1 (Sylvan.Data.Csv only):
Test code is included below. Everything is good until ($readerOptions.Schema = [Sylvan.Data.Csv.CsvSchema]::new($tableSchema)
). The CsvDataReader.Schema value changes from $null to "[CsvSchema]" but [CsvSchema] itself is $null. Screenshots show debugs for the $tableSchema
, $readerOptions
, and $readerOptions.Schema
objects.
I tried passing the $tableSchema
object to $readerOptions.Schema
directly ($readerOptions.Schema = $tableSchema
) and was greeted with an error that helped in that it confirms the $tableSchema
object type.
$readerOptions.Schema = $tableSchema | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Exception setting "Schema": "Cannot convert the | "System.Collections.ObjectModel.ReadOnlyCollection
1[System.Data.Common.DbColumn]" value of type | "System.Collections.ObjectModel.ReadOnlyCollection
1[[System.Data.Common.DbColumn, System.Data.Common, Version=7.0.0.0,
Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a]]" to type "Sylvan.Data.Csv.ICsvSchemaProvider"."
After trying a host of variations, I thought maybe it using Sylvan.Data Builder might offer a different outcome.
# APPROACH - 1 TEST
$ModuleDir = "C:\TEST\Modules"
Add-Type -AssemblyName System.Data
$ddlToAdd = "Microsoft.Data.SqlClient.dll"
Add-Type -Path (Join-Path $ModuleDir $ddlToAdd) -ReferencedAssemblies Microsoft.Data.SqlClient.SNI
$ddlToAdd = "Sylvan.Data.Csv.dll"
Add-Type -Path (Join-Path $ModuleDir $ddlToAdd) -ErrorAction Stop
$SQLConnection = [Microsoft.Data.SqlClient.SqlConnection]::new()
$sql = [Microsoft.Data.SqlClient.SqlCommand]::new()
$Database = "TEST"
$Schema = "DBO"
$Table = "TEST1"
$sql = "SELECT TOP 0 * FROM $Database.$Schema.$Table;"
$SQLConnection.ConnectionString = "Data Source=<-- a working conn string -->"
$SQLConnection.Open()
$sqlcmd = [Microsoft.Data.SqlClient.SqlCommand]::new($sql, $SQLConnection)
$reader = $sqlCmd.ExecuteReader()
$tableSchema = $reader.GetColumnSchema()
$SQLConnection.Close()
$file = "C:\TEST\file1.csv"
$Encoding = 'UTF8'
[char]$Delimiter = ","
[char]$Quote = '"'
$readerOptions = [Sylvan.Data.Csv.CsvDataReaderOptions]::new()
$readerOptions.HasHeaders = $true
$readerOptions.Quote = $Quote
$readerOptions.Delimiter = $Delimiter
$readerOptions.Escape = '"'
$readerOptions.Comment = '#'
$readerOptions.Schema = [Sylvan.Data.Csv.CsvSchema]::new($tableSchema)
$reader = [Sylvan.Data.Csv.CsvDataReader]::Create($file, $readerOptions)
APPROACH 2 (Sylvan.Data.Csv w/Sylvan.Data Builder):
Test code is included below. I bit longer but unfortunately the same outcome. The CsvDataReader.Schema value changes from $null to "[CsvSchema]" but [CsvSchema] itself is $null
# APPROACH - 2 TEST
$ModuleDir = "C:\TEST\Modules"
Add-Type -AssemblyName System.Data
$ddlToAdd = "Microsoft.Data.SqlClient.dll"
Add-Type -Path (Join-Path $ModuleDir $ddlToAdd) -ReferencedAssemblies Microsoft.Data.SqlClient.SNI
$ddlToAdd = "Sylvan.Data.dll"
Add-Type -Path (Join-Path $ModuleDir $ddlToAdd) -ErrorAction Stop
$ddlToAdd = "Sylvan.Data.Csv.dll"
Add-Type -Path (Join-Path $ModuleDir $ddlToAdd) -ErrorAction Stop
$SQLConnection = [Microsoft.Data.SqlClient.SqlConnection]::new()
$sql = [Microsoft.Data.SqlClient.SqlCommand]::new()
$Database = "TEST"
$Schema = "DBO"
$Table = "TEST1"
$sql = "SELECT TOP 0 * FROM $Database.$Schema.$Table;"
$SQLConnection.ConnectionString = "Data Source=<-- a working conn string -->"
$SQLConnection.Open()
$sqlcmd = [Microsoft.Data.SqlClient.SqlCommand]::new($sql, $SQLConnection)
$reader = $sqlCmd.ExecuteReader()
$tableSchema = $reader.GetColumnSchema()
$SQLConnection.Close()
$schemaBuilder = [Sylvan.Data.Schema+Builder]::new()
foreach ($r in $tableSchema) {
$columnBuilder = [Sylvan.Data.Schema+Column+Builder]::new()
$columnBuilder.ColumnOrdinal = $r.ColumnOrdinal
$columnBuilder.ColumnSize = $r.ColumnSize
$columnBuilder.ColumnName = $r.ColumnName
$columnBuilder.DataType = $r.DataType
$columnBuilder.DataTypeName = $r.DataTypeName
$columnBuilder.AllowDBNull = $r.AllowDBNull
$columnBuilder.BaseCatalogName = $r.BaseCatalogName
$columnBuilder.BaseColumnName = $r.BaseColumnName
$columnBuilder.BaseSchemaName = $r.BaseSchemaName
$columnBuilder.BaseServerName = $r.BaseServerName
$columnBuilder.BaseTableName = $r.BaseTableName
$columnBuilder.IsAliased = $r.IsAliased
$columnBuilder.IsAutoIncrement = $r.IsAutoIncrement
$columnBuilder.IsExpression = $r.IsExpression
$columnBuilder.IsHidden = $r.IsHidden
$columnBuilder.IsIdentity = $r.IsIdentity
$columnBuilder.IsKey = $r.IsKey
$columnBuilder.IsLong = $r.IsLong
$columnBuilder.IsReadOnly = $r.IsReadOnly
$columnBuilder.IsUnique = $r.IsUnique
$columnBuilder.NumericPrecision = $r.NumericPrecision
$columnBuilder.NumericScale = $r.NumericScale
$columnBuilder.UdtAssemblyQualifiedName = $r.UdtAssemblyQualifiedName
$column = $columnBuilder.Build()
$null = $schemaBuilder.Add($column)
}
$tblSchema = $schemaBuilder.Build()
$file = "C:\TEST\file1.csv"
$Encoding = 'UTF8'
[char]$Delimiter = ","
[char]$Quote = '"'
$readerOptions = [Sylvan.Data.Csv.CsvDataReaderOptions]::new()
$readerOptions.HasHeaders = $true
#$readerOptions.HasHeaders = $false
$readerOptions.Quote = $Quote
$readerOptions.Delimiter = $Delimiter
$readerOptions.Escape = '"'
$readerOptions.Comment = '#'
$readerOptions.Schema = [Sylvan.Data.Csv.CsvSchema]::new($tblSchema)
$reader = [Sylvan.Data.Csv.CsvDataReader]::Create($file, $readerOptions)