
I need to build a process that reads a CSV file into a SQL Server table. In the past I’ve used SQL BULK INSERT, but the CSV files I’m working with will have commas within some fields (unpredictably), and my SQL Server version (2016) does not support handling those (I gather that with 2017, the FIELDQUOTE parameter addresses this).

This post talks about this very issue, and turned me on to this method of using PowerShell with Import-Csv. My dataset with each run will be very small, so the “Simple CSV Import Using PowerShell” approach should be fine. However, I’m getting hung up on data types.

When I use the method below, I get errors because it can’t convert string to decimal or datetime. However, if I configure my SQL table to contain only string-type fields, the data imports correctly. It seems to skip the header row: I understand that Import-Csv will use the first row as headers by default, and what I get in my destination table (when all the fields are set to varchar(n)) is only rows 2 and beyond.

Is there a way for me to use this method but set the fields to their proper types so they can go into the properly defined SQL table? If I had to, I could work with it as I have it working (all destination columns set to varchar(n)), but I’d prefer that the SQL columns were properly defined. My CSV header names are predictable, and I know what I want each column data type to be. Many will be left as a string, but some need to be changed to decimal(19,4) or datetime. I think I’m reading in other places that I can pipe in a Where-Object command or something, but I’m a PowerShell novice and am struggling to find the proper syntax.

$ConnectionString = "Data Source=<myServer>; Database=<myDatabase>;Trusted_Connection=True;";
. "E:\Out-DataTable.ps1"

$csvDataTable = Import-CSV -Path "<sourceCsvFile>" | Out-DataTable
$bulkCopy = new-object ("Data.SqlClient.SqlBulkCopy") $ConnectionString
$bulkCopy.DestinationTableName = "<myDestinationTable>"
$bulkCopy.WriteToServer($csvDataTable)
Tim Miller

1 Answer


Regarding your main question: "Is there a way for me to use this method but set the fields to their proper formats so they can go into the properly defined sql table?" Yes.

You could cast the values to their desired types before creating the DataTable (example code below). Personally, I'd probably try to replace the CSV files with something that encodes type information, such as JSON or Parquet.
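As a rough illustration of the JSON alternative (sample data is my own): ConvertFrom-Json gives you typed numbers for free, though dates still arrive as strings and need an explicit [datetime] cast.

```powershell
# JSON preserves number types that CSV flattens to strings.
$json = '[{"a":"string1","b":1,"c":0.1,"f":1.99}]'
$rows = $json | ConvertFrom-Json

$rows[0].b -is [string]   # False: b came through as a number, not a string
```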


UPDATE BASED ON OP'S QUESTIONS:

"Would I have to spell out each field in that loop, or just the ones that need to change from string to something else?" You'd have to map each field into the [PSCustomObject], but you only need to cast the fields whose data is intended to be non-string.

"For column headers that have spaces, do I surround them in brackets when referring to them in PS (such as [Created on Date])?" I adjusted the answer to illustrate one way you could handle column headers with spaces. See column 'e e'.

"I see your example converts 0.2-type numbers to double. My decimal data will be currency and will work with other data such that I would be setting it to decimal. Do you expect this will flow properly, or is there a different proper syntax for converting to decimal?" Based on the reference below, .NET's decimal type should map to SQL Server's decimal type. See property f in the example data. https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/sql-server-data-type-mappings
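A minimal check of that mapping (values are illustrative): a [decimal] cast keeps exact base-10 currency values, while [double] is binary floating point. PowerShell casts from string use the invariant culture, so the usual "1234.56"-style CSV values convert cleanly.

```powershell
# [decimal] preserves exact base-10 values; [double] does not.
$price = [decimal] '3.59'
$price.GetType().FullName                   # System.Decimal

([decimal]'0.1') * 3 -eq [decimal]'0.3'     # True
([double]'0.1') * 3 -eq [double]'0.3'       # False (binary rounding error)
```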



Assuming you have the below data in a CSV:

a,b,c,d,e e,f
string1,1,0.1,2020-05-02 14:29:06.780,e,1.99
string2,2,0.2,2020-05-03 01:01:02.000,e,3.59


You could cast the values to the desired types using an approach similar to the one in the foreach loop below. Obviously, you'd have to customize it to fit your actual data.

param(
    $datasource = 'localhost',
    $db = 'test',
    $destTable = 'table',
    $csvPath = "$PSScriptRoot\data.csv"
)

$ConnectionString = "Data Source=$datasource; Database=$db;Trusted_Connection=True;";
. "$PSScriptRoot\Out-DataTable.ps1"

$rawData = Import-CSV -Path $csvPath
$typedData = @()
foreach ($row in $rawData) {
    $tempObj = [PSCustomObject] @{
        a = $row.a
        b = [int] $row.b
        c = [double] $row.c
        d = [datetime] $row.d
        e_e = $row."e e"
        f = [decimal] $row.f
    }
    $typedData += $tempObj
}

$csvDataTable = $typedData | Out-DataTable
$bulkCopy = new-object ("Data.SqlClient.SqlBulkCopy") $ConnectionString
$bulkCopy.DestinationTableName = $destTable
$bulkCopy.WriteToServer($csvDataTable)
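As an aside on the `$typedData += $tempObj` pattern above: `+=` copies the array on every iteration, so for larger files you could capture the foreach statement's output in a single assignment instead. A self-contained sketch with stand-in rows (sample data is my own):

```powershell
# Stand-in rows mimicking Import-Csv output (every property is a string).
$rawData = @(
    [pscustomobject] @{ a = 'string1'; b = '1' },
    [pscustomobject] @{ a = 'string2'; b = '2' }
)

# Capture the loop's output directly; no per-iteration array copies.
$typedData = foreach ($row in $rawData) {
    [PSCustomObject] @{
        a = $row.a
        b = [int] $row.b
    }
}

$typedData.Count   # 2
```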

Contents of Out-DataTable.ps1 (for future readers)

####################### 
function Get-Type { 
    param($type) 

    $types = @( 
        'System.Boolean', 
        'System.Byte[]', 
        'System.Byte', 
        'System.Char', 
        'System.Datetime', 
        'System.Decimal', 
        'System.Double', 
        'System.Guid', 
        'System.Int16', 
        'System.Int32', 
        'System.Int64', 
        'System.Single', 
        'System.UInt16', 
        'System.UInt32', 
        'System.UInt64') 

    if ( $types -contains $type ) { 
        Write-Output "$type" 
    } 
    else { 
        Write-Output 'System.String' 

    } 
} #Get-Type 

####################### 
<# 
.SYNOPSIS 
Creates a DataTable for an object 
.DESCRIPTION 
Creates a DataTable based on an objects properties. 
.INPUTS 
Object 
    Any object can be piped to Out-DataTable 
.OUTPUTS 
   System.Data.DataTable 
.EXAMPLE 
$dt = Get-psdrive| Out-DataTable 
This example creates a DataTable from the properties of Get-psdrive and assigns output to $dt variable 
.NOTES 
Adapted from script by Marc van Orsouw see link 
Version History 
v1.0  - Chad Miller - Initial Release 
v1.1  - Chad Miller - Fixed Issue with Properties 
v1.2  - Chad Miller - Added setting column datatype by property as suggested by emp0 
v1.3  - Chad Miller - Corrected issue with setting datatype on empty properties 
v1.4  - Chad Miller - Corrected issue with DBNull 
v1.5  - Chad Miller - Updated example 
v1.6  - Chad Miller - Added column datatype logic with default to string 
v1.7 - Chad Miller - Fixed issue with IsArray 
.LINK 
http://thepowershellguy.com/blogs/posh/archive/2007/01/21/powershell-gui-scripblock-monitor-script.aspx 
#> 
function Out-DataTable { 
    [CmdletBinding()] 
    param([Parameter(Position = 0, Mandatory = $true, ValueFromPipeline = $true)] [PSObject[]]$InputObject) 

    Begin { 
        $dt = new-object Data.datatable   
        $First = $true  
    } 
    Process { 
        foreach ($object in $InputObject) { 
            $DR = $DT.NewRow()   
            foreach ($property in $object.PsObject.get_properties()) {   
                if ($first) {   
                    $Col = new-object Data.DataColumn   
                    $Col.ColumnName = $property.Name.ToString()   
                    if ($property.value) { 
                        if ($property.value -isnot [System.DBNull]) { 
                            $Col.DataType = [System.Type]::GetType("$(Get-Type $property.TypeNameOfValue)") 
                        } 
                    } 
                    $DT.Columns.Add($Col) 
                }   
                if ($property.Gettype().IsArray) { 
                    $DR.Item($property.Name) = $property.value | ConvertTo-XML -AS String -NoTypeInformation -Depth 1 
                }   
                else { 
                    $DR.Item($property.Name) = $property.value 
                } 
            }   
            $DT.Rows.Add($DR)   
            $First = $false 
        } 
    }  

    End { 
        Write-Output @(, ($dt)) 
    } 

} #Out-DataTable
derekbaker783
  • Note that using += to iteratively build arrays is inefficient, and will be expensive with progressively larger datasets. – derekbaker783 May 02 '20 at 18:52
  • Derek, would I have to spell out each field in that loop, or just the ones that need to change from string to something else (they start as strings by default using the Import-Csv). For column headers that have spaces, do I surround them in brackets when referring to them in PS (such as [Created on Date]). Finally, I see your example converts 0.2 type numbers to "double". My decimal data will be currency and will work with other data such that I would be setting it to decimal. Do you expect this will flow properly, or is there a different proper syntax for converting to decimal? Thanks! – Tim Miller May 03 '20 at 17:41
  • @TimMiller, I updated the answer in response to the questions in your most recent comment. – derekbaker783 May 03 '20 at 18:53
  • Thanks for this. Several of my datetime fields will be blank for many of the rows, and this causes a problem. (Cannot convert value "" to type "System.DateTime". Error: "String was not recognized as a valid DateTime.") Received several responses like that. That will also be true for many of the decimal fields, but I did not receive any errors relating to converting to decimal. After several of those, I see many of these, which I assume are only a result of the first problem (Out-DataTable : Cannot bind argument to parameter 'InputObject' because it is null.). – Tim Miller May 05 '20 at 17:22
  • @TimMiller, you could do something like this to handle that condition: `d = if ($null -eq $row.d -or $row.d -eq '') { Get-Date -Year 0001 -Month 01 -Day 01 } else { [datetime] $row.d }` You'll probably want to update the params passed to `Get-Date` to handle the missing values. PowerShell 5.1 and under doesn't have a proper ternary operator, but this is basically the same. – derekbaker783 May 05 '20 at 17:57
  • That did the job perfectly. Thanks very much for your help!! – Tim Miller May 05 '20 at 18:49
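For future readers, the empty-value handling worked out in the comments above can also be wrapped in a small helper that emits [DBNull]::Value for blank fields, so blanks can land in a nullable SQL column instead of failing the cast (the function name here is my own invention):

```powershell
# Sketch: map empty CSV strings to DBNull rather than a sentinel date.
function ConvertTo-NullableDateTime {
    param([string] $Value)
    if ([string]::IsNullOrWhiteSpace($Value)) { [DBNull]::Value }
    else { [datetime] $Value }
}

ConvertTo-NullableDateTime ''                         # DBNull
ConvertTo-NullableDateTime '2020-05-02 14:29:06.780'  # DateTime
```

One caveat: Out-DataTable infers each column's type from the first row, so if the first row's value is DBNull the column falls back to string. The same pattern works for the nullable decimal fields.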