I am reading a flat file with fixed-width fields so that I can import it into SQL Server. This is a new project in my company, and I have not yet worked on parsing huge files in a time-efficient way. Scouring the Internet, I found a PowerShell script that can parse a fixed-width file, and it does the job:
# Load the VB assembly that provides TextFieldParser
[void][System.Reflection.Assembly]::LoadWithPartialName('Microsoft.VisualBasic')

$Parser = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser('C:\T1046436.dat')
$Parser.TextFieldType = 'FixedWidth'
$Parser.TrimWhiteSpace = $False
$Parser.FieldWidths = @(13,6,11,2,10,10,1,1,7,7,2,17,1,2,2,4,1,10,10,10,10,2,10,10,11,2,2,1,1)

while (!$Parser.EndOfData)
{
    try
    {
        # ReadFields() returns one record as a string array
        Write-Host $Parser.ReadFields()
    }
    catch [Microsoft.VisualBasic.FileIO.MalformedLineException]
    {
        Write-Host "Error, line $($_.Exception.LineNumber): $($Parser.ErrorLine)"
    }
}
I want to save this output to a pipe-delimited file so that I can simply BCP it into the SQL database. That's my quick-and-dirty way to load data. But it is slow: about 20 minutes for 50,000 records.
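For context, this is the direction I was considering for the conversion step: the same parser setup as above, but with the read loop replaced so each record is joined with '|' and written through a StreamWriter instead of Write-Host (the .psv output path is just a placeholder; I have not benchmarked this):

# Sketch only: assumes $Parser is configured exactly as above, before any reads.
# Output path is a placeholder; error handling for malformed lines omitted.
$Writer = New-Object System.IO.StreamWriter('C:\T1046436.psv')
while (!$Parser.EndOfData)
{
    # Join the string[] from ReadFields() into one pipe-delimited line
    $Writer.WriteLine(($Parser.ReadFields() -join '|'))
}
$Writer.Close()

Then I would load it with something like bcp MyDb.dbo.MyTable in C:\T1046436.psv -c -t "|" -S MyServer -T (database, table, and server names made up).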
Any suggestions to get it quickly/efficiently:
- converted into a pipe-delimited file, OR
- directly imported from the flat file into SQL Server from PowerShell (see the sketch after this list), OR
- I am open to using any other scripting language that can quickly parse the flat file into a pipe-delimited file (Python, Perl, etc.) on Windows. Any example scripts will be appreciated.
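For the direct-import option, I gather SqlBulkCopy can be driven from PowerShell. A rough sketch of what I mean (the connection string, destination table, and generic column names are all made up; the 29 columns just match the 29 field widths above):

# Rough sketch: load parsed rows into a DataTable and push with SqlBulkCopy.
# Assumes $Parser is configured as above; placeholders throughout.
Add-Type -AssemblyName System.Data
$Table = New-Object System.Data.DataTable
1..29 | ForEach-Object { [void]$Table.Columns.Add("Col$_") }  # one column per field
while (!$Parser.EndOfData)
{
    # Each string[] from ReadFields() becomes one DataTable row
    [void]$Table.Rows.Add($Parser.ReadFields())
}
$Bulk = New-Object System.Data.SqlClient.SqlBulkCopy('Server=MyServer;Database=MyDb;Integrated Security=True')
$Bulk.DestinationTableName = 'dbo.MyTable'
$Bulk.WriteToServer($Table)
$Bulk.Close()

I do not know whether buffering everything in a DataTable first is sensible for really huge files, which is part of what I am asking.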