I have the code below to convert an LDIF file (over 100,000 lines) to a CSV file (over 4,000 lines), but I'm not sure I'm happy with the time it takes. Then again, I don't know how long it should take; maybe it's a normal time on my laptop (Core i5 7th gen, 16 GB RAM, SSD)?
Would there be any room for improvement, especially in the parsing, which takes about 30 seconds?
# Reducing & editing data to process:
# -----------------------------------
$original = Get-Content $IN_ldif_file
$reduced = (($original | Select-String -Pattern '^cust[A-Z]','^$' -CaseSensitive).Line) -replace ':: ', ': ' -replace '^cust',''
"Writing reduced LDIF file..." # < 1 sec
(Measure-Command { Set-Content $reducedLDIF -Value $reduced -Encoding UTF8 }).TotalSeconds
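One thing I considered (but haven't benchmarked) is replacing Get-Content with a direct .NET call, since Get-Content is reportedly slow on large files; a rough sketch:
# Possible alternative (untested here): ReadAllLines avoids Get-Content's per-line pipeline overhead.
# Note: .NET resolves relative paths against the process working directory, so an absolute path is safest.
$original = [System.IO.File]::ReadAllLines($IN_ldif_file)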
# Parsing the relevant data:
# --------------------------
$inData = New-Object -TypeName System.IO.StreamReader -ArgumentList $reducedLDIF
$a = @{} # initialize the temporary hash
$lineNum = $rcdNum = 0 # initialize the counters
"Parsing reduced LDIF file..." # 27-36 sec
(Measure-Command {
# Begin reading and processing the input file:
$results = while (-not $inData.EndOfStream)
{
    $line = $inData.ReadLine()
    Write-Verbose "$("{0:D4}" -f ++$lineNum)|$("{0:D4}|" -f $rcdNum)$line"
    if ($line -notmatch "^\s*$")
    {
        # data line - add the key/value pair to the hash for the current record
        $key, $value = $line -split ': ', 2   # limit to 2 pieces so values containing ': ' stay intact
        $a[$key] = $value
    }
    if (($line -match "^\s*$") -or $inData.EndOfStream)
    {
        # blank line or end of stream - dump the hash as an object and reinit the hash
        # (processing the line before this check keeps the last data line from being dropped)
        [PSCustomObject]$a
        $a = @{}
        $rcdNum++
    }
}
$inData.Close()
}).TotalSeconds
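I've also seen switch -Regex -File recommended for fast line-by-line parsing; I haven't tried it here, but I imagine it would look roughly like this (untested):
$a = @{}
$results = @(switch -Regex -File $reducedLDIF {
    '^\s*$' {
        # blank line ends the current record
        if ($a.Count) { [PSCustomObject]$a; $a = @{} }
    }
    default {
        # data line - $_ is the current line inside switch
        $key, $value = $_ -split ': ', 2
        $a[$key] = $value
    }
})
if ($a.Count) { $results += [PSCustomObject]$a } # flush a trailing record not followed by a blank line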
# Populating & writing the CSV file:
# ----------------------------------
"Populating the CSV data..." # 7-11 sec
(Measure-Command {
$out = $results |
    Select-Object "Attribute01",
        "Attribute02",
        "Attribute03",
        <# etc... #>
        @{ n = "Attribute39"; e = { $_."Attribute20" } }, # Attribute39 (not in the LDIF) takes the value of Attribute20
        "Attribute40"
}).TotalSeconds
"Writing CSV file..." # < 1 sec
(Measure-Command { $out | Export-Csv $OUT_csv_file -NoTypeInformation }).TotalSeconds
Note: I don't actually need to export the $reduced data to a file (e.g. $reducedLDIF); it's just that the piece of code I found for the parsing seems to require a file.
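If the temporary file can be skipped, I imagine the loop could iterate over the $reduced array directly instead of going through a StreamReader, something like this (not benchmarked):
$a = @{}
$results = foreach ($line in $reduced) {
    if ($line -match '^\s*$') {
        if ($a.Count) { [PSCustomObject]$a; $a = @{} } # blank line ends the record
    } else {
        $key, $value = $line -split ': ', 2
        $a[$key] = $value
    }
}
if ($a.Count) { $results = @($results) + [PSCustomObject]$a } # flush the last record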
Thanks!