
I have a small process for ingesting a .xel file, converting the events to custom objects with the dbatools module, turning each one into single-line JSON, and writing them out to a file that gets shipped off downstream. Here it is:

    $path = 'C:\temp\big_xe_file.xel'

    #Read in file
    $xes = Read-DbaXEFile -Path $path

    #Output Variable
    $file = ""
    foreach ($xe in $xes) {

        #format date column (HH = 24-hour clock; lowercase hh would give 12-hour times)
        $xe.timestamp = ($xe.timestamp.DateTime).ToString("yyyy-MM-ddTHH:mm:ss.ffff")

        # convert to JSON and change escaped unicode characters back
        # (alternative tried: | % { [System.Text.RegularExpressions.Regex]::Unescape($_) })
        $xe = ($xe | ConvertTo-Json -Compress) | ForEach-Object {
            [Regex]::Replace($_,
                '\\u(?<Value>[a-zA-Z0-9]{4})',
                { param($m) ([char]([int]::Parse($m.Groups['Value'].Value,
                    [System.Globalization.NumberStyles]::HexNumber))).ToString() })
        }

        #Write line to file
        Add-Content -Value "$($xe)`n" -Path 'C:\temp\myevents.json' -Encoding utf8 -NoNewline

    }

This fits the bill and does exactly what I need. The nasty regex in the middle is there because ConvertTo-Json helpfully escapes all non-ASCII characters as \uXXXX sequences, and the regex turns them back into the characters we know and love.
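
To illustrate the escaping (a minimal sketch with a made-up string; the output shown is what Windows PowerShell 5.1's ConvertTo-Json produces - PowerShell 7 escapes far less by default):

    # Windows PowerShell 5.1 escapes non-ASCII characters (and a few others, like &) as \uXXXX
    'Björk café' | ConvertTo-Json -Compress
    # -> "Bj\u00f6rk caf\u00e9"

    # The targeted replace converts only those \uXXXX sequences back to literal characters
    [Regex]::Replace('"Bj\u00f6rk caf\u00e9"',
        '\\u(?<Value>[a-zA-Z0-9]{4})',
        { param($m) ([char]([int]::Parse($m.Groups['Value'].Value,
            [System.Globalization.NumberStyles]::HexNumber))).ToString() })
    # -> "Björk café"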

However, it's all a bit too slow. We churn out lots of .xel files, usually around 500 MB each, and we would like a shorter delay between the traces being written and being ingested. As it stands, it takes ~35 minutes to serially process a file this way, and the delay would likely grow if we got behind, which seems likely at that speed.

I've already sped this up quite a bit. I've tried using [System.Text.RegularExpressions.Regex]::Unescape in place of the regex code I have, but it is only slightly faster and doesn't produce the formatting we need anyway. My next step is to split the files into smaller pieces and process them in parallel (a coarse per-file version is sketched below), but that would be significantly more CPU intensive and I'd like to avoid it if possible.
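
For reference, even a coarse per-file version of that parallel approach would look something like this (a rough, untested sketch assuming PowerShell 7+ for ForEach-Object -Parallel, that dbatools is available in each runspace, and made-up file paths):

    # Process each .xel file on its own thread and write one .json file per input
    Get-ChildItem 'C:\temp\*.xel' | ForEach-Object -Parallel {
        $out = [System.IO.Path]::ChangeExtension($_.FullName, '.json')
        Read-DbaXEFile -Path $_.FullName |
            ForEach-Object { $_ | ConvertTo-Json -Compress } |
            Set-Content -Path $out -Encoding utf8
    } -ThrottleLimit 4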

Any help optimizing this is much appreciated!

Ben Adams
    May I ask why it is an issue to let the encoding process escape the unicode chars? It allows for proper decoding when something else has to read it. Converting the unicode points back into their actual chars could cause issues. – MonkeyZeus Sep 24 '19 at 17:00
  • We're using the ELK stack for logging, and for some reason those escaped characters cause elastic search to throw errors when trying to ingest them. The way I have it configured in the question produces JSON that gets ingested without any errors. I don't have any influence over the ELK configuration so I've assumed that it's unavoidable. – Ben Adams Sep 24 '19 at 17:03
  • Yikes, thanks for the ELK stack warning. I cannot imagine why ELK would have this issue though. I really hope they are not regexing the JSON logs but that would at least explain part of the downstream issue. – MonkeyZeus Sep 24 '19 at 17:47
  • It could also be a configuration issue, I don't have enough visibility into it to say. I'm just working around what's in front of me – Ben Adams Sep 24 '19 at 17:49

1 Answer


It turns out there was a config issue on the ELK side, so we were able to ditch the regex nonsense and leave the escaped characters in the JSON. I did also find a way to speed this up, in case anyone ever sees this: replace the PowerShell Add-Content call with a .NET StreamWriter.

    $stream = [System.IO.StreamWriter] $outfile   # $outfile holds the output .json path

    foreach ($xe in $xes) {

        #format date column (HH = 24-hour clock)
        $xe.timestamp = ($xe.timestamp.DateTime).ToString("yyyy-MM-ddTHH:mm:ss.ffff")

        #tag each event with the server it came from
        $xe | Add-Member -MemberType NoteProperty -Name 'source_host_name' -Value $server_name

        #convert to JSON; the escaped unicode characters can stay, so no regex replace needed
        $xe = $xe | ConvertTo-Json -Compress

        #write the line with the StreamWriter instead of Add-Content
        #Add-Content -Value "$($xe)`n" -Path 'C:\DBA Notes\Traces\Xel.json' -Encoding utf8 -NoNewline
        $stream.WriteLine($xe)

    }

    $stream.Close()

It takes about 1/10 of the time. Cheers
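
The speedup makes sense: Add-Content opens and closes the file on every call, while the StreamWriter keeps a single buffered handle open for the whole loop. If you want to see the difference on your own machine, a rough comparison like this works (paths and line count are made up):

    # Rough timing sketch - write the same lines both ways and compare
    $lines = 1..100000 | ForEach-Object { '{"n":' + $_ + '}' }

    $slow = Measure-Command {
        foreach ($l in $lines) { Add-Content -Value $l -Path 'C:\temp\slow.json' -Encoding utf8 }
    }

    $fast = Measure-Command {
        $w = [System.IO.StreamWriter] 'C:\temp\fast.json'
        foreach ($l in $lines) { $w.WriteLine($l) }
        $w.Close()
    }

    "Add-Content: $($slow.TotalSeconds)s  StreamWriter: $($fast.TotalSeconds)s"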

Ben Adams