
I need to parse a structured file (FIX protocol 4.4) in PowerShell. The structure looks like this:

20220606-21:10:21.930 : 8=FIX.4.49=209 35=W34=35 49=FIXDIRECT.FT 52=20220606-21:10:21.925 56=MM_EUR_FIX_QS 55=US30 262=96 268=2 269=0 270=32921.6 271=2000000 299=16ynjsz-16ynjsz5qCaA 269=1 270=32931.4 271=2000000 299=16ynjsz-16ynjsz5qCaA 10=048

I need to pick out only specific values. First, the timestamp at the start of the line, up to the colon, which has no tag number; then the values that follow specific tag numbers, for example tags 55, 270 and 271 (there are multiple 270 and 271 values in this line).

I am able to parse it positionally, using " " and "=" as delimiters:

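$contents = Get-Content FIX.log
foreach ($line in $contents) {
    # [char[]] makes .Split() break on EITHER '=' or ' ' in every PowerShell
    # edition (PowerShell 7 would otherwise treat "= " as one literal separator).
    $s = $line.Split([char[]] '= ')
    Write-Host $s[0] $s[17] $s[25] $s[27] $s[33] $s[35]
}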

However, I would prefer to pinpoint each value by its tag number, because some lines in the file do not have the same content, so fixed positions aren't reliable.

The result should look something like this:

20220606-21:10:21.930 US30 32921.6 2000000 32931.4 2000000

mklement0
mart

2 Answers


Combine -split, -match, and -replace as follows:

  • Note: The sample data below, in line with the FIX protocol, uses the SOH control character (code point 0x1) to separate the fields.
# The separator char (SOH) used by the FIX protocol / file format.
$sep = [char] 0x1

# Sample line that simulates your Get-Content call.
$content = "20220606-21:10:21.930${sep}:${sep}8=FIX.4.49=209${sep}35=W34=35${sep}49=FIXDIRECT.FT${sep}52=20220606-21:10:21.925${sep}56=MM_EUR_FIX_QS${sep}55=US30${sep}262=96${sep}268=2${sep}269=0${sep}270=32921.6${sep}271=2000000${sep}299=16ynjsz-16ynjsz5qCaA${sep}269=1${sep}270=32931.4${sep}271=2000000${sep}299=16ynjsz-16ynjsz5qCaA 10=048"

foreach ($line in $content) {

  # Split into fields
  $first, [array] $rest = $line -split $sep

  # Extract the tokens of interest:
  #  * Use the first one as-is
  #  * Among the remaining ones, use -match to filter in only
  #    those with the tag numbers of interest, then use -replace
  #    on the results to strip the tag number plus the separator ("=")
  #    from each.
  $tokensOfInterest =
    , $first + (($rest -match '^(?:55|270|271)=') -replace '^.+=')

  # Output the resulting array as a single-line, space-delimited
  # list, which is how Write-Host stringifies arrays.
  # Note: Do NOT use Write-Host to output *data*.
  Write-Host $tokensOfInterest

}

This yields the sample output in your question, namely:

20220606-21:10:21.930 US30 32921.6 2000000 32931.4 2000000
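If you need the values as data rather than display text (the code above uses Write-Host purely for display), one option is to index every field by its tag number and emit an object. This is a minimal sketch, assuming SOH separators and the line layout from the question; `ConvertFrom-FixLine` and the property names `Symbol`, `Prices` and `Sizes` are hypothetical, not part of the answer:

```powershell
# Hypothetical helper: ConvertFrom-FixLine and its output property names are
# illustrative. Assumes SOH (0x1) field separators, as in the answer above.
function ConvertFrom-FixLine {
  param([string] $Line)

  $sep = [char] 0x1
  $first, [array] $rest = $Line -split $sep

  # Group all values by tag number; each tag's values stay in message order,
  # because tags such as 270 and 271 can occur more than once per message.
  $byTag = @{}
  foreach ($field in $rest) {
    $tag, $value = $field -split '=', 2
    if ($null -eq $value) { continue }  # no '=' present, e.g. the ':' field
    if (-not $byTag.ContainsKey($tag)) {
      $byTag[$tag] = [System.Collections.Generic.List[string]]::new()
    }
    $byTag[$tag].Add($value)
  }

  $symbol = if ($byTag.ContainsKey('55')) { $byTag['55'][0] }

  # Emit an object instead of display text, so the result can be filtered,
  # sorted, or exported (e.g. with Export-Csv) further down the pipeline.
  [pscustomobject] @{
    Timestamp = $first
    Symbol    = $symbol
    Prices    = [string[]] $byTag['270']
    Sizes     = [string[]] $byTag['271']
  }
}

# Usage with a shortened version of the sample line:
$sep = [char] 0x1
$line = "20220606-21:10:21.930${sep}:${sep}55=US30${sep}270=32921.6${sep}271=2000000${sep}270=32931.4${sep}271=2000000"
ConvertFrom-FixLine $line
```

Because every tag is captured, picking up another tag later only means reading another key from `$byTag`, rather than editing a regex or counting positions.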
mklement0
  • It appears my file doesn't parse properly. I get this: 20220606-21:10:21.930 False 20220606-21:10:21.962 False 20220606-21:10:21.973 False. It's a mystery to me; could it be because of ANSI vs. Unicode encoding? Here is the file: https://mega.nz/file/bbpHQSwL#Fdff_6UmfQDcnkzGsY-yQ2wNyJA36DJ0dDNynNpszjE – mart Jun 25 '22 at 22:38
  • Wow, thank you so much! It works! How can you tell? Did you open the file in a hex editor? – mart Jun 25 '22 at 23:08
    @mart: I opened the file in Visual Studio Code, which visualizes `0x1` characters as a single glyph that reads "SOH" (a symbolic name for that char.) – mklement0 Jun 25 '22 at 23:12
  • I would never have thought of that :) Thank you! By the way, is this an efficient way of writing the data to a file: `$Logfile = "c:\news\bin\FixLogs\output.log"; Function LogWrite { Param ([string]$logstring) Add-Content $Logfile -Value $logstring }`, and then using `LogWrite` instead of `Write-Host`? – mart Jun 25 '22 at 23:14
  • @mart, that will stringify the tokens the same way that `Write-Host` does, yes. As for general efficiency: calling `Add-Content` for each and every entry is _not_ efficient, because it opens and closes the file every time. However, that is a subject for another question. – mklement0 Jun 25 '22 at 23:24
  • Thank you. The only remaining problem is that I am processing gigabytes of data and PowerShell is single-threaded by default, but I guess there is no easy way around this, since the output has to be written sequentially. – mart Jun 26 '22 at 01:26
    @mart, this answer addresses your question as asked. While a question about speeding up PowerShell's processing is well worth asking, it should be a _new_ question post. – mklement0 Jun 26 '22 at 01:31
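Following up on the `Add-Content` point in the comments: one common way to avoid opening and closing the output file once per entry is to stream all results through the pipeline into a single `Set-Content` call. This is a minimal sketch that reuses the answer's extraction logic; `Export-FixTokens`, its parameters, and the first-line separator check (falling back to spaces when no SOH is found, given the encoding confusion in the comments) are assumptions, not part of the answer:

```powershell
# Hypothetical helper: Export-FixTokens and its parameters are illustrative.
# Assumes log lines shaped like the question's sample, with SOH (0x1) field
# separators; if the first line contains no SOH, spaces are assumed instead.
function Export-FixTokens {
  param(
    [Parameter(Mandatory)] [string] $Path,
    [Parameter(Mandatory)] [string] $Destination
  )

  # Peek at the first line only, so the whole file isn't read just to
  # detect the separator.
  $soh = [char] 0x1
  $firstLine = Get-Content -LiteralPath $Path -TotalCount 1
  $sep = if ("$firstLine".IndexOf($soh) -ge 0) { $soh } else { ' ' }

  # Stream the file line by line and write all results with ONE Set-Content
  # call, instead of one Add-Content call (open + close) per entry.
  Get-Content -LiteralPath $Path | ForEach-Object {
    $first, [array] $rest = $_ -split $sep
    # Same extraction as the answer; -join yields the space-delimited line
    # that Write-Host would have displayed.
    (, $first + (($rest -match '^(?:55|270|271)=') -replace '^.+=')) -join ' '
  } | Set-Content -LiteralPath $Destination
}

# Example call (paths are placeholders):
# Export-FixTokens -Path FIX.log -Destination output.log
```

Note that `Set-Content`'s default encoding differs between Windows PowerShell and PowerShell 7, so pass `-Encoding` explicitly if the output file's encoding matters.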

Here is another take on the problem, using the .NET Regex class.

$contents = Get-Content FIX.log

# Tags to search for, separated by RegEx alternation operator
$tagsPattern = '55|270|271'

foreach($line in $contents) {
    # Extract the datetime field
    $dateTime = [regex]::Match( $line, '^\d{8}-\d{2}:\d{2}:\d{2}\.\d{3}' ).Value
    
    # Extract the desired tag values
    $tagValues = [regex]::Matches( $line, "(?<= (?:$tagsPattern)=)[^ ]+" ).Value

    # Output everything
    Write-Host $dateTime $tagValues
}
  • The [regex]::Match() method matches the first instance of the given pattern and returns a single Match object, whose Value property contains the matched value.
  • The [regex]::Matches() method finds all matches of the pattern. It returns a collection of Match objects. With the aid of PowerShell's convenient member access enumeration feature, we directly create an array of all Value properties.
  • Explanation and demos of the RegEx patterns at regex101.com:
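The lookbehind above requires a space before each tag, which matches the question's pasted sample. The comments under the other answer show that the real file separates fields with SOH (0x1), so fields preceded by SOH would not match. A minimal variation, assuming either separator may occur, widens the separator to the character class `[\x01 ]`; `Get-FixTagValues` and its `-TagsPattern` parameter are hypothetical names:

```powershell
# Hypothetical helper: Get-FixTagValues and -TagsPattern are illustrative.
# Assumption: fields may be separated by SOH (0x1) or by spaces.
function Get-FixTagValues {
  param(
    [string] $Line,
    [string] $TagsPattern = '55|270|271'
  )

  # Timestamp at the start of the line, as in the answer above.
  [regex]::Match($Line, '^\d{8}-\d{2}:\d{2}:\d{2}\.\d{3}').Value

  # "\x01" is passed through verbatim (PowerShell escapes use backticks),
  # so the .NET regex engine sees the SOH escape. Each value runs up to the
  # next SOH or space.
  $pattern = "(?<=[\x01 ](?:$TagsPattern)=)[^\x01 ]+"
  foreach ($m in [regex]::Matches($Line, $pattern)) { $m.Value }
}

$sep = [char] 0x1
$line = "20220606-21:10:21.930${sep}:${sep}55=US30${sep}270=32921.6${sep}271=2000000${sep}270=32931.4${sep}271=2000000${sep}10=048"

# Collect the function's output and join it into one space-delimited line.
(Get-FixTagValues $line) -join ' '
```

Emitting the timestamp and each match to the pipeline, rather than building an array by hand, means a line with zero or one matching tag still produces a clean result.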
zett42