I'm trying to write a PowerShell script that extracts, from a huge log text file, every line containing "ERROR" together with the database path to the item, and writes the result to a CSV file. Example of an error:

2022-04-17 00:00:00.9999|ERROR|texte:texte|texte \\DATABASE\Path\Path\Path\Path\Item[Item Name] (ID:########-####-####-###-############ Rank:#). description of the error. 

I would then like to recover the date, the full path to the element in error (\\DATABASE\Path\Path\Path\Path\Item[Item Name]) and the description of the error, and remove the duplicates. I also don't know whether it is possible to split the date, the path and the message directly into three columns in the CSV file.

Example of logs (screenshot) :

2022-04-17 00:00:00.9999|ERROR|ANDataCache:Configuration|################# Error when adding input attributes to data cache (Failed:8/Total:12) [99.9999999999999 ms].
2022-04-17 00:00:00.9999|ERROR|ANCalculationEngine:Configuration|Failed to initialize \\DATABASE\Path1\Path2\Path3\Path4\Item[1. Item Name]  (ID:########-####-####-###-############ Rank:#). Failed to resolve required input 'input A name'
Failed to resolve required input 'input B name'
No output is defined.
2022-04-17 00:00:00.9999|WARN|ANTimeClassManagerHelper:Configuration|Ignoring partial cache signup errors for \\DATABASE\Path1\Path2\Path3\Path4\Item[1. Item Name]  (ID:########-####-####-###-############ Rank:#). Failed to signup some input(s) for receiving updates. 
 Net Volume in Tank: Point not found 'Point Name'.
2022-04-17 00:00:00.9999|ERROR|ANCalculationEngine:Configuration|Failed to initialize \\DATABASE\Path1\Path2\Path3\Path4\Item[1. Item Name] (ID:########-####-####-###-############ Rank:#). Failed to resolve required input 'input name'
There is no time rule configured for this analysis.
No output is defined.
2022-04-17 00:00:00.9999|WARN|###########:#########|############[#####] Ignoring attempt to remove non-existent calculation '\\DATABASE\Path1\Path2\Path3\Path4\Item[1. Item Name] (ID:########-####-####-###-############ Rank:#)'
2022-04-17 00:00:00.9999|ERROR|ANDataCache:Configuration|DataCache:################ Error when adding input attributes to data cache (Failed:8/Total:12) [99.9999999999999 ms]. 

Example of expected result (according to the example above)

(I only want to retrieve the ERROR entries that contain a path ("\\DATABASE\Path\Path\Path\Path\Item[Item Name]"), not the WARN entries or the ERROR entries without a path.)

I started writing this:

$File = "logs.txt"
$Pattern = '(\[ERROR\[^\\]+(?<DatabasePath>[^\\]]+\])(?<ErrorText>[^\r\n]+=)'
$Content = Get-Content $File
[regex]::Matches($Content, $Pattern).Value | Set-Content "output.csv" 

Or, to retrieve just the path:

$File = "logs.txt"
$Pattern = '(?<=\\DATABASE\\).+?(?=])'
$Content = Get-Content $File
[regex]::Matches($Content, $Pattern).Value | Set-Content "output.csv"

But in the second case "DATABASE" does not appear in the output file.

Thank you in advance for your answers.

Heighties
  • Helps if you provide us an example of what the result should look like, as well as a sanitized log (*in plain text*) that we can work with. – Abraham Zinala Apr 17 '22 at 15:58
  • Please [edit] your question and add the examples in there as formatted text. In comments it is difficult to read. – Theo Apr 17 '22 at 20:06
  • It seems that you have a pipe-delimited file which could be read using `Import-Csv -Delimiter '|'`. Why are you using `Get-Content` instead? Or is the file not pipe-delimited on all rows? – Santiago Squarzon Apr 17 '22 at 20:37
  • It's just a start. I'm not entirely comfortable with PowerShell. How can I ignore what is between "\\DATABASE\Path\Path\Path\Path\Item[Item Name]" and "description of the error.", which is not delimited by pipes? – Heighties Apr 17 '22 at 20:59
  • The screenshot you're showing us does not resemble what we see (as plain text in the example). Preferably you would add a data sample __as plain text__ and your expected output __as plain text__ to your question, as Abraham pointed out in his comment. – Santiago Squarzon Apr 17 '22 at 21:03
  • I'm not sure I understand. The screenshot is an example of the expected output. – Heighties Apr 17 '22 at 21:15
  • @Heighties - PLEASE add a few lines of sample data [sanitized as needed] and the desired result _in plain text_ to your Question. plus, wrap them both in code formatting so they are easy to read & use in testing. – Lee_Dailey Apr 17 '22 at 22:29
  • I added some things trying to be as clear as possible – Heighties Apr 17 '22 at 23:16
  • @Heighties - thank you for the sample data. [*grin*] **_are there extra line ends in that sample?_** i thot each line would start with a timestamp, but you have lines that have other starting chars. – Lee_Dailey Apr 18 '22 at 00:49

1 Answer

The regex can likely be improved, but for the time being this might get you what you're looking for. I encourage you to check this regex101 link to test the current regex (and perhaps improve it) if something is not working.

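$File = "logs.txt"   # same input file as in the question
# Named groups: date (leading timestamp), path (UNC path), description (trailing text)
$re = [regex]"(?m)(?<date>^[\d-]+\s[\d:.]+)\|ERROR\|.*?(?<path>\\[\\\w\s\[.\]]+).*?\.(?<description>[\w\s'\r?\n.]+$)"
& {
    # -Raw reads the whole file as one string so a match can span several lines
    $content = Get-Content $File -Raw
    foreach($match in $re.Matches($content)) {
        $date, $path, $description = $match.Groups['date','path','description']
        [pscustomobject]@{
            Date = $date.Value -as [datetime]
            Path = $path.Value.Trim()
            # Collapse a multi-line description into a single line
            Description = ($description.Value -replace '\r?\n', ' ').Trim()
        }
    }
} | Export-Csv "output.csv" -NoTypeInformation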

The output I got using the sample data provided in the question looks like this, which can be exported as a proper CSV:

PS /> $output | Format-Table

Date                  Path                                                  Description
----                  ----                                                  -----------
4/17/2022 12:00:00 AM \\DATABASE\Path1\Path2\Path3\Path4\Item[1. Item Name] Failed to resolve required input 'input A name'. F… 
4/17/2022 12:00:00 AM \\DATABASE\Path1\Path2\Path3\Path4\Item[1. Item Name] Failed to resolve required input 'input name'. The…

PS /> $output[0].Description

Failed to resolve required input 'input A name' Failed to resolve required input 'input B name' No output is defined.

You can remove -as [datetime] if you want to keep the date format as you currently have it in your file.
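
The question also asks to remove duplicates, which the script above does not do. A minimal sketch of one possible way, assuming two rows count as duplicates when their Date, Path and Description all match. The sample rows below are made up for illustration and stand in for the objects produced by the loop:

```powershell
# Hypothetical rows standing in for the objects emitted by the regex loop.
$rows = @(
    [pscustomobject]@{ Date = '2022-04-17 00:00:00.9999'; Path = '\\DATABASE\Path1\Item[A]'; Description = 'No output is defined.' }
    [pscustomobject]@{ Date = '2022-04-17 00:00:00.9999'; Path = '\\DATABASE\Path1\Item[A]'; Description = 'No output is defined.' }
    [pscustomobject]@{ Date = '2022-04-17 00:00:01.0000'; Path = '\\DATABASE\Path1\Item[B]'; Description = 'No output is defined.' }
)

# Sort-Object -Unique keeps one row per distinct combination of the listed properties.
$unique = @($rows | Sort-Object -Property Date, Path, Description -Unique)

# Show the result as CSV text; pipe to Export-Csv instead to write a file.
$unique | ConvertTo-Csv -NoTypeInformation
```

In the script itself, the same `Sort-Object ... -Unique` step could sit between the closing brace of the script block and `Export-Csv`. Note that the comparison is case-insensitive by default.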

Santiago Squarzon
  • Thank you so much. I'm just getting started with regex and I'm still struggling with some things. The script seems to catch most of the errors, but some don't seem to be built the same way; still, this start helps me a lot. Also, some databases and paths are not recovered because they are written like this: "\\DA-TA-BASE\". But I want to try to figure out how to fix it. – Heighties Apr 18 '22 at 12:16
  • That's what I tried on regex101, but it returns this message: "You cannot create a range with shorthand escape sequences." I'll keep testing. – Heighties Apr 18 '22 at 12:36
  • @Heighties I'll check on it when back home – Santiago Squarzon Apr 18 '22 at 12:58
  • It's better with "(?<path>\\[\\\w-]+)", but it then stops because of a space in the path name. I'm getting closer to the expected result, though. – Heighties Apr 18 '22 at 13:02
  • Do the real paths really have `[ something ]` at the end? – Santiago Squarzon Apr 18 '22 at 13:09
  • Yes, that is the real name of the analysis in error. Before that, this is the way to get there. – Heighties Apr 18 '22 at 13:16
  • @Heighties ok, well if you haven't solved it by the time I'm back home I'll look into it – Santiago Squarzon Apr 18 '22 at 13:35
  • Looks better with "(?m)(?<date>^[\d-]+\s[\d:.]+)\|ERROR\|.*?(?<path>\\.*\]\s).*?\.(?<description>[\w\s'\r?\n.]+$)". Now I don't understand why not all the errors are retrieved. I'm working on it. But thanks a lot for your time and help. – Heighties Apr 18 '22 at 15:15
  • @Heighties can we assume that the paths will always be UNC (meaning, they will always start with ```\\```) ? and that, after each path there is something between parentheses like `(ID:###.....)` ? – Santiago Squarzon Apr 18 '22 at 21:54
  • Yes, I want to retrieve just the errors with this path. I tried this regex: "(?m)(?<date>^[\d-]+\s[\d:.]+)\|ERROR\|.*?(?<path>\\.*\]\s).*?\.(?<description>.*)$" But it doesn't take some line breaks into account (it retrieves 95% of the errors I want). I am, however, close to the expected result. – Heighties Apr 18 '22 at 22:06
  • @Heighties why not share a link of regex101 with examples of where the regex is not working (you dont need to register, just "Save Regex" and copy the link) – Santiago Squarzon Apr 18 '22 at 22:08
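
Following up on the comments above, here is a minimal sketch of a pattern that tolerates hyphens and spaces in the server and path names. It assumes, as confirmed in the comments, that every path of interest starts with `\\`, ends with `]` and is followed by `(ID:`, and that each log entry starts with a `yyyy-MM-dd` timestamp. The sample text, `$sample`, `$pattern` and `$rows` are illustrative names, and the pattern has only been reasoned against data shaped like the question's sample, not against the real logs. The regex101 message about ranges most likely comes from a `-` sitting between two entries of a character class (for example `[\w-\s]`); placing the hyphen last or escaping it avoids that.

```powershell
# Illustrative sample: an ERROR with a hyphenated server name and a multi-line
# description, a WARN entry, and an ERROR entry without a path.
$sample = @'
2022-04-17 00:00:00.9999|ERROR|ANCalculationEngine:Configuration|Failed to initialize \\DA-TA-BASE\Path 1\Path2\Item[1. Item Name] (ID:0000 Rank:1). Failed to resolve required input 'input A name'
No output is defined.
2022-04-17 00:00:00.9999|WARN|Helper:Configuration|Ignoring errors for \\DA-TA-BASE\Path 1\Path2\Item[1. Item Name] (ID:0000 Rank:1). Failed to signup.
2022-04-17 00:00:00.9999|ERROR|ANDataCache:Configuration|Error when adding input attributes to data cache (Failed:8/Total:12).
'@

# path: from "\\" up to the "]" that is followed by "(ID:", staying on one line.
# description: everything up to the next line that starts with a date, or the end of the text.
$pattern = '(?m)^(?<date>[\d-]+\s[\d:.]+)\|ERROR\|[^\r\n]*?(?<path>\\\\[^\r\n]+?\])\s*\(ID:[^)\r\n]*\)\.?\s*(?<description>[\s\S]*?)(?=\r?\n\d{4}-\d{2}-\d{2}\s|\s*\z)'

$rows = @(foreach ($m in [regex]::Matches($sample, $pattern)) {
    [pscustomobject]@{
        Date        = $m.Groups['date'].Value
        Path        = $m.Groups['path'].Value
        # Join a multi-line description into a single line
        Description = ($m.Groups['description'].Value -replace '\s*\r?\n\s*', ' ').Trim()
    }
})

$rows | Format-List
```

Because the path is matched with `[^\r\n]` rather than an explicit list of allowed characters, the WARN entry and the ERROR entry without a path are skipped without having to enumerate every character a path may contain.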