We have an SQL server where we log errors. I want to extract a sort of "error report" from it to see how many of each error there are for a given originator.
I am able to accomplish this by selecting from the appropriate table and grouping by originator and error and adding "Count(*)". I am using LINQ EF Core but can translate into an SQL query, if needed, to solve my below problem.
The problem is that there are errors like "Document 2361343 failed.", "Document 47464425 failed", etc.
How can I group all similar errors into one, like for example, "Document * failed"?
I'm thinking I could try with a Regex but I don't know how to incorporate it into the LINQ and/or SQL statement.
Ideally a general solution, perhaps an algorithm that would be able to detect similar messages and group them into one, would be ideal, but I'd settle just for solving the particular "Document * failed" problem, too.
IDEALLY I'd love for someone to tell me of an algorithm that can recognize similar text and group them. Basically this regex Document .* failed
but smart enough to be able to work with any text with no hardcoding. So it would be able to recognize lala 5 yoyo
and lala 4 yoyo
without me having to tell it lala .* yoyo
.
EDIT:
Example data in the table (skipping irrelevant columns):
Originator Error
-------------------------------------------------
Voltuas Record unavailable.
BonBon Order 47579 already shipped.
Voltuas Record unavailable
BizWalk Document 3527267 failed.
BizWalk Record unavailable.
Voltuas Order 7487 already shipped.
Gonda Document 78959 failed.
BizWalk Document 56739028 failed.
Desired output:
Voltuas - Record unavailable - 2
Voltuas - Order * already shipped. - 1
BonBon - Order * already shipped. - 1
BizWalk - Document * failed - 2
BizWalk - Record unavailable - 1
Gonda - Document * failed - 1
What I have so far (using EF Core latest):
context.Table_name.GroupBy(g => new { g.Originator, g.Error })
.Select(x => new { Originator = x.Key.Originator, Error = x.Key.Error, Count = x.Count()}).ToList();