2

We have an SQL server where we log errors. I want to extract a sort of "error report" from it to see how many of each error there are for a given originator.

I am able to accomplish this by selecting from the appropriate table and grouping by originator and error and adding "Count(*)". I am using LINQ EF Core but can translate into an SQL query, if needed, to solve my below problem.
The problem is that there are errors like "Document 2361343 failed.", "Document 47464425 failed", etc.

How can I group all similar errors into one, like for example, "Document * failed"?

I'm thinking I could try with a Regex but I don't know how to incorporate it into the LINQ and/or SQL statement.

Ideally a general solution, perhaps an algorithm that would be able to detect similar messages and group them into one, would be ideal, but I'd settle just for solving the particular "Document * failed" problem, too.

IDEALLY I'd love for someone to tell me of an algorithm that can recognize similar text and group them. Basically this regex Document .* failed but smart enough to be able to work with any text with no hardcoding. So it would be able to recognize lala 5 yoyo and lala 4 yoyo without me having to tell it lala .* yoyo.

EDIT:

Example data in the table (skipping irrelevant columns):

Originator                      Error
-------------------------------------------------
Voltuas                         Record unavailable.
BonBon                          Order 47579 already shipped.
Voltuas                         Record unavailable
BizWalk                         Document 3527267 failed.
BizWalk                         Record unavailable.
Voltuas                         Order 7487 already shipped.
Gonda                           Document 78959 failed.
BizWalk                         Document 56739028 failed.

Desired output:

Voltuas - Record unavailable - 2
Voltuas - Order * already shipped. - 1
BonBon  - Order * already shipped. - 1
BizWalk - Document * failed - 2
BizWalk - Record unavailable - 1
Gonda   - Document * failed - 1

What I have so far (using EF Core latest):

context.Table_name.GroupBy(g => new { g.Originator, g.Error })
       .Select(x => new { Originator = x.Key.Originator, Error = x.Key.Error, Count = x.Count()}).ToList();
Tessaract
  • 1,047
  • 7
  • 24
  • 2
    Please show us sample data, desired results and your best attempt. And really you need to be more specific - there are multiple ways to solve this problem, which leaves it open to being closed as too board. You can solve it any of the ways you are considering... you get to choose. – Dale K Apr 19 '21 at 10:21
  • 1
    Possibly you could create a view and replace the description with a generic one using `case` and some `like` matches and aggregate from that. – Stu Apr 19 '21 at 10:32
  • WHERE [Error Description] [LIKE](https://learn.microsoft.com/en-us/sql/t-sql/language-elements/like-transact-sql?view=sql-server-ver15) 'Document % failed' – Nicholas Hunter Apr 19 '21 at 10:40
  • @DaleK, edited and also, I don't know HOW to solve it in any of the ways I mentioned, that's precicely what I am asking. Ideally I'd love for someone to tell me an algorithm which to use. – Tessaract Apr 19 '21 at 11:16
  • @Stu, I'd have to hardcode every such error into that view, which would be suboptimal. – Tessaract Apr 19 '21 at 11:16
  • Based only on the data you posted, I would suggest you replace the text Error column in the table with a numeric Order Status code 0=OK, 1=Already shipped, etc. The error description column then becomes a derived value you could generate at read time and your query becomes a simple grouping on Order Status. – Nicholas Hunter Apr 19 '21 at 11:25
  • @Tessaract if you are unsure where to start then its time to do more research because its not an unusual problem. – Dale K Apr 19 '21 at 20:03

1 Answers1

0

One option could be to GROUP BY the string with all numbers removed. For how to remove numbers from a string in SQL you can check out Remove numbers from string sql server.

This obviously won't work if you have variable strings inside error messages, but maybe you don't.

For entity framework Function Mappings of the Microsoft SQL Server Provider shows that Replace is supported.

The following is linq to objects, but it should work with the database too.

var l = new[] {
    new {Originator = "Voltuas", Error = "Record unavailable."},
    new {Originator = "BonBon", Error = "Order 47579 already shipped."},
    new {Originator = "Voltuas", Error = "Record unavailable"},
    new {Originator = "BizWalk", Error = "Document 3527267 failed."},
    new {Originator = "BizWalk", Error = "Record unavailable."},
    new {Originator = "Voltuas", Error = "Order 7487 already shipped."},
    new {Originator = "Gonda", Error = "Document 78959 failed."},
    new {Originator = "BizWalk", Error = "Document 56739028 failed."},
};


var grouped = l.Select( r => new { r.Originator, Error = r.Error.Replace("0","").Replace("1","").Replace("2","").Replace("3","").Replace("4","").Replace("5","").Replace("6","").Replace("7","").Replace("8","").Replace("9","")})
    .GroupBy(g => new { g.Originator, g.Error })
    .Select(x => new { Originator = x.Key.Originator, Error = x.Key.Error, Count = x.Count()})
    .ToList();

foreach (var g in grouped)
{
    Console.WriteLine($"{g.Originator}\t'{g.Error}' = {g.Count} ");
}

This prints:

Voltuas 'Record unavailable.' = 1 
BonBon  'Order  already shipped.' = 1 
Voltuas 'Record unavailable' = 1 
BizWalk 'Document  failed.' = 2 
BizWalk 'Record unavailable.' = 1 
Voltuas 'Order  already shipped.' = 1 
Gonda   'Document  failed.' = 1 

Please note that

Voltuas 'Record unavailable.' = 1 
Voltuas 'Record unavailable' = 1

are not grouped because one has a period and the other one doesn't. You can replace it too or more likely in your real data all of them will end with a period.

tymtam
  • 31,798
  • 8
  • 86
  • 126