I extract exceptions from a log. Here is an example of one:
Exception: System.InvalidOperationException: Collection was modified; enumeration operation may not execute. at System.Collections.Generic.List`1.Enumerator.MoveNextRare() at test.Modules.UI.Table.<>c_DisplayClass2.b_0() at System.Win
Sometimes the logs are in a different language, so the same exception looks like this:
Exception: System.InvalidOperationException: La colección fue modificada, la operación de enumeración no puede ejecutar. at System.Collections.Generic.List`1.Enumerator.MoveNextRare() at Test.Modules.UI.Table.<>c_DisplayClass2.b_0() at System.Win
As you can see, only the exception message differs, because it is in a different language; the part after it is identical. I have all of these exceptions stored in a database, each trimmed to 300 characters. They are often much longer, but 300 characters is enough to tell whether two of them are the same.
So I was thinking of skipping the exception message and comparing the 300 characters after it, but it will be very difficult to know where the message ends: there isn't anything that marks the start and end of the message.
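One possible way around the missing delimiter, assuming every log line has the same shape as the two samples: the localized part is only the message between the exception type and the first stack frame, and each frame starts with " at " followed by a dotted .NET identifier. The sketch below splits on that pattern and builds a comparison key from the exception type plus the first few complete frames. `splitException` and `exceptionSignature` are hypothetical names, and the " at " heuristic is an assumption: some localized .NET installations may print a translated word instead of "at", in which case the pattern would need that word too.

```php
<?php
// Sketch: split a logged exception line into type, (possibly localized)
// message and stack frames.
// ASSUMPTION: every frame starts with " at " followed by a dotted
// identifier such as "System.Collections...", as in both samples.
function splitException(string $line): array
{
    $parts = preg_split('/\s+at\s+(?=[\w`<>]+\.)/', trim($line));
    $head = array_shift($parts);            // "Exception: Type: message"
    $type = '';
    $message = $head;
    if (preg_match('/^Exception:\s*([\w.`]+):\s*(.*)$/s', $head, $m)) {
        $type = $m[1];                      // e.g. System.InvalidOperationException
        $message = $m[2];                   // localized text, ignored for matching
    }
    return ['type' => $type, 'message' => $message, 'frames' => $parts];
}

// Build a language-independent key: exception type plus up to $maxFrames
// complete frames. The last frame is dropped because the 300-character
// cut may have truncated it (e.g. "System.Win").
function exceptionSignature(string $line, int $maxFrames = 3): string
{
    $e = splitException($line);
    $frames = array_slice($e['frames'], 0, -1);
    $frames = array_slice($frames, 0, $maxFrames);
    // Lower-cased, since the two samples differ only in "test" vs "Test".
    return strtolower($e['type'] . '|' . implode('|', $frames));
}
```

Because a longer translated message leaves fewer characters for the stack trace within the 300-character limit, `$maxFrames` should be small enough that most rows still contain that many complete frames. Storing the key in its own column would let a plain exact-match query (for example a `GROUP BY` on that column) count duplicates across languages.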
Any ideas how I could overcome this? Maybe I should just use Levenshtein distance to highlight close matches, filter those, and set up an interface that lets me link two exceptions once I have manually confirmed they are the same exception written in a different language?
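If the Levenshtein route is used, PHP's built-in `levenshtein()` can be turned into a normalized similarity score for flagging candidate pairs. This sketch assumes PHP 8.0 or later (earlier versions returned -1 for strings longer than 255 characters, and the stored strings are 300), that `$rows` maps a hypothetical row id to the stored string, and that the `Exception: Some.Type` prefix is not localized, as in both samples. Comparing every pair is quadratic, so rows are grouped by exception type first. `similarity` and `candidatePairs` are hypothetical names, and the 0.6 default threshold is only a starting point taken from the 60-70% estimate.

```php
<?php
// Sketch: flag pairs of stored exception strings that are probably the
// same exception, for manual review and linking.
// ASSUMES PHP 8+, where levenshtein() accepts strings over 255 bytes.

// Similarity in [0, 1]; 1.0 means identical. levenshtein() works on bytes,
// so an accented UTF-8 character counts as two.
function similarity(string $a, string $b): float
{
    $maxLen = max(strlen($a), strlen($b));
    if ($maxLen === 0) {
        return 1.0;
    }
    return 1.0 - levenshtein($a, $b) / $maxLen;
}

// Compare every pair within the same exception type and return
// [idA, idB, score] for pairs scoring at least $threshold, best first.
function candidatePairs(array $rows, float $threshold = 0.6): array
{
    // Group by the "Exception: Some.Type" prefix (assumed not localized).
    $byType = [];
    foreach ($rows as $id => $text) {
        $type = preg_match('/^Exception:\s*([\w.`]+)/', $text, $m) ? $m[1] : '';
        $byType[$type][$id] = $text;
    }

    $pairs = [];
    foreach ($byType as $group) {
        $ids = array_keys($group);
        $n = count($ids);
        for ($i = 0; $i < $n; $i++) {
            for ($j = $i + 1; $j < $n; $j++) {
                $score = similarity($group[$ids[$i]], $group[$ids[$j]]);
                if ($score >= $threshold) {
                    $pairs[] = [$ids[$i], $ids[$j], $score];
                }
            }
        }
    }
    usort($pairs, fn($x, $y) => $y[2] <=> $x[2]);
    return $pairs;
}
```

With thousands of rows this is still slow within a large type group, so it may be worth running it only on rows that did not already match exactly.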
My end goal is to review thousands of these logs and count how many of the exceptions found are the same. Most of the logs are in English, but maybe 25% are not. Normally I could just run a query for an exact match on the exception, but because the message part is in a different language, two copies of the same exception will probably only be a 60-70% match. There might be cases where the part after the message closely matches a different exception, but that would be rare, so it is not much of a concern.
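Once pairs are linked, whether through a manual review interface or a similarity score, counting "how many of these are the same exception" amounts to counting connected groups of rows. A union-find (disjoint-set) structure is one standard way to do that. The sketch below is a minimal version keyed by hypothetical integer row ids; `ExceptionGroups` and its methods are illustrative names, not part of any library.

```php
<?php
// Sketch: merge pairwise "same exception" links into groups and count
// how many log rows fall into each group (union-find / disjoint set).
final class ExceptionGroups
{
    private array $parent = [];   // row id => parent row id

    public function find(int $id): int
    {
        if (!isset($this->parent[$id])) {
            $this->parent[$id] = $id;   // unseen id starts its own group
        }
        // Walk up to the root, then compress the path for later lookups.
        $root = $id;
        while ($this->parent[$root] !== $root) {
            $root = $this->parent[$root];
        }
        while ($this->parent[$id] !== $root) {
            $next = $this->parent[$id];
            $this->parent[$id] = $root;
            $id = $next;
        }
        return $root;
    }

    // Record that rows $a and $b are the same exception.
    public function link(int $a, int $b): void
    {
        $ra = $this->find($a);
        $rb = $this->find($b);
        if ($ra !== $rb) {
            $this->parent[$rb] = $ra;
        }
    }

    // Map each group's representative id to its row count, largest first.
    public function counts(array $ids): array
    {
        $counts = [];
        foreach ($ids as $id) {
            $root = $this->find($id);
            $counts[$root] = ($counts[$root] ?? 0) + 1;
        }
        arsort($counts);
        return $counts;
    }
}
```

The number of entries returned by `counts()` is the number of distinct exceptions, and each value is how many log rows belong to it. Links are transitive: if A is linked to B and B to C, all three end up in one group, so a single wrong link merges two groups.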
I need to do this in PHP.