0

I have some ANTLR/C# classes capable of parsing a T-SQL stored procedure, walking its dependencies and identifying the required permissions to execute said stored procedure.

I now need to analyse a couple of relatively large C# .NET projects to identify what stored procedures are called.

It is not enough to just locate ALL stored procedure calls, I need to work from a given method and identify stored procedure calls from that.

We can assume that stored procedures are the only means of invoking SQL.

I suppose ANTLR would assist again, but I am uncertain about a few things

  1. Is ANTLR the only approach to walking the chain of method calls?
  2. If I use ANTLR where do I find the 'best' C# .g4 grammar files?
  3. How would I walk from a method call in one unit to its declaration in another - this is the point on which I am most unsure at the moment

To simplify the problem let us suppose that all stored procedure calls take the form

            new SqlCommand()
            {
                Connection = GetConnection(),
                CommandText = "dbo.SomeProcName",
                CommandType = CommandType.StoredProcedure
            };

for my purposes that is almost true.

My Principal concern is 3. above

Hugh Jones
  • 2,706
  • 19
  • 30
  • This can be quite tricky because a stored procedure can be called in many ways and you would need all of the source code (including used libraries and generated code) If there are good unit tests, it might be easier to use them to find the calls to stored procedures or you could trace the system in production and log the calls to the procedures – Emond May 13 '20 at 09:44
  • Define "stored procedure call". `var s = new SqlCommand(); s.CommandText = t; s.CommandType = CommandType.StoredProcedure; s.ExecuteScalar()` is a stored procedure call, but figuring out which procedure can be non-trivial depending on where `t` comes from. `s.CommandText = "EXEC myProc"` is also a stored procedure call, but without using `CommandType.StoredProcedure`. In general this problem is unsolvable (the code could be arbitrarily complex); in practice you can of course limit detection to well-known patterns. Things get even harder if the code uses ORMs like Dapper or EF, though. – Jeroen Mostert May 13 '20 at 10:38
  • @Erno - so let's assume all Stored Procedure calls are all SqlCommand. How might I walk from one unit to another having parsed a method call? Tracing is not an option btw. – Hugh Jones May 13 '20 at 10:41
  • @JeroenMostert - for my purposes I can assume (at least to start) that all stored Procedure calls are in the form *var s = new SqlCommand(); s.CommandText = "StoredProcName"; s.CommandType = CommandType.StoredProcedure;* and No ORM to consider. – Hugh Jones May 13 '20 at 10:46
  • If all calls use SqlCommand, use GREP to find all SQLCommand. As I said before and Jeroen mentioned as well, this is non-trivial. – Emond May 13 '20 at 10:47
  • @Erno non-trivial indeed. I could indeed grep for all SqlCommands, but they are all contained within a class library inclded in a number of applications; I would still have to discard those not called by "this" method. On the plus side I only have around a half-dozen methods to analyse. – Hugh Jones May 13 '20 at 10:54
  • 1
    I'd use Roslyn to find the method dependencies rather than ANTLR, as Roslyn is always more up to date than any third-party ANTLR definitions, and also has an API specific to the language (`SymbolFinder.FindReference`). Roslyn has a bit of a learning curve itself, but there's lots of information online, as well as a thriving community. (If you're literally only scanning for `CommandType.StoredProcedure` and `.CommandText = ""`, though, regular expressions are definitely worth considering.) – Jeroen Mostert May 13 '20 at 10:54
  • If it's half a dozen methods as you stated, you could have done by hand in the time it takes to get this question answered – Emond May 13 '20 at 11:19
  • @Erno - no - i) that is a half dozen top-level methods which make any number of subsequent method calls and ii) I do not want to have to repeat the exercise every time the code changes – Hugh Jones May 13 '20 at 11:34
  • @JeroenMostert - I shall definitely investigate Roslyn. The problem really is all about walking the tree of method calls. ANTLR is only really in the frame because it worked so well with the TSql, but, Microsoft appears to be adopting ANTLR so a definitive grammar may not be that far away – Hugh Jones May 13 '20 at 11:40
  • Re: T-SQL: there is actually a full-fledged [T-SQL parser](https://docs.microsoft.com/dotnet/api/microsoft.sqlserver.transactsql.scriptdom.tsqlparser) from MS, and for semantic analysis there's [`TSqlModel`](https://docs.microsoft.com/dotnet/api/microsoft.sqlserver.dac.model.tsqlmodel). That said, the API of the latter is pretty heinous and the documentation worse and you need to crib a lot from samples to get anywhere. Even so, the upside is that these projects are always up to date with the latest features in SQL Server (being maintained by MS), while separate ANTLR grammars may not be. – Jeroen Mostert May 13 '20 at 11:59
  • @JeroenMostert - thanks for that. For better or for worse I was able to parse TSql quite successfully using a published grammar. It is not 100% (I still have to think about dynamic sql in SPs for example) but it works well enough. I have 3 sets of permissions to derive. The first is done, for which there was a nice reliable source to identitfy which SPs get called. The second 2 may take a while longer - 1 is a website, the other is a Windows Service. The Service is the more important but probably the hardest, not sure. – Hugh Jones May 13 '20 at 12:18

0 Answers0