0

I have the following SQL text in a file. the objective is to identify database tables names from the files. Below is just a generic example and I am looking for a generic solution, either in C# or Perl. I am not done lot of regex therefore I would appreciate if someone gives me a start

select
a.xyz,
b.xyz,c.xyz,
d.xyz
from db1.test1 a
inner join db2.test2 b
on a.xyz = b.xyz
inner join
(
select a.xyz
from db1.test3) as c
on a.xyz=c.xyz
left outer join db1.test4 d
on c.xyz = d.xyz

so basically, i need to automate finding out names of all tables in the SQL. in this case, test1, test2, test3 and test4

I know that pattern is the tables names are preceded by either "from" , "inner join", "left outer join" , then a databasename (such as db1, db2 etc) , then a literal '.' and the table name.

bcd
  • 155
  • 4
  • 9

1 Answers1

2

This is C# code that will find your table names in the example:

var matches = Regex.Matches(yourString, @"(from|inner\s+join|left\s+outer\s+join)\s+[a-zA-Z0-9]+\s*\.\s*(?<table>[a-zA-Z0-9]+)(\s+[a-zA-Z0-9]+)?(\s*,\s*[a-zA-Z0-9]+\s*\.\s*(?<table>[a-zA-Z0-9]+)(\s+[a-zA-Z0-9]+)?)*", RegexOptions.ExplicitCapture);
foreach (Match match in matches)
{
    foreach (Capture capture in match.Groups["table"].Captures)
    {
        string tableName = capture.Value;
    }
}

The same regex will at least get you pointed in the right direction in Perl, as regex is mostly cross-compatible.

Edit: Updated to (clumsily?) find multiple tables separated by commas, and incorporate Alan's suggestions to use "table" to find the groups and mark ExplicitCapture. If you run into many more problems, you might want to do as suggested in Regular expression to find all table names in a query and find a SQL parser instead of trying to use regex.

Community
  • 1
  • 1
Tim S.
  • 55,448
  • 7
  • 96
  • 122
  • thanks Tim, there is one other pattern that I may encounter, for example, I could have a pattern like this "from db1.test5, db1.test6" . In this case, i want to capture but test5 and test6. However I see that I can capture on test5 (based on above code). is there any way to match both test5 and test6 – bcd Feb 13 '13 at 00:59
  • +1, but you should use `Groups["table"]`, not `Groups[2]`. To see why, add this to the end of your regex: `(?:\s+(\w+))?`. You should find that `Group[2]` now contains the table alias (`a`, `b`, or `d`), and `Groups["table"]` is the same as `Groups[3]`. It's never a good idea to use named groups and numbered groups in the same regex. In fact, if you use named groups you should set the ExplicitCapture flag to disable numbered groups. – Alan Moore Feb 13 '13 at 01:08