4

I have an exception handler method in my VB.NET application that retrieves details from the last exception to occur and e-mails that information to our Help Desk (me) for diagnostics and troubleshooting. One instance of a possible exception is when the application attempts to connect or execute some SQL on one of our database servers. In those cases - especially when the exception is because of a failure to connect - I want to see the database connection string that's being used to make certain it's properly formatted.

However, database connection strings usually contain the user's ID and password, and I want to mask the password within the string. The challenge, however, is that we connect to multiple database types (e.g., PostgreSQL, MySQL, SQLite, MS Access, etc.) and, depending on the database, there may or may not be a password in the connection string. Also, connection string formats vary from one provider to the next.

I'm extremely unfamiliar with RegEx but, using https://regexr.com/, I've managed to come up with the following RegEx pattern that seems to work sometimes:

(?<=;?[Pp][Aa][Ss][Ss].*=)(.*?)(?=;)|$

An example connection string would look like this:

Host=SERVERNAME;Port=####;Database=DBNAME;Username=USERID;Password=MyPa$$Word;Integrated Security=False

The pattern above correctly matches MyPa$$Word in the string if it's explicitly in this order, but if I move the Password key/value pair closer to the beginning of the connection string like this:

Host=SERVERNAME;Port=9999;Password=MyPa$$Word;Database=DBNAME;Username=USERID;Integrated Security=False

it then matches MyPa$$Word, DBNAME, and USERID. If I move it to the end of the string:

Host=SERVERNAME;Port=9999;Database=DBNAME;Username=USERID;Integrated Security=False;Password=MyPa$$Word

the pattern doesn't find any matches. Just to make sure the key/value with the space (Integrated Security=False) wasn't confusing the pattern, I removed that from the string and got the same results.

Because the connection string may be structured in a variety of ways depending on the database type, the user input, etc., I'd like to be able to use RegEx to find the (case-insensitive) password key/value pair anywhere in the connection string, extract only the actual password value, and replace it with something (like [HIDDEN]). I know that I could probably just do a String.Split(Convert.ToChar(";")) on the whole connection string and check each key/value pair, but I'd prefer to do it with RegEx, if possible.

G_Hosa_Phat
  • 976
  • 2
  • 18
  • 38

2 Answers2

3

Something like this could work:

((^|;)Password=)(.*?)(;|$)

The password will be in $3 so just make sure to omit that capture group when performing the replacement.

Click on "Code Generator" on the left side of this page to see it in action in PHP.

https://regex101.com/r/gxztmy/1

MonkeyZeus
  • 20,375
  • 4
  • 36
  • 77
  • Thanks for this! It seems to work well, but the suggestion from Wiktor seems to be a bit more effective in my (limited) testing so far. – G_Hosa_Phat Sep 19 '19 at 17:36
  • 2
    @G_Hosa_Phat I see. I just took the examples you provided and made it work. You should consider providing a concise set of data which you expect to encounter instead of spreading the details across such a lengthy post. If `word` from `Password` is optional then you can make it simply `Pass.*?=` and it will lazily match everything leading up to the equal sign. – MonkeyZeus Sep 19 '19 at 17:40
  • Thanks for your feedback. I realize I may have been a bit verbose, but I wanted to provide as much detail as possible to provide a clear picture of the situation including what I was trying to accomplish as well as what I had tried. I apologize if I was unclear, but the fact that I'm dealing with a variety of possibilities makes providing a precise set of data a bit difficult to come by. As for the variation on the key name, I had hoped the RegEx pattern I provided would make that condition clear to RegEx "veterans". I apologize for not explicitly stating that within the question. – G_Hosa_Phat Sep 19 '19 at 17:47
  • Well, to be fair, there was a considerable amount of copy+paste from various sources involved. I just fiddled around with it as much as I could with my limited understanding to make it "work". ;) – G_Hosa_Phat Sep 19 '19 at 18:25
2

In VB.NET, you may consider

text = Regex.Replace(text, "(?<=(?<![^;])pass\w*=).*?(?=;[\w\s]+=|$)", "[HIDDEN]", RegexOptions.IgnoreCase)

Or

text = Regex.Replace(text, "(?<![^;])(pass\w*=).*?(?=;[\w\s]+=|$)", "$1[HIDDEN]", RegexOptions.IgnoreCase)

C# versions in case one needs this:

text = Regex.Replace(text, @"(?<=(?<![^;])pass\w*=).*?(?=;[\w\s]+=|$)", "[HIDDEN]", RegexOptions.IgnoreCase);
text = Regex.Replace(text, @"(?<![^;])(pass\w*=).*?(?=;[\w\s]+=|$)", "$1[HIDDEN]", RegexOptions.IgnoreCase);

See a lookbehind regex demo and the capturing group solution demo.

Details

  • (?i) - case insensitive modifier (or RegexOptions.IgnoreCase)
  • (?<=(?<![^;])pass\w*=) - a positive lookbehind that requires a string pass (preceded with , or start of string position) + any 0 or more word chars followed with = immediately to the left of the current location
  • .*? - any 0+ chars other than an LF symbol as few as possible
  • (?=;[\w\s]+=|$) - a position in string immediately followed with ;, 1+ word or whitespace chars and = or end of string.

In the capturing group solution, the captured substring is put back into the result using $1 placeholder.

Why (?<![^;]) and not (?<=^|;)? Because an alternation in an unanchored lookbehind is costly in terms of performance. It makes sense to minimize the overhead if an equivalent pattern without alternation exists. (?<![^;]) matches any location that is either at the start of string or preceded with ;, so it should be preferred. This kind of pattern is not possible if the left-hand side context is a multicharacter string, or if the multiline mode is required, though.

Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
  • Thank you so much! After a little bit of quick testing, this appears to do everything I want it to, including matching the possibility of the password "key" being something like `pass` (I tested it, just in case). I think the first option (not using the capturing group) makes it as simple as possible. – G_Hosa_Phat Sep 19 '19 at 17:32
  • Implemented VB.NET code: `Dim PasswordMasked As String = Text.RegularExpressions.Regex.Replace(ConnectionString, "(?i)(?<=(?<![^;])pass\w*=)[^;]+", "[HIDDEN]")` – G_Hosa_Phat Sep 19 '19 at 18:29
  • @Çöđěxěŕ - Well, crap. That *does* "fail" in the sense that it stops the match at the semi-colon in the middle of the actual password value. However, due to the fact that the key/value pair *could* appear at any position in the string and the wide variety of possible key names, I don't know how difficult it would be to "trap" that condition. I mean, what if the password has a `;` *and* a `=` (especially if the `=` follows the `;`)? I think this is going to be a "best-effort" type of scenario no matter what method is used. – G_Hosa_Phat Sep 19 '19 at 18:47
  • 1
    @G_Hosa_Phat You may match `;` that is part of a password *if* the next key is always of `[^;=]+=` pattern after `;` (i.e. the key can only contain chars other than `=` and `;`). – Wiktor Stribiżew Sep 19 '19 at 21:07
  • 1
    @G_Hosa_Phat Then you need `(?i)(?<=(?<![^;])pass\w*=).*?(?=;[^;=]+=|$)`. [DEMO](http://regexstorm.net/tester?p=%28%3fi%29%28%3f%3c%3d%28%3f%3c!%5b%5e%3b%5d%29pass%5cw*%3d%29.*%3f%28%3f%3d%3b%5b%5e%3b%3d%5d%2b%3d%7c%24%29&i=Username%3dU%3bPassword%3dMyPa%24%24W%3bord%3bIn+Se%3dFalse%0d%0aPort%3d9%3bPassword%3dMyPa%24%3b%24Word%3bDatabase%3dDBNAME%3bInt+S%3dFalse%0d%0aUn%3dD%3bId+S%3dFalse%3bPassword%3dMyPa%24%24%3bWord&r=%5bHIDDEN%5d&o=m) – Wiktor Stribiżew Sep 19 '19 at 21:10
  • OMG! Thanks! The only condition I can think of at this point that it doesn't account for is a *VERY* edge case scenario where the password itself contains a `;` followed by a `=`. Still, I believe should be 99.999% effective or better. I've been playing with it and had come up with a not-quite-satisfactory solution, but I don't understand RegEx well enough to understand why it works when I think it should fail and fails when I think it should work. – G_Hosa_Phat Sep 19 '19 at 21:48
  • 1
    @G_Hosa_Phat I doubt that is an issue, see [demo](http://regexstorm.net/tester?p=%28%3fi%29%28%3f%3c%3d%28%3f%3c!%5b%5e%3b%5d%29pass%5cw*%3d%29.*%3f%28%3f%3d%3b%5b%5e%3b%3d%5d%2b%3d%7c%24%29&i=Username%3dU%3bPassword%3dMyPa%24%24W%3b%3dord%3d%3bIn+Se%3dFalse%0d%0aPort%3d9%3bPassword%3dMyPa%24%3b%24Word%3bDatabase%3dDBNAME%3bInt+S%3dFalse%0d%0aUn%3dD%3bId+S%3dFalse%3bPassword%3dMyPa%24%24%3bWord&r=%5bHIDDEN%5d&o=m). The problem may only arise if the key can contain `=` or `;`. – Wiktor Stribiżew Sep 19 '19 at 21:55
  • Thanks so much for all your help. Personally, I'm pretty satisfied with the results as they stand right now. I mean, worst-case scenario, it still catches and masks at least *part* of the password. However, just FYI, in my testing, while everything else seems to pick up as expected, I *have* run into this "edge case" with a password of `MyPa;$sW0=?d` (ex: `Password=MyPa;$sW0?d;Database=DBNAME`). It matches `MyPa`, so I'd get `Password=[HIDDEN];$sW0?d;Database=DBNAME` after the `Regex.Replace` method was called. – G_Hosa_Phat Sep 19 '19 at 22:05
  • 1
    @G_Hosa_Phat It is a matter of finding out the exact pattern for the key. If it can contain only word and whitespace chars, use `(?i)(?<=(?<![^;])pass\w*=).*?(?=;[\w\s]+=|$)` – Wiktor Stribiżew Sep 19 '19 at 22:35
  • 1
    @G_Hosa_Phat I have edited the answer to do exactly what you need, with the VB.NET and C# code. I guess the two downvotes came due to the fact those people think lookbehind is too inefficient (though from experience I know there are situations when we can't use capturing group solutions), or they mistakenly thought I missed the C# `@` to define verbatim string literals - in VB.NET, string literals are already "verbatim" and do not support string escape sequences. Also, I amended the the capturing group solution regex and showed how it can be used in code. – Wiktor Stribiżew Sep 20 '19 at 07:10
  • @WiktorStribiżew - Yes, thank you again! I really appreciate all your help, as well as the description/explanation of *why* it works (especially that `$1` token thing). While I "get" the idea of RegEx, I still don't always fully understand the logic and syntax it uses. I'm going to go through a few examples on connectionstrings.com to see if there are any other "exceptions", but I have a feeling this will be more than adequate. – G_Hosa_Phat Sep 20 '19 at 13:41
  • 1
    Looking at the answer, I can't see any reason for the downvote, but I frequently question downvotes on SO... – G_Hosa_Phat Sep 20 '19 at 13:49
  • Just FYI (because it's specific to the use-case scenario), I've gone through a variety of database connection strings on https://connectionstrings.com and, using the info and explanations from your answer, I've made a minor "tweak" to include a couple of possible variants for the password key that I know I might encounter: `(?i)(?<=(?<![^;])(pass\w*=)|(pwd=)|(oledb:\w*\s*pass\w*=)).*?(?=;[\w\s]+=|$)` I've only been able to make it "break" once with a password of `pas;swor=d`, but the fact that this even picks up blank/empty passwords more than makes up for that, IMO. I REALLY appreciate it! – G_Hosa_Phat Sep 20 '19 at 14:55
  • 1
    @G_Hosa_Phat I suggest replacing `(pass\w*=)|(pwd=)|(oledb:\w*\s*pass\w*=)` with `(?:pwd=|(?:oledb:\w*\s*)?pass\w*=)` to shorten it. – Wiktor Stribiżew Sep 20 '19 at 14:57
  • Interesting. While that looks (to my untrained eyes) like it *should* basically work the same as what I wrote, apparently it doesn't. The test string that correctly finds the match with the longer version but "fails" with the shortened version is `​Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\myFolder\myAccessFile.accdb;Jet OLEDB:Database Password=MyDbPassword;` **NEVERMIND** I know why. Because of the "jet" in the string. – G_Hosa_Phat Sep 20 '19 at 15:16
  • 1
    @G_Hosa_Phat Ah, you only requre `;` or start of string before `pass\w*=` in your regex. I made the lookbhehind apply to all alternatives. Ok, use any of the modification that works best for you. – Wiktor Stribiżew Sep 20 '19 at 15:23