
I need to do some very light parsing of C# (actually transpiled Razor code) to replace a list of function calls with textual replacements.

Given a mapping such as {"Foo.myFunc" : "\"def\"" }, it should replace this code:

var res = "abc" + Foo.myFunc(foo, Bar.otherFunc( Baz.funk()));

with this:

var res = "abc" + "def";

I don't care about the nested expressions.

This seems fairly trivial, and I think I can avoid building an entire C# parser by doing something like this for every entry in the mapping:

  • find expression start (e.g. Foo.myFunc)
  • Push()/Pop() parentheses on a Stack until Count == 0.
  • Mark this as expression stop
  • replace everything from expression start until expression stop
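The steps above can be sketched roughly as follows. This is only an illustration, not a real parser: the names `CallReplacer`, `FindCallEnd` and `Replace` are made up for this example, a plain depth counter stands in for the Stack, and the scanner skips string and char literals so that something like `trollMe("(")` does not confuse the count. It assumes there is no whitespace between the function name and `(`, and it does not handle comments, interpolated strings, or a name that happens to occur inside a string literal or as the tail of a longer identifier.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class CallReplacer
{
    // Returns the index just past the ')' that closes the '(' at `open`,
    // or -1 if the parentheses never balance.
    static int FindCallEnd(string src, int open)
    {
        int depth = 0;
        for (int i = open; i < src.Length; i++)
        {
            char c = src[i];
            if (c == '"' || c == '\'')
            {
                // Skip the whole literal so parentheses inside it are not counted.
                bool verbatim = c == '"' && i > 0 && src[i - 1] == '@';
                for (i++; i < src.Length; i++)
                {
                    if (verbatim && src[i] == '"' && i + 1 < src.Length && src[i + 1] == '"') i++; // "" escape
                    else if (!verbatim && src[i] == '\\') i++;                                      // backslash escape
                    else if (src[i] == c) break;                                                    // closing quote
                }
            }
            else if (c == '(') depth++;
            else if (c == ')' && --depth == 0) return i + 1;
        }
        return -1;
    }

    // Replaces every call to a mapped function, arguments included, with the mapped text.
    public static string Replace(string src, IDictionary<string, string> map)
    {
        foreach (var pair in map)
        {
            var sb = new StringBuilder();
            int pos = 0;
            while (true)
            {
                int start = src.IndexOf(pair.Key + "(", pos, StringComparison.Ordinal);
                if (start < 0) break;
                int end = FindCallEnd(src, start + pair.Key.Length);
                if (end < 0) break; // unbalanced: leave the rest untouched
                sb.Append(src, pos, start - pos).Append(pair.Value);
                pos = end;
            }
            sb.Append(src, pos, src.Length - pos);
            src = sb.ToString();
        }
        return src;
    }

    static void Main()
    {
        var map = new Dictionary<string, string> { ["Foo.myFunc"] = "\"def\"" };
        Console.WriteLine(Replace("var res = \"abc\" + Foo.myFunc(foo, Bar.otherFunc( Baz.funk()));", map));
        // prints: var res = "abc" + "def";
    }
}
```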

But maybe I don't need to ... Is there a (possibly built-in) .NET library that can do this for me? Counting nested parentheses is not possible with true regular expressions, since balanced parentheses are not a regular language, but maybe the extended regex syntax in .NET can handle this somehow, using back references or similar?

edit: As the comments to this answer demonstrate, simply counting brackets is not sufficient in general, as something like trollMe("(") will throw off those algorithms. Only true parsing would then suffice, I guess (?).

oligofren
  • If this is a one-off couldn't you just start with a regex for N matched parentheses, then N-1, ... until you've got them all? – Ian Mercer May 19 '16 at 04:16
  • This sounds like a job for the [Roslyn](https://github.com/dotnet/roslyn) library. – Raymond Chen May 19 '16 at 04:18
  • @IanMercer Possibly? Not sure how to do that... According to http://www.regular-expressions.info/recurse.html#balanced .NET does not support recursive regexes, but it does support "balancing constructs" which should be a good fit. I just don't know how to use them. – oligofren May 19 '16 at 04:31
  • @RaymondChen It absolutely would be a better solution, but it also seems a bit complex to dig into the roslyn docs to find out how to work with the AST. Would you happen to know how to work with it? There are no answers here yet ... – oligofren May 19 '16 at 04:40
  • @IanMercer As this is supposed to handle all kinds of code thrown at it (https://github.com/fatso83/razor-cli/issues/5), it should not barf at `print("(");`, which a regex would ... But maybe it could be a start? – oligofren May 19 '16 at 04:47
  • [This article](https://msdn.microsoft.com/en-us/magazine/dn904670.aspx) walks through a program that rewrites regular expressions, but the same principle can be used to rewrite anything else. You would look for a MemberAccessExpressionSyntax and see if it's one you want to replace. That said, it is likely overkill if this is just a one-off thing. (On the other hand, writing a C# parser is not fun.) – Raymond Chen May 19 '16 at 05:01
  • @RaymondChen Ah, that seems like a nice intro. I was browsing the FAQ and the unit test code for Roslyn and came across some usable example code, so maybe I can pair the two and make something usable. Example source code for the Roslyn FAQ: https://github.com/dotnet/roslyn/blob/master/src/Samples/CSharp/APISampleUnitTests/FAQ.cs#L366 – oligofren May 19 '16 at 05:09
  • If the code is complex enough that it can contain parentheses inside quotes, then Roslyn sounds like the best bet here, especially if this isn't a one-off where you can "run 10 regex replaces in sequence with N, N-1, ... parentheses to get most of them fixed, clean up a couple" and be done. – Ian Mercer May 19 '16 at 06:48
  • Depending on how antagonistic the code base is, you might or might not need to worry about `Namespace.Foo.myFunc(...)` (which you want to replace) vs. `UnrelatedNamespace.Foo.myFunc(...)` (which you don't). Or worse, `using UnrelatedNamespace; Foo.myFunc(...)`. Or `Foo` + newline + `/* now call myFunc */ .myFunc(...)`, a method call split across multiple lines. Or aliases: `using Foo = UnrelatedNamespace.Bar; Foo.myFunc(...)`. – Raymond Chen May 19 '16 at 13:30

1 Answer


The trick is to match each string literal as a single atomic group, so that parentheses inside it are never counted. For a normal string:

(?>"(\\"|[^"])*")

A verbatim string:

(?>@"(""|[^"])*")

Maybe this can help, but I'm not sure that this will work in all cases:

<func>(?=\()((?>/\*.*?\*/)|(?>@"(""|[^"])*")|(?>"(\\"|[^"])*")|\r?\n|[^()"]|(?<open>\()|(?<-open>\)))+?(?(open)(?!))

Replace <func> with your function name.

Needless to say, trollMe("\"(", "((", @"abc""de((f") works as expected.
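For illustration, here is one way the pattern above might be wired into `Regex.Replace`. This is a sketch under a few assumptions: the class and method names (`BalancedRegexDemo`, `ReplaceCalls`) are invented for the example, the function name is passed through `Regex.Escape` so that the `.` in `Foo.myFunc` is matched literally (the answer itself only says to substitute the name), and a `MatchEvaluator` is used so that a `$` in the replacement text is not interpreted as a substitution token. The pattern itself is copied unchanged, with quotes doubled for the C# verbatim string.

```csharp
using System;
using System.Text.RegularExpressions;

static class BalancedRegexDemo
{
    // The part of the pattern that follows <func>, written as a verbatim string
    // (every " in the regex is doubled).
    const string Tail =
        @"(?=\()((?>/\*.*?\*/)|(?>@""(""""|[^""])*"")|(?>""(\\""|[^""])*"")|\r?\n|[^()""]|(?<open>\()|(?<-open>\)))+?(?(open)(?!))";

    // Replaces each call to `func`, including its argument list, with `replacement`.
    public static string ReplaceCalls(string source, string func, string replacement)
    {
        var regex = new Regex(Regex.Escape(func) + Tail);
        return regex.Replace(source, m => replacement);
    }

    static void Main()
    {
        Console.WriteLine(ReplaceCalls(
            "var res = \"abc\" + Foo.myFunc(foo, Bar.otherFunc( Baz.funk()));",
            "Foo.myFunc", "\"def\""));
        // prints: var res = "abc" + "def";

        Console.WriteLine(ReplaceCalls(
            @"x = trollMe(""\""("", ""(("", @""abc""""de((f"") + 1;",
            "trollMe", "0"));
        // prints: x = 0 + 1;
    }
}
```

The `(?<open>\()` and `(?<-open>\))` groups push and pop a capture for each parenthesis, and `(?(open)(?!))` rejects any match that ends while the stack is still non-empty.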


Sebastian Schumann