1

I want to extract function calls from a string input (a Ruby script) in order to compute some statistics (in Java). For an example input of:

Math.sqrt(2-Math.hypot((3),4))-factorial(5)

I want to get a list of possible functions used (verification does not need to be 100% accurate, and it can include some extra faulty guesses) :

{ Math.sqrt, Math.hypot, factorial }

The list does not have to be case-sensitive, but it should include the function's class path if one exists.

I tried the naively simplistic ".*\\((.*)\\)", but I could not get it to work. It seems that I need to use lookaheads or backreferences, but I'm a bit stumped. My question is: can I even do this with a regular expression?

Leonid Shevtsov
Margus

4 Answers

3

No. You should not (see edit below) do this.

Regular expressions can only match regular languages, but matching function calls requires matching brackets (), since a call can contain nested expressions such as ((1+ 2)*3). Regular expressions cannot deal with arbitrarily nested parentheses.
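To illustrate why nesting is the sticking point, here is a minimal Java sketch of the bookkeeping that bracket matching needs: a depth counter, which is exactly the kind of state a plain regular expression does not keep. The class and method names are hypothetical, and the sketch assumes the input has no parentheses inside string literals or comments.

```java
public class ParenMatcher {
    /**
     * Returns the index of the ")" that closes the "(" at openIndex,
     * or -1 if the parentheses are unbalanced.
     * Assumption: no parentheses hide inside string literals or comments.
     */
    public static int findMatchingParen(String text, int openIndex) {
        if (openIndex < 0 || openIndex >= text.length() || text.charAt(openIndex) != '(') {
            throw new IllegalArgumentException("openIndex must point at '('");
        }
        int depth = 0;
        for (int i = openIndex; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '(') {
                depth++;                 // entering a nested group
            } else if (c == ')') {
                depth--;                 // leaving a group
                if (depth == 0) {
                    return i;            // back at the level we started from
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String script = "Math.sqrt(2-Math.hypot((3),4))-factorial(5)";
        int open = script.indexOf('(');
        int close = findMatchingParen(script, open);
        // Prints the full argument text of the first call: 2-Math.hypot((3),4)
        System.out.println(script.substring(open + 1, close));
    }
}
```

A counter like this is only needed once the arguments matter; finding the names alone does not require it.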

To learn more about regular languages and the limits of regular expressions, see Regular Expressions (wikipedia)

To solve your particular problem, you might be interested in the following resources, which recommend importing the ruby script and using reflection (wikipedia)


Edit: If all you want is the function name, you might be able to get a regex to work. However:

  • There are other problem cases. For example, what would you do if a member function is called? Or a constructor?
  • If you ever want to scrape more information (such as the arguments passed in), you will have to discard your project and start over.
Cam
2

For your example, the pattern:

(?:\\w+\\.)?\\w+(?=\\()

gives the result you want, but it won't be perfect I'm sure. If a quick but rough result is what you're after, that might be it!
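As a rough sketch of how that pattern could be applied from Java, the following collects each distinct match in order of first appearance. The class and method names are hypothetical, chosen only for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FunctionNameFinder {
    // Optional "Receiver." prefix, then a name, matched only when "(" follows.
    // The lookahead keeps the "(" itself out of the match.
    private static final Pattern CALL = Pattern.compile("(?:\\w+\\.)?\\w+(?=\\()");

    /** Returns the distinct candidate names in order of first appearance. */
    public static Set<String> findCalls(String script) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = CALL.matcher(script);
        while (m.find()) {
            names.add(m.group());
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints: [Math.sqrt, Math.hypot, factorial]
        System.out.println(findCalls("Math.sqrt(2-Math.hypot((3),4))-factorial(5)"));
    }
}
```

Note that the optional prefix covers only one level of qualification, so a call written as A.B.c(...) would be reported as B.c.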

Highly Irregular
0

You can attempt to, but you will run into the many issues of parsing a complicated grammar with a tool that was not meant for it. The number of cases you will need to cover verges on the infinite, since state and previous tokens are always important in a programming language.

rerun
0

Yes, the legit solution to this problem would be hard unless you already had experience with grammars and stuff. However, a quick and dirty (and perhaps imperfect) solution might be feasible.

Here are my thoughts... I don't know Ruby, so I'm not sure if I'm missing something. There's no need to match parentheses: the opening parenthesis "(" is the only one that really matters, assuming that the program has no syntax errors. You could search for the following string:

"[A-Za-z_][.A-Za-z_0-9]*\\("

In most languages, functions start with a letter or underscore and are followed by zero or more non-space non-special symbol characters. That's what this expression captures. It would work on your example. Of course, it would return duplicates (which can be uniqued) and would find stuff inside comments, but as a quick and dirty solution it should be alright.
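A minimal Java sketch of this approach follows, assuming a variant of the pattern in which the opening parenthesis is escaped and the name is wrapped in a capturing group so the "(" is not part of the result. Since the goal is statistics, it counts occurrences instead of just deduplicating. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CallCounter {
    // Group 1: a letter or underscore, then letters, digits, underscores or dots.
    // The escaped "(" must follow but is kept outside the group.
    private static final Pattern CALL =
            Pattern.compile("([A-Za-z_][.A-Za-z_0-9]*)\\(");

    /** Counts how often each candidate name appears, in order of first appearance. */
    public static Map<String, Integer> countCalls(String script) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = CALL.matcher(script);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints: {Math.sqrt=1, Math.hypot=1, factorial=1}
        System.out.println(countCalls("Math.sqrt(2-Math.hypot((3),4))-factorial(5)"));
    }
}
```

As noted above, this still picks up names inside comments and string literals, and it will also report keywords that are followed directly by "(".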

aleph_null