Command-line tools like grep, sed, awk, and perl allow one to carry out textual search-and-replace operations.

However, is there any tool that would allow me to carry out semantic search-and-replace operations in a Java codebase from the command line?

The Eclipse IDE allows me, e.g., to easily rename a variable, a field, a method, or a class. But I would like to be able to do the same from the command line.

The rename operation above is just one example. I would further like to be able to select the replacee text with additional semantic constraints such as:

  • only the scopes of methods M1, M2 of classes C, D, and E;
  • only all variables or fields of class C;
  • all expressions in which a variable of some class occurs;
  • only the scope of the class definition of a variable;
  • only the scopes of all overridden versions of method M of class C;
  • etc.

Having selected the code using such arbitrary semantic constraints, I would like to be able to then carry out arbitrary transformations on it.

So, basically, I would need access to the symbol-table of the code.

Questions:

  1. Is there an existing tool for this type of work, or would I have to build one myself?
  2. Even if I have to build one myself, do any tools or libraries exist that would at least provide me the symbol-table of Java code, on top of which I could add my own search-and-replace and other refactoring operations?
Harry

2 Answers

The only tool I know of that can do this easily is the long-awaited Refaster. However, it is still impossible to use it outside of Google. See [the research paper](http://research.google.com/pubs/pub41876.html) and status on using Refaster outside of Google.

I am the author of AutoRefactor, and I am very interested in implementing this feature as part of this project. Please follow up on the GitHub issue if you would like to help.

JnRouvignac

What you want is the ability to find code according to syntax, constrained by various semantic conditions, and then be able to replace the found code with new syntax.

Access to the symbol table (symbol type/scope/mentions in scope) is just one kind of semantic constraint. You'll probably want others, such as control flow sequencing (this happens after that) and data flow reaching (data produced here is consumed there). In fact, there are an unbounded number of semantic conditions you might consider important, depending on the properties of the language (does this function access data in parallel to that function?) or your application interests (is this matrix an upper triangular matrix?).
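To make the symbol-table constraint concrete, here is a minimal sketch that uses the JDK's own compiler Tree API (`javax.tools` plus `com.sun.source`, shipped in the `jdk.compiler` module), not any of the tools named in this thread. It resolves each identifier to its declared symbol and reports only the uses of `count` that refer to a field, skipping a parameter with the same name. The names `SymbolLookupSketch`, `StringSource`, `fieldUseLines`, and `DEMO_SOURCE` are hypothetical, and the sketch assumes it runs on a JDK (Java 9 or later), since `ToolProvider.getSystemJavaCompiler()` returns `null` on a bare JRE.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.IdentifierTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreePathScanner;
import com.sun.source.util.Trees;

import javax.lang.model.element.Element;
import javax.lang.model.element.ElementKind;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

final class SymbolLookupSketch {

    // Hypothetical demo input: "count" is a field on lines 2, 3 and 7,
    // but a method parameter on lines 4 and 5.
    static final String DEMO_SOURCE =
            "class Counter {\n"
          + "  int count;\n"
          + "  void inc() { count++; }\n"
          + "  int twice(int count) {\n"
          + "    return count * 2;\n"
          + "  }\n"
          + "  int get() { return count; }\n"
          + "}\n";

    /** In-memory source file, so the sketch needs nothing on disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Lines on which the identifier {@code name} resolves to a field. */
    static List<Long> fieldUseLines(String className, String code, String name)
            throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null,
                List.of(new StringSource(className, code)));
        Iterable<? extends CompilationUnitTree> units = task.parse();
        task.analyze(); // attribution: binds every name to its symbol
        Trees trees = Trees.instance(task);

        List<Long> lines = new ArrayList<>();
        for (CompilationUnitTree unit : units) {
            new TreePathScanner<Void, Void>() {
                @Override
                public Void visitIdentifier(IdentifierTree node, Void p) {
                    Element symbol = trees.getElement(getCurrentPath());
                    if (symbol != null
                            && symbol.getKind() == ElementKind.FIELD
                            && node.getName().contentEquals(name)) {
                        long pos = trees.getSourcePositions()
                                        .getStartPosition(unit, node);
                        lines.add(unit.getLineMap().getLineNumber(pos));
                    }
                    return super.visitIdentifier(node, p);
                }
            }.scan(unit, null);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Only the field uses are reported; the parameter on line 5 is skipped.
        System.out.println(fieldUseLines("Counter", DEMO_SOURCE, "count"));
    }
}
```

A plain text search for `count` would also hit the declaration and the parameter. The scanner only visits identifier expressions, so qualified uses such as `this.count` (a member-select node) would need an extra `visitMemberSelect` override.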

In general, you can't have a tool that has all possible semantic conditions of interest off the shelf. That means you need to be able to express new semantic conditions when you discover the need for them.

The best you might hope for is a tool that

  • knows the language syntax
  • has some standard semantic properties built in (my preference is symbol tables, control and data flow analysis)
  • can express patterns on the source in terms of the source code
  • can constrain the patterns based on such semantic properties
  • can be extended with new semantic analyses to provide additional properties

There is a classic category of tools that do this, called source-to-source program transformation systems.

My company offers the DMS Software Reengineering Toolkit, which is one of these. DMS has been used to carry out production transformations at scale on a wide variety of languages (including OP's target: Java). DMS's rewrite rules are of the form:

 rule <rule_name>(syntax_parameters): syntax_category =
    <match_pattern> ->  <replacement_pattern>
    if  <semantic_condition>;

You can see a lot more detail on what the pattern language and rewrite rules look like: DMS Rewrite Rules.

It is worth noting that the rewrite rules represent operations on trees. This means that while they might look like text string matches, they are not. Consequently, a rewrite rule matches in spite of any whitespace issues (and in DMS's case, even in spite of differences in number radix or character string escapes). This makes the DMS pattern matches far more effective than a regex, and a lot easier to write, since you don't have to worry about these issues.
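The tree-versus-text point can be illustrated without DMS. The following is a minimal sketch, assuming a JDK (Java 9 or later) and using the JDK's compiler Tree API rather than DMS's pattern language; `TreeMatchSketch`, `StringSource`, `countCalls`, and `DEMO_SOURCE` are hypothetical names. It counts calls to a method called `oldName` by walking the parse tree, so odd spacing and line breaks do not matter, and look-alike text inside a string literal or comment is not matched.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.ExpressionTree;
import com.sun.source.tree.IdentifierTree;
import com.sun.source.tree.MemberSelectTree;
import com.sun.source.tree.MethodInvocationTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.net.URI;
import java.util.List;

final class TreeMatchSketch {

    // Hypothetical demo input: two real calls (one with odd spacing),
    // plus the same text inside a string literal and a comment.
    static final String DEMO_SOURCE =
            "class Demo {\n"
          + "  void run() {\n"
          + "    oldName(1);\n"
          + "    this . oldName\n"
          + "        ( 2 );\n"
          + "    String s = \"oldName(3)\"; // oldName(4)\n"
          + "  }\n"
          + "  void oldName(int i) {}\n"
          + "}\n";

    /** In-memory source file, so the sketch needs nothing on disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Counts method-invocation nodes whose method has the given simple name. */
    static int countCalls(String className, String code, String methodName)
            throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null,
                List.of(new StringSource(className, code)));
        int[] count = {0};
        for (CompilationUnitTree unit : task.parse()) {
            new TreeScanner<Void, Void>() {
                @Override
                public Void visitMethodInvocation(MethodInvocationTree node, Void p) {
                    ExpressionTree select = node.getMethodSelect();
                    String name = null;
                    if (select instanceof IdentifierTree) {
                        // Unqualified call: oldName(...)
                        name = ((IdentifierTree) select).getName().toString();
                    } else if (select instanceof MemberSelectTree) {
                        // Qualified call: receiver.oldName(...)
                        name = ((MemberSelectTree) select).getIdentifier().toString();
                    }
                    if (methodName.equals(name)) {
                        count[0]++;
                    }
                    return super.visitMethodInvocation(node, p);
                }
            }.scan(unit, null);
        }
        return count[0];
    }

    public static void main(String[] args) throws IOException {
        // The string literal, the comment, and the declaration are not calls.
        System.out.println(countCalls("Demo", DEMO_SOURCE, "oldName"));
    }
}
```

This sketch only finds matches by name at the syntax level. It does not check which class the method belongs to and does not regenerate source text, which are the parts a transformation system is meant to supply.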

This Software Recommendations link shows how one can define rules with DMS and (as per OP's request) "run them from the command line". This isn't as succinct as running sed, but then it is doing much more complex tasks.

DMS has a Java front end with symbol tables, control and data flow analysis. If one wants additional semantic analyses, one codes them in DMS's underlying programming language.

Ira Baxter
  • +1. When I mentioned needing the symbol table, I'd also meant needing at minimum the AST (or maybe the parse tree itself), which a tool like `antlr` (with its companion Java grammar) I believe already provides out of the box. I'll check out DMS shortly. Meanwhile, greatly appreciate your helpful answer. – Harry Jun 20 '16 at 03:21
  • ANTLR doesn't provide an AST out of the box; you have to add AST construction to the grammar. I think if you use one of the ANTLR Java grammars you can get one where somebody has done that. I don't know if they've gone as far as Java 1.8, let alone Java 1.9. DMS produces ASTs automatically if it has a grammar. ANTLR provides zero help with symbol tables (and Java 1.8/1.9 name and type resolution is *really* tough) and any semantic properties beyond that. DMS's Java front end has symbol tables, control flow and data flow. ... – Ira Baxter Jun 20 '16 at 03:28
  • ... Finally, ANTLR has no ability to do source-to-source rewriting; you can, of course, write procedural code to hack at an ANTLR tree. Sometimes that is useful (even with DMS), but mostly you want source-to-source transforms so you can write them easily in terms of language syntax. Then you have to regenerate source text from the modified AST; ANTLR provides some extremely modest help but not a full AST -> text regeneration. DMS's *purpose* is to do transformation on code, so it has source-to-source rewriting built in, as well as round-tripping of text -> AST -> (transformed AST) -> text. – Ira Baxter Jun 20 '16 at 03:31
  • There is a project built with Antlr: it is called JavaParser, but I do not think it provides a symbol table yet. And it does not help in rewriting. – JnRouvignac Jun 20 '16 at 04:54
  • @IraBaxter Very impressive, indeed! Will wait a few days before marking your answer as final. Meanwhile, a BIG thanks and a +1. – Harry Jun 20 '16 at 08:15