22

I've been thinking about what I would miss in porting some Python code to a statically typed language such as F# or Scala; the libraries can be substituted and the conciseness is comparable, but I have lots of Python code that looks like the following:

@specialclass
class Thing(object):
    @specialFunc
    def method1(arg1, arg2):
        ...
    @specialFunc
    def method2(arg3, arg4, arg5):
        ...

Where the decorators do a huge amount: replacing the methods with stateful callable objects, augmenting the class with additional data and properties, etc. Although Python allows dynamic monkey-patch metaprogramming anywhere, anytime, by anyone, I find that essentially all my metaprogramming is done in a separate "phase" of the program, i.e.:

load/compile .py files
transform using decorators
// maybe transform a few more times using decorators
execute code // no more transformations!

These phases are basically completely distinct; I do not run any application-level code in the decorators, nor do I perform any ninja replace-class-with-other-class or replace-function-with-other-function in the main application code. Although the language's dynamism says I can do so anywhere I want, I never go around replacing functions or redefining classes in the main application code, because it gets crazy very quickly.

I am, essentially, performing a single re-compile on the code before I start running it.

The only similar metaprogramming I know of in statically typed languages is reflection: i.e. getting functions/classes from strings, invoking methods using argument arrays, etc. However, this basically converts the statically typed language into a dynamically typed language, losing all type safety (correct me if I'm wrong?). Ideally, I think, I would have something like the following:

load/parse application files 
load/compile transformer
transform application files using transformer
compile
execute code

Essentially, you would be augmenting the compilation process with arbitrary code, compiled using the normal compiler, that will perform transformations on the main application code. The point is that it emulates the "load, transform(s), execute" workflow while strictly maintaining type safety.

If the application code is borked, the compiler will complain; if the transformer code is borked, the compiler will complain; and if the transformer code compiles but doesn't do the right thing, either it will crash or the subsequent compilation step will complain that the final types don't add up. In any case, you will never get the runtime type errors possible when using reflection to do dynamic dispatch: everything would be statically checked at every step.

So my question is: is this possible? Has it already been done in some language or framework which I do not know about? Is it theoretically impossible? I'm not very familiar with compiler or formal language theory; I know it would make the compilation step Turing-complete with no guarantee of termination, but it seems to me that this is what I would need to match the sort of convenient code transformation I get in a dynamic language while maintaining static type checking.

EDIT: One example use case would be a completely generic caching decorator. In Python it would be:

import functools

cacheDict = {}

def cache(func):
    @functools.wraps(func)
    def wrapped(*args, **kwargs):
        # kwargs is a dict and therefore not hashable, so build a hashable key from it
        cachekey = (args, tuple(sorted(kwargs.items())))
        if cachekey not in cacheDict:
            cacheDict[cachekey] = func(*args, **kwargs)
        return cacheDict[cachekey]
    return wrapped


@cache
def expensivepurefunction(arg1, arg2):
    # do stuff
    return result

While higher-order functions or objects-with-functions-inside can do some of this, AFAIK they cannot be generalized to work with any function taking an arbitrary set of parameters and returning an arbitrary type while maintaining type safety. I could do stuff like:

public Thingy wrap(Object O){ // this probably won't compile, but you get the idea
    return (params Object[] args) => {
        // check cache
        return InvokeWithReflection(O, args);
    };
}

But all the casting completely kills type safety.
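
To make the gap concrete: a fixed-signature version of this can stay fully typed with a plain higher-order function. Below is a minimal Scala sketch (purely illustrative; the names are made up), which type-checks only because the arity is pinned down in advance, which is exactly what the Python decorator does not require.

import scala.collection.mutable

object Memo {
  // Fully typed memoization for one fixed shape of function (a single argument).
  // Generalising this over arbitrary parameter lists and return types, the way
  // the Python decorator does, is exactly what a plain higher-order function cannot do.
  def memoize[A, B](f: A => B): A => B = {
    val cache = mutable.Map.empty[A, B]
    a => cache.getOrElseUpdate(a, f(a))
  }

  // usage: the wrapped function keeps its precise type
  val expensivePureFunction: Int => Int =
    memoize { (n: Int) =>
      // do stuff
      n * n
    }
}

Every other arity or signature needs its own overload, so this approach does not scale to "any function whatsoever" without some form of compile-time code generation.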

EDIT: This is a simple example, where the function signature does not change. Ideally what I am looking for could modify the function signature, changing the input parameters or output type (à la function composition) while still maintaining type checking.

Li Haoyi
  • There is a proposal to add macros to Scala, and macros will solve most of your problems. (http://scalamacros.org/) – Rogach Nov 26 '11 at 05:12
  • It isn't very clear what kind of code transformation you are trying to achieve; can you give a few specific examples? On a different note, you may be interested in [Template Haskell](http://www.haskell.org/haskellwiki/Template_Haskell). – Dan Burton Nov 26 '11 at 06:30
  • @DanBurton: I updated the question with an example. I'll go look at Template Haskell! – Li Haoyi Nov 28 '11 at 03:52
  • Again, an example of multi-stage programming. Mixing this + strong typing + a good editor that integrates this concept would be great. – nicolas Aug 28 '12 at 08:42

5 Answers

11

Very interesting question.

Some points regarding metaprogramming in Scala:

  • In Scala 2.10 there will be developments in Scala reflection


  • There is work on source-to-source transformation (macros), which is something like what you are looking for: scalamacros.org


  • Java has introspection (through the reflection API) but does not allow self-modification. However, you can use tools such as Javassist to support this. In theory you could use these tools in Scala to achieve more than introspection.


  • From what I could understand of your development process, you separate your domain code from your decorators (or a cross-cutting concern, if you will), which allows you to achieve modularity and code simplicity. This can be a good use case for aspect-oriented programming, which allows just that. For Java there is a library (AspectJ); however, I'm dubious it will work with Scala.

JaimeJorge
  • It does indeed sound like the OP is doing AOP, based on the way he describes the use cases of Python decorators. There are AOP tools for Java and .NET that I know of, but it wouldn't surprise me if there were more languages with AOP support. .NET AOP tools intercept the compile phase, make the necessary changes to the source, and then let the compile phase run - mimicking the exact flow the OP discusses. – Josh Smeaton Nov 27 '11 at 00:30
6

So my question is, is this possible?

There are many ways to achieve the same effect in statically-typed programming languages.

You have essentially described the process of doing some term rewriting on a program before executing it. This functionality is perhaps best known in the form of the Lisp macro, but some statically typed languages also have macro systems, most notably OCaml's camlp4 macro system, which can be used to extend the language.

More generally, you are describing one form of language extensibility. There are many alternatives and different languages provide different techniques. See my blog post Extensibility in Functional Programming for more information. Note that many of these languages are research projects, so the motivation is to add novel features rather than necessarily good ones, and they rarely retrofit good features that were invented elsewhere.

The ML (meta language) family of languages, including Standard ML, OCaml and F#, were specifically designed for metaprogramming. Consequently, they tend to have awesome support for lexing, parsing, rewriting, interpreting and compiling. However, F# is the most distant member of this family and lacks the mature tools that languages like OCaml benefit from (e.g. camlp4, ocamllex, dypgen, menhir, etc.). F# does have partial counterparts in fslex and fsyacc, and a Haskell-inspired parser combinator library called FParsec.

You may well find that the problem you are facing (which you have not described) is better solved using more traditional forms of metaprogramming, most notably a DSL or EDSL.
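
To make the EDSL suggestion concrete: the idea is to represent the programs you want to transform as ordinary, fully typed values of the host language, and to write the "transformations" as plain functions over those values. A toy Scala sketch (illustrative only, not taken from any particular library):

// The "program" is an ordinary Scala value, so it can be inspected,
// rewritten and only then evaluated, all under the host type checker.
sealed trait Expr
case class Lit(value: Int)           extends Expr
case class Add(lhs: Expr, rhs: Expr) extends Expr

object Expr {
  // one interpretation: evaluate the program
  def eval(e: Expr): Int = e match {
    case Lit(v)    => v
    case Add(l, r) => eval(l) + eval(r)
  }

  // another interpretation: a transformation pass (constant folding)
  def fold(e: Expr): Expr = e match {
    case Add(l, r) => (fold(l), fold(r)) match {
      case (Lit(a), Lit(b)) => Lit(a + b)
      case (fl, fr)         => Add(fl, fr)
    }
    case other => other
  }
}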

J D
5

Without knowing why you're doing this, it's difficult to know whether this kind of approach is the right one in Scala or F#. But ignoring that for now, it's certainly possible to achieve in Scala, at least, although not at the language level.

A compiler plugin gives you access to the tree and allows you to perform all kinds of manipulation of that tree, all fully typechecked.
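
For a sense of the shape of this, here is a minimal sketch of a Scala 2.x compiler plugin; the plugin and phase names are made up, and the phase merely walks each compilation unit's tree (a real plugin is additionally packaged with a scalac-plugin.xml descriptor):

import scala.tools.nsc.{Global, Phase}
import scala.tools.nsc.plugins.{Plugin, PluginComponent}

class DemoPlugin(val global: Global) extends Plugin {
  import global._

  val name        = "demo"
  val description = "walks each compilation unit's typed tree"
  val components  = List[PluginComponent](Component)

  private object Component extends PluginComponent {
    val global: DemoPlugin.this.global.type = DemoPlugin.this.global
    val runsAfter = List("typer")   // run over fully typechecked trees
    val phaseName = "demo-walk"

    def newPhase(prev: Phase): Phase = new StdPhase(prev) {
      def apply(unit: CompilationUnit): Unit =
        unit.body.foreach {
          case dd: DefDef => global.reporter.echo(dd.pos, s"saw method ${dd.name}")
          case _          => ()
        }
    }
  }
}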

There are some issues with generating synthetic methods in Scala compiler plugins - it's difficult for me to know whether that will be a problem for you.

It is possible to work around this by creating a compiler plugin that generates source code which is then compiled in a separate pass. This is how ScalaMock works, for instance.

Paul Butcher
2

You might be interested in source-to-source program transformation systems (PTS).

Such tools parse the source code, producing an AST, and then allow one to define arbitrary analyses and/or transformations on the code, finally regenerating source code from the modified AST.

Some tools provide parsing, tree building and AST navigation by a procedural interface, such as ANTLR. Many of the more modern dynamic languages (Python, Scala, etc.) have had some self-hosting parser libraries built, and even Java (compiler plug-ins) and C# (open compiler) are catching on to this idea.

But mostly these tools only provide procedural access to the AST. A system with surface-syntax rewriting allows you to express "if you see this, change it to that" using patterns written in the syntax of the language(s) being manipulated. These include Stratego/XT and TXL.

It is our experience that manipulating complex languages requires complex compiler support and reasoning; this is the canonical lesson from 70 years of people building compilers. All of the above tools suffer from not having access to symbol tables and various kinds of flow analysis; after all, how one part of the program operates depends on actions taken in remote parts, so information flow is fundamental. [As noted in comments on another answer, you can implement symbol tables/flow analysis with those tools; my point is they give you no special support for doing so, and these are difficult tasks, even worse on modern languages with complex type systems and control flows.]

Our DMS Software Reengineering Toolkit is a PTS that provides all of the above facilities (Life After Parsing), at some cost in configuring it to your particular language or DSL, which we try to ameliorate by providing these off-the-shelf for mainstream languages. [DMS provides explicit infrastructure for building/managing symbol tables, control and data flow; this has been used to implement these mechanisms for Java 1.8 and full C++14].

DMS has also been used to define meta-AOP, tools that enable one to build AOP systems for arbitrary languages and apply AOP like operations.

In any case, to the extent that you simply modify the AST, directly or indirectly, you have no guarantee of "type safety". You can only get that by writing transformation rules that don't break it. For that, you'd need a theorem prover to check that each modification (or composition of such) didn't break type safety, and that's pretty much beyond the state of the art. However, you can be careful how you write your rules, and get pretty useful systems.

You can see an example of the specification of a DSL and its manipulation with surface-syntax source-to-source rewriting rules, preserving semantics, in this example that defines and manipulates algebra and calculus using DMS. I note this example is kept simple to make it understandable; in particular, it does not exhibit any of the flow analysis machinery DMS offers.

Ira Baxter
0

Ideally what I am looking for could modify the function signature, changing the input parameters or output type (à la function composition) while still maintaining type checking.

I have the same need: making R APIs available in the type-safe world. This way we would bring the wealth of scientific code from R into the (type-)safe world of Scala.

Rationale

  1. Make it possible to document the business-domain aspects of the APIs through Specs2 (see https://etorreborre.github.io/specs2/guide/SPECS2-3.0/org.specs2.guide.UserGuide.html; the guide is generated from Scala code). Think Domain-Driven Design applied backwards.

  2. Take a language-oriented approach to the challenges faced by SparkR, which tries to combine Spark with R.

See https://spark-summit.org/east-2015/functionality-and-performance-improvement-of-sparkr-and-its-application/ for attempts to improve how it is currently done in SparkR. See also https://github.com/onetapbeyond/renjin-spark-executor for a simplistic way to integrate.

In terms of a solution, we could use Renjin (a Java-based R interpreter) as the runtime engine, but use Stratego/XT MetaBorg to parse R and generate strongly typed Scala APIs (like you describe).

Stratego/XT (http://www.metaborg.org/en/latest/) is the most powerful DSL development platform I know. It allows combining/embedding languages using a parsing technology that supports composing languages (longer story).

SemanticBeeng
  • Ira, I read again and regret twice if I still do. I was rushing to put the application idea on the table. But I do understand semantics vs. syntax, and I have a good enough idea of what Semantic Designs does to have a deep respect for your work. Stratego/XT is a full-fledged program transformation system, so semantics can be checked as well. It would be awesome to compare it with your tools. It would be even more so if all the tools were open source. :-) – SemanticBeeng Jul 07 '16 at 14:16
  • Stratego is open source; it doesn't have DMS's abilities. We believe this is because Stratego was built for (good) academic reasons, but doesn't have the resources (e.g., *funding*) to develop it in ways that would make it as commercially effective as it could be. DMS is built with funding produced by commercial applications. (See http://www.semanticdesigns.com/SuccessStories/index.html, esp. the part about Dow Chemical.) I believe that if DMS were open source, it would be nowhere near as effective as it is. YMMV, but I've made my bet. – Ira Baxter Jul 07 '16 at 15:51
  • Regarding Stratego: "semantics can be checked as well". Yes, it is possible in theory; Stratego is a generalized Post system == Turing machine and can arguably compute anything. That's different from actually *wanting* to compute everything with a Post system/Turing machine. When you show me Stratego being used to compute accurate dataflow for C++14, I'll pay more attention. (DMS does.) – Ira Baxter Jul 07 '16 at 15:54
  • Stratego XT is a domain specific language for program transformation. Martin Bravenboer created the Dryad Java compiler with it (hence it can do data flow for Java): http://strategoxt.org/Stratego/TheDryadCompiler, http://releases.strategoxt.org/strategoxt-manual/unstable/manual/chunk-part/java-in-stratego.html – SemanticBeeng Jul 08 '16 at 04:54
  • (continued) More impressively, though, is the ability to embed languages http://www.st.ewi.tudelft.nl/~eelco/papers/BV04.pdf; "MetaBorg in Action: Examples of Domain-Specific Language Embedding and Assimilation Using Stratego/XT": http://link.springer.com/chapter/10.1007%2F11877028_10 – SemanticBeeng Jul 08 '16 at 05:02
  • Stratego has always been an impressive vehicle for demonstrating what is possible with pure rewriting and clean integration. Yes, I'm impressed with the ability to compose grammars (inherited from SDF). Yes, I know that you *can* make Stratego compute anything. I've seen the various means of getting Stratego to compute program facts via rewriting; it isn't clear that these approaches scale, whereas the classic compiler algorithms do. I know of Dryad, but this reference doesn't lead to any useful information about it. Is this actually used for Java 1.8? (I found the tech papers on Dryad.) – Ira Baxter Jul 08 '16 at 06:26
  • For Dryad, see http://www.lclnet.nl/publications/compilation-by-normalization.pdf For DMS applied to C++, see Akers, R., Baxter, I., Mehlich, M. , Ellis, B. , Luecke, K., Case Study: Re-engineering C++ Component Models Via Automatic Program Transformation, Information & Software Technology 49(3):275-291 2007. An earlier version of that paper is available at http://www.semanticdesigns.com/Company/Publications/WCRE05.pdf Unfortunately, we don't have more recent pubs on this topic. – Ira Baxter Jul 08 '16 at 07:08
  • Cannot comment on the scalability of compilers and their algorithms (although I work with large-scale analytics and technologies to scale machine learning). But, in the spirit of Haoyi's post, I suggest that high performance is not a key requirement, while open source, composability/embedding, JVM-based, etc. are critical. The idea here is to build developer tools and not end-user tools. I am open to continuing this discussion offline and even building stuff towards Haoyi's goal and .. mine (above), of course. :-) – SemanticBeeng Jul 08 '16 at 07:16
  • DMS does name resolution on 500K SLOC Java source code systems in 60 seconds. I'm pessimistic that doing name resolution by rewriting trees at that scale will do that well. And that is mostly a linear-time process, as opposed to iterative flow analyzers. – Ira Baxter Jul 08 '16 at 07:22
  • Would be happy to continue discussion offline; send me an email. – Ira Baxter Jul 08 '16 at 07:25