I wrote an anonymous factorial function in C++ and compiled my code with g++ 4.9.2. It works well. However, I don't know the type of my function.

#include<iostream>
#include<functional>
using std::function;
int main()
{
    //tested at g++ 4.9.2
    //g++ -std=c++1y -o anony anony.cpp
    auto fac = [](auto self,auto n)->auto{
        if(n < 1)
            return 1;
        else 
            return n * self(self,n-1);
    };
    std::cout<<fac(fac,3)<<std::endl;//6
    return 0;
}

So, I wonder: what are the types of fac and self? If I just translate the C++ code into Haskell, it won't compile because it involves infinite types:

fac2 self 0 = 1
fac2 self n = n * (self self $ n-1)

so I have to define a recursive type to work around it:

data Y a = Y ((Y a)->a->a)
fac2 self 0 = 1
fac2 self n = n * ((applY self self) (n-1))
    where applY (Y f1) f2 = f1 f2
fact2 = fac2 $ Y fac2

So why can g++ deduce exactly the right type for fac, and what type does g++ think fac has?

chi
Alaya
  • When you replace `auto` with some type, e.g. `int`, the compiler should tell you that it cannot infer the types and give their names. But I haven't tested it – janisz Jan 07 '15 at 07:55
  • but why could g++ infer the right type of my fac function? – Alaya Jan 07 '15 at 07:57
  • `fac` in this is a generic lambda, which acts like a functor with a templated `operator()`. Note that these are new for c++14. – user657267 Jan 07 '15 at 07:58
  • Here is nice answer that might be helpful [How does generic lambda work in C++14?](http://stackoverflow.com/a/17233649/1387612) – janisz Jan 07 '15 at 08:22
  • As a side note, a better way to write the Haskell version is to split it into the recursive bit (namely and generically, `fix :: (a -> a) -> a` with `fix f = f (fix f)`) and the non-recursive bit. The non-recursive bit is `fact1 recur n = if n == 0 then 1 else n * recur (n-1)` and you'll notice that this (a) is not recursive, (b) has a finite and inferable type `fact1 :: (Int -> Int) -> (Int -> Int)`, and (c) implements a single "step" of factorial. We then just "complete" it by combining the recursive and non-recursive bits: `fact = fix fact1`. – J. Abrahamson Jan 07 '15 at 14:50
  • @J.Abrahamson you mean the Y combinator. I know that – Alaya Jan 07 '15 at 16:15
  • Well, I would not call `fix` the Y combinator. It has the same effect, but there are many fixed point combinators of which `Y` and `fix` are just two examples. – J. Abrahamson Jan 07 '15 at 20:59
  • This isn't so relevant, but you can write your `Y` type as a `newtype` instead of `data`, which leads to a more efficient runtime representation. – dfeuer Feb 04 '15 at 17:05
  • Also note that there is a pre-made version of `fix` in `Data.Function` defined as `fix f = let x = f x in x`. I *think* this is designed to save memory and/or time in certain situations, but I don't know the details. – dfeuer Feb 04 '15 at 17:09
  • Oh yes, I remember now: the definition in `Data.Function` makes `fix (1:)` a circularly linked list instead of an infinite list, and similarly for some other such things. It lets you tie data structures up in knots, whereas the more obvious `fix f = f (fix f)` does not. – dfeuer Feb 04 '15 at 17:17

3 Answers

The C++ fac isn't really a function, but a struct which has a member function.

struct aaaa // Not its real name.
{
    template<typename a, typename b>
    auto operator()(a self, b n) const
    {
        // ... the body of the lambda goes here
    }
};

The overloaded call operator hides some of the trickery that C++ performs in order to implement "lambda functions".

When you "call" fac, what happens is:

fac.operator() (fac, 3);

so the argument to the function isn't the function itself, but an object which has it as a member.
One effect of this is that the function's type (i.e. the type of operator()) does not occur in its own parameter list: the type of self is the struct that defines the function, not a function type.

The template part isn't necessary for this to work; this is a non-generic version of the fac "function":

struct F
{
    int operator()(const F& self, int n) const
    {
        if (n < 1)
            return 1;
        else
            return n * self(self, n - 1);
    }
};

F fac;
fac(fac, 3); // 6

If we keep the template and rename operator() to applY:

// The Y type
template<typename a>
struct Y
{
    // The wrapped function has type (Y<a>, a) -> a
    a applY(const Y<a>& self, a n) const
    { 
        if(n < 1)
            return 1;
        else 
            return n * self.applY(self, n-1);
    }
};

template<typename a>
a fac(a n)
{
    Y<a> y;
    return y.applY(y, n);
}

we see that your working Haskell program and your C++ program are very similar - the differences are mainly punctuation.

In contrast, in Haskell

fac2 self 0 = 1
fac2 self n = n * (self self $ n-1)

self is a function, and fac2's type would have to be

X -> Int -> Int

for some X.
Since self is a function, and self self $ n-1 is an Int, self's type is also X -> Int -> Int.

But what could X be?
It must be the same as the type of self itself, i.e. X -> Int -> Int.
But that means that the type of self is (substituting for X):

(X -> Int -> Int) -> Int -> Int  

so the type X must also be

(X -> Int -> Int) -> Int -> Int  

so self's type must be

((X -> Int -> Int) -> Int -> Int) -> Int -> Int

and so on, ad infinitum.
That is, in Haskell the type would be infinite.

Your solution for Haskell essentially explicitly introduces the necessary indirection that C++ generates through its structure with a member function.

molbdnilo

As others pointed out, the lambda acts as a structure involving a template. The question then becomes: why can't Haskell type the self-application, while C++ can?

The answer lies in the difference between C++ templates and Haskell polymorphic functions. Compare these:

-- valid Haskell
foo :: forall a b. a -> b -> a
foo x y = x

// valid C++
template <typename a, typename b>
a foo(a x, b y) { return x; }

While they might look nearly equivalent, they are not.

When Haskell type checks the above declaration, it checks that the definition is type safe for any types a,b. That is, if we substitute a,b with any two types, the function must be well-defined.

C++ follows another approach. At the point of template definition, the compiler does not check that every substitution for a,b will be correct. This check is deferred to the point of use of the template, i.e. to instantiation time. To stress the point, let's add a +1 to our code:

-- INVALID Haskell
foo :: forall a b. a -> b -> a
foo x y = x+1

// valid C++
template <typename a, typename b>
a foo(a x, b y) { return x+1; }

The Haskell definition will not type check: there's no guarantee you can perform x+1 when x is of an arbitrary type. The C++ code, by contrast, is fine. The fact that some substitutions for a lead to incorrect code only matters once the template is actually instantiated with one of them.

Roughly speaking, deferring this check is what allows some "infinitely-typed values": each instantiation only ever sees concrete, finite types. Dynamic languages such as Python or Scheme defer these type errors even further, until run-time, and of course handle self-application just fine.

chi

The expression following auto fac = is a lambda expression, and the compiler will automatically generate a closure object from it. The type of that object is unique and known only to the compiler.

From N4296, §5.1.2/3 [expr.prim.lambda]

The type of the lambda-expression (which is also the type of the closure object) is a unique, unnamed non-union class type — called the closure type — whose properties are described below. This class type is neither an aggregate (8.5.1) nor a literal type (3.9). The closure type is declared in the smallest block scope, class scope, or namespace scope that contains the corresponding lambda-expression.

Note that because of this, even two identical lambda expressions will have distinct types. For example,

auto l1 = []{};
auto l2 = []{}; // l1 and l2 are of different types

Your lambda expression is a C++14 generic lambda, and will be translated by the compiler to a class that resembles the following:

struct __unique_name
{
    template<typename Arg1, typename Arg2>
    auto operator()(Arg1 self, Arg2 n) const
    { 
        // body of your lambda
    }
};

I cannot comment on the Haskell part, but the reason the recursive expression works in C++ is that you're simply passing a copy of the closure object instance (fac) in each call. Because operator() is a template, it can deduce the type of the closure even though that type cannot otherwise be named.

Praetorian