Derivative of a Higher-Order Function

Question

This is in the context of Automatic Differentiation - what would such a system do with a function like map, or filter - or even one of the SKI Combinators?

Example: I have the following function:

def func(x):
    return sum(map(lambda a: a**x, range(20)))

What would its derivative be? What will an AD system yield as a result? (This function is well-defined on real-number inputs).

Edward Kmett · Answer 1 · 2012-06-26T02:42:07.090

I would like to disagree with the accepted answer that you cannot usefully differentiate higher rank functions, or that you need to restrict yourself to a particularly small subset of them by demonstrating it in practice.

Using my Haskell 'ad' package, we get:

ghci> :m + Numeric.AD
ghci> diff (\x -> sum (map (**x) [1..20])) 10
7.073726805128313e13

We can extract what was done by abusing a traced numeric type to answer your question of the derivative is:

ghci> :m + Debug.Traced
ghci> putStrLn $ showAsExp $ diff (\x -> sum (map (**x) [1..20])) (unknown "x" :: Traced Double) 
1.0 * (1.0 ** x * log 1.0) + 
1.0 * (2.0 ** x * log 2.0) +
1.0 * (3.0 ** x * log 3.0) +
1.0 * (4.0 ** x * log 4.0) +
1.0 * (5.0 ** x * log 5.0) +
1.0 * (6.0 ** x * log 6.0) +
1.0 * (7.0 ** x * log 7.0) +
1.0 * (8.0 ** x * log 8.0) +
1.0 * (9.0 ** x * log 9.0) +
1.0 * (10.0 ** x * log 10.0) +
1.0 * (11.0 ** x * log 11.0) +
1.0 * (12.0 ** x * log 12.0) +
1.0 * (13.0 ** x * log 13.0) +
1.0 * (14.0 ** x * log 14.0) +
1.0 * (15.0 ** x * log 15.0) +
1.0 * (16.0 ** x * log 16.0) +
1.0 * (17.0 ** x * log 17.0) +
1.0 * (18.0 ** x * log 18.0) +
1.0 * (19.0 ** x * log 19.0) +
1.0 * (20.0 ** x * log 20.0)

With full sharing you get the rather more horrific looking results, that are sometimes asymptotically more efficient.

ghci> putStrLn $ showAsExp $ reShare $ diff (\x -> sum (map (**x) [1..20])) 
      (unknown "x" :: Traced Double)
let _21 = 1.0 ** x;
    _23 = log 1.0;
    _20 = _21 * _23;
    _19 = 1.0 * _20;
    _26 = 2.0 ** x;
    _27 = log 2.0;
    _25 = _26 * _27;
    _24 = 1.0 * _25;
    _18 = _19 + _24;
    _30 = 3.0 ** x;
    _31 = log 3.0;
    _29 = _30 * _31;
    _28 = 1.0 * _29;
    _17 = _18 + _28;
    _34 = 4.0 ** x;
    _35 = log 4.0;
    _33 = _34 * _35;
    _32 = 1.0 * _33;
    _16 = _17 + _32;
    _38 = 5.0 ** x;
    _39 = log 5.0;
    _37 = _38 * _39;
    _36 = 1.0 * _37;
    _15 = _16 + _36;
    _42 = 6.0 ** x;
    _43 = log 6.0;
    _41 = _42 * _43;
    _40 = 1.0 * _41;
    _14 = _15 + _40;
    _46 = 7.0 ** x;
    _47 = log 7.0;
    _45 = _46 * _47;
    _44 = 1.0 * _45;
    _13 = _14 + _44;
    _50 = 8.0 ** x;
    _51 = log 8.0;
    _49 = _50 * _51;
    _48 = 1.0 * _49;
    _12 = _13 + _48;
    _54 = 9.0 ** x;
    _55 = log 9.0;
    _53 = _54 * _55;
    _52 = 1.0 * _53;
    _11 = _12 + _52;
    _58 = 10.0 ** x;
    _59 = log 10.0;
    _57 = _58 * _59;
    _56 = 1.0 * _57;
    _10 = _11 + _56;
    _62 = 11.0 ** x;
    _63 = log 11.0;
    _61 = _62 * _63;
    _60 = 1.0 * _61;
    _9 = _10 + _60;
    _66 = 12.0 ** x;
    _67 = log 12.0;
    _65 = _66 * _67;
    _64 = 1.0 * _65;
    _8 = _9 + _64;
    _70 = 13.0 ** x;
    _71 = log 13.0;
    _69 = _70 * _71;
    _68 = 1.0 * _69;
    _7 = _8 + _68;
    _74 = 14.0 ** x;
    _75 = log 14.0;
    _73 = _74 * _75;
    _72 = 1.0 * _73;
    _6 = _7 + _72;
    _78 = 15.0 ** x;
    _79 = log 15.0;
    _77 = _78 * _79;
    _76 = 1.0 * _77;
    _5 = _6 + _76;
    _82 = 16.0 ** x;
    _83 = log 16.0;
    _81 = _82 * _83;
    _80 = 1.0 * _81;
    _4 = _5 + _80;
    _86 = 17.0 ** x;
    _87 = log 17.0;
    _85 = _86 * _87;
    _84 = 1.0 * _85;
    _3 = _4 + _84;
    _90 = 18.0 ** x;
    _91 = log 18.0;
    _89 = _90 * _91;
    _88 = 1.0 * _89;
    _2 = _3 + _88;
    _94 = 19.0 ** x;
    _95 = log 19.0;
    _93 = _94 * _95;
    _92 = 1.0 * _93;
    _1 = _2 + _92;
    _98 = 20.0 ** x;
    _99 = log 20.0;
    _97 = _98 * _99;
    _96 = 1.0 * _97;
    _0 = _1 + _96;
in  _0

In general automatic differentiation has no problem with higher rank functions. Source-to-source translations may run into a few gotchas depending on the limitations of the particular tool, however.

score 3 · Answer 2 · answered Apr 30 '10 at 19:14

A good AD system will breeze through that without any difficulty. If the AD code performs source to source translation then it may have trouble with the sum. But if it's an AD system that works by overloading the arithmetic operators then the AD code won't actually 'see' the sum function, just the + operations that the sum function calls.

Barry Kelly · Accepted Answer · 2008-11-27T11:41:38.217

1

Higher-order functions are discrete. They don't have the Cartesian quality of having arguments that have well-defined mappings to points in some n-dimensional space.

However, with your clarification on the answer, there are several things that can be said. Symbolically differentiating some higher-order functions would be possible, but only for certain calling patterns that resolve to well-known functions.

Probably, numerical differentiation would be more fruitful, as it can estimate the derivative by repeatedly evaluating the given function.

Also, functions that are totally general - and your example is heading that way, with use of relatively arbitrary functions - eventually you'll hit Turing completeness, which will mean that no amount of cleverness on the part of a symbolic differentiator will be able to automatically differentiate the function.

edited Nov 27 '08 at 11:41

answered Nov 26 '08 at 15:28

Barry Kelly

41,404
5
117
189

take a look at the edit - i posted an example. this is for my own curiosity, but i can see systems which need derivatives which would use higher-order functions. – Claudiu Nov 26 '08 at 15:35
hm... i think you have a flawed argument about turing completeness, there. there are transformations you can do to any arbitrary code - this doesn't mean they are bounded by turing completeness. – Claudiu Nov 28 '08 at 21:38
Claudiu, Turing completness is a hard bound that prevents you from knowing everything about functions in a sufficiently expressive language. If you have a function which doesn't even return a value (i.e. does not terminate), how can you expect to get its slope at a point? – Barry Kelly Nov 28 '08 at 23:39
Turing completeness is basically the hard limit to automatic analysis of code. What would the slope of a function that never returns mean, anyway? In what way would the derivative be correct? – Barry Kelly Nov 28 '08 at 23:41
2

Higher order functions are not inherantly discrete: they are just functions that maps functions to functions. Indeed, the derivative itself is a perfectly valid higher order function. I also don't know what you mean by the "Cartesian quality of having arguments that have well-defined mappings to points in some n-dimensional space". Every function has a well defined mapping from it's arguments to an output domain - that's what defines a function. The problem here relates more to defining a derivative for the function space; it's not clear what the derivative of a list of items would even be. – Adam Wright Apr 30 '10 at 19:31
@Adam - what I meant was that functions aren't points in some continuous space. What do you get when you subtract one function from another, and divide it by the distance between the functions? What does that even mean? That's what I meant by discrete, as opposed to continuous locations in some space, where distance has a meaning and slope can be measured. – Barry Kelly May 01 '10 at 01:43
1

Sure they are - e.g., they space of continuous functions is Banach (i.e. a topological space, on which continuity is well defined). As they are Banach, they come complete with a norm allowing us to define concepts of "distance" in some way - and even a notion of derivative (in this example, the Frechet derivative). I'll agree that these are pretty abstract definitions, and not what the poster wanted, but circles back to my point - getting the definitions right, i.e. what property does the poster expect of his "derivative"? – Adam Wright May 01 '10 at 21:14
@Adam - and that is useful precisely to the degree that map, sum etc. are continuous functions. In other words, I'm not sure what you're trying to prove, apart from being unhelpful. – Barry Kelly May 02 '10 at 00:39
1

I'm trying to help you update your answer without jumping in and editing it (the entire first paragraph is just mathematically wrong), and the questioner update his question with more of what he actually wants. – Adam Wright May 02 '10 at 11:35
Just to clarify, I'm not disagreeing with the general thrust of your answer - just that, as it's accepted, a few specifics could be clearer. I would edit it, but I don't like altering other peoples answers. – Adam Wright May 02 '10 at 11:56

Derivative of a Higher-Order Function

3 Answers3