
I would like to minimize the following function (quad_function) using Optim.jl with automatic differentiation (autodiff=true).

My objective function rounds Real values to whole numbers and is therefore step-like.

As I use the autodiff option, my Real values become dual numbers (ForwardDiff.Dual). But unfortunately there seems to be no round function implemented for the ForwardDiff.Dual type. Thus I have written a roundtoint64 function that extracts the real part and rounds it. This approach causes problems during the optimization.

using Plots
using ForwardDiff   # needed below for ForwardDiff.Dual
plotlyjs()

# Round to Int64; for dual numbers, recurse on the real part
# (this drops the partial-derivative components entirely).
function roundtoint64(x)
  if typeof(x) <: ForwardDiff.Dual
    roundtoint64(x.value)
  else
    Int64(round(x))
  end
end

function quad_function(xs::Vector)
  roundtoint64(xs[1])^2 + roundtoint64(xs[2])^2
end

x, y = linspace(-5, 5, 100), linspace(-5, 5, 100)
z = Surface((x,y)->quad_function([x,y]), x, y)
surface(x,y,z, linealpha = 0.3)

This is what my quad_function looks like: [surface plot of quad_function]

The problem is that optimize converges immediately and does not proceed.

using Optim

res = Optim.optimize(
  quad_function,
  [5.0,5.0],
  Newton(),
  OptimizationOptions(
    autodiff   = true,
    # show_trace = true
  ))

Result:

Results of Optimization Algorithm
 * Algorithm: Newton's Method
 * Starting Point: [5.0,5.0]
 * Minimizer: [5.0,5.0]
 * Minimum: 5.000000e+01
 * Iterations: 0
 * Convergence: true
   * |x - x'| < 1.0e-32: false
   * |f(x) - f(x')| / |f(x)| < 1.0e-32: false
   * |g(x)| < 1.0e-08: true
   * Reached Maximum Number of Iterations: false
 * Objective Function Calls: 1
 * Gradient Calls: 1


optimal_values  = Optim.minimizer(res) # [5.0, 5.0]
optimum         = Optim.minimum(res)   # 50.0

I also tried initializing optimize with a vector of integers [5,5] to avoid the rounding altogether, but that also causes problems, this time when finding the initial step size:

ERROR: InexactError()
 in alphainit(::Int64, ::Array{Int64,1}, ::Array{Int64,1}, ::Int64) at /home/sebastian/.julia/v0.5/Optim/src/linesearch/hz_linesearch.jl:63
 in optimize(::Optim.TwiceDifferentiableFunction, ::Array{Int64,1}, ::Optim.Newton, ::Optim.OptimizationOptions{Void}) at /home/sebastian/.julia/v0.5/Optim/src/newton.jl:69
 in optimize(::Function, ::Array{Int64,1}, ::Optim.Newton, ::Optim.OptimizationOptions{Void}) at /home/sebastian/.julia/v0.5/Optim/src/optimize.jl:169

Question: Is there a way to tell optimize to only explore the integer space?

Update: I think the problem with the approach of converting to Int64 is that I no longer have ForwardDiff.Duals and thus can't compute any derivatives/gradients. What would a better round function look like, one that also rounds nested duals and returns duals?

swiesend
  • The algorithm stops because the gradient is zero. Your problem is not really suited for a solver expecting a smooth function. For your function, try something like an MIQP (mixed integer quadratic programming) solver. – Erwin Kalvelagen Nov 25 '16 at 15:41
  • Thank you @ErwinKalvelagen, I'll check that out. But do you think there is, in general, a way to achieve the kind of mapping I am trying to do, even if this solver is not suited for it? Because it is actually not that far away from a smooth function in my perspective... – swiesend Nov 25 '16 at 15:58
  • It is not smooth at all, and then you have all these regions with zero gradients where a local NLP solver may decide to stop. This is really the wrong technology for an essentially discrete problem. – Erwin Kalvelagen Nov 25 '16 at 16:17
  • I understand. But is the smoothness the problem or the zero gradient? If I gave it a small gradient in each region, how would that work? `function quad_function(xs::Vector) round(xs[1])*xs[1] + round(xs[2])*xs[2] end`. Then it's still not smooth, but at least I would have a gradient. – swiesend Nov 25 '16 at 16:21
  • Two issues: you need a nonsmooth global solver while you are using a smooth local solver. So two penalties on your score card. But note that the relaxation of your problem is convex. So a local MIQP solver will do. – Erwin Kalvelagen Nov 25 '16 at 16:33
  • Thank you for your patience! I have never done non-linear problem solving before, so I still have to learn about many concepts and different solvers. – swiesend Nov 25 '16 at 16:37
  • There is no support for integer programming in Optim. Maybe try JuMP with a suitable backend. – pkofod Nov 26 '16 at 20:26
  • Didn't know that [JuMP](https://github.com/JuliaOpt/JuMP.jl) supports that many [backends](http://jump.readthedocs.io/en/latest/installation.html#getting-solvers), and the DSL seems to be straightforward. Thanks @pkofod! (A sketch of that formulation follows below.) – swiesend Nov 27 '16 at 14:42
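
For reference, here is a minimal sketch of the MIQP formulation suggested in the comments, written against the JuMP 0.x API that was current at the time. Gurobi is just one example of an MIQP-capable backend (it requires a license), and the variable bounds are illustrative assumptions, not part of the original problem:

using JuMP, Gurobi

m = Model(solver=GurobiSolver())
@variable(m, -10 <= x[1:2] <= 10, Int)   # integer variables with illustrative bounds
@objective(m, Min, x[1]^2 + x[2]^2)      # convex quadratic objective
solve(m)
getvalue(x)  # -> [0, 0]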

2 Answers


I'll respond to your update with a more dual-numbers-centric answer, since Erwin Kalvelagen beat me to the punch on the original question.

There is, in fact, a round function implemented for ForwardDiff.Dual which has the behavior you mentioned in your original post: it truncates the partial-derivative components and applies round only to the real component. This is a mostly correct definition, because the derivative of round is zero almost everywhere and undefined where the steps occur (i.e. at half-integer points).
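
In code, that behavior amounts to something like the following (a sketch of the idea, not ForwardDiff's actual source):

# Round the real component and zero out the partials, since the
# derivative of round is zero almost everywhere.
Base.round(d::ForwardDiff.Dual) =
  ForwardDiff.Dual(round(d.value), zero(d.partials))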

This definition could be made "more correct" by propagating NaNs or erroring at points where the derivative is undefined, but there's not much benefit to that strategy from the standpoint of practical AD. The round method will "pick a side" in the case of the discontinuity, so it makes sense for us to hand-wavily "pick a side" when taking the derivative. This is easy in the case of round, since the derivative on either side of the discontinuity is zero.

You can use any definition you'd like by overwriting the currently defined method, but as you pointed out, rounding the intermediary partial derivatives can yield an incorrect overall derivative. This is essentially due to the fact that you are no longer differentiating the same objective function.
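
A concrete illustration of that last point, hand-computed under the two definitions (the function f below is just an example):

f(xs) = round(xs[1]) * xs[1]

# Built-in round (partials truncated to zero), by the product rule at x = 5:
#   f'(5) = 0*5 + round(5)*1 = 5    (the almost-everywhere derivative of f)
# Partial-rounding overwrite (the seed partial 1.0 survives round):
#   f'(5) = 1*5 + 5*1 = 10          (the derivative of a different function)
ForwardDiff.gradient(f, [5.0])  # [5.0] built-in, [10.0] with the overwrite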

jrevels

As Erwin Kalvelagen pointed out in the comments on my question, there is no solution to this kind of problem with the given algorithm and solver.

Thus I altered my cost function a bit so that it has at least some nonzero gradients, even though it is still not smooth:

function quad_function_with_gradient(xs::Vector)
  round(xs[1])*xs[1] + round(xs[2])*xs[2]
end

which looks like this: [surface plot of quad_function_with_gradient]

But then I still had to fix the dual-number rounding problem. Therefore I wrote a round method that always rounds both the real part and the partials:

using Optim
using ForwardDiff   # for ForwardDiff.Dual and ForwardDiff.Partials

# Round every component of the partials tuple.
roundpartials{T<:ForwardDiff.Partials}(partials::T) = (round(v) for v in partials.values)

# Overload round for dual numbers: round the real part and all partials.
Base.round{T<:ForwardDiff.Dual}(dual::T) = ForwardDiff.Dual(
  round(dual.value),
  roundpartials(dual.partials)...)
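
A quick check of what this overload does to a dual number (using the two-argument Dual constructor as above):

round(ForwardDiff.Dual(4.7, 1.0))  # -> Dual(5.0, 1.0): value and partials both rounded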

This gives me a solution, but to a slightly different problem:

res = Optim.optimize(
  quad_function_with_gradient,
  [5.0,5.0],
  Newton(),
  OptimizationOptions(
    autodiff   = true
  ))

Results of Optimization Algorithm
 * Algorithm: Newton's Method
 * Starting Point: [5.0,5.0]
 * Minimizer: [0.0,0.0]
 * Minimum: 0.000000e+00
 * Iterations: 1
 * Convergence: true
   * |x - x'| < 1.0e-32: false
   * |f(x) - f(x')| / |f(x)| < 1.0e-32: false
   * |g(x)| < 1.0e-08: true
   * Reached Maximum Number of Iterations: false
 * Objective Function Calls: 5
 * Gradient Calls: 5

It's up to you guys to decide if this is a feasible solution.

swiesend