I'm currently running Bayesian optimization code written in C++. I use a toolbox called BayesOpt by Ruben Martinez-Cantin (http://rmcantin.bitbucket.org/html/). I'm writing my thesis on Bayesian optimization (https://en.wikipedia.org/wiki/Bayesian_optimization).
I had experimented with this toolbox before, and this week I noticed that the code runs a lot slower than I remembered. It's worth mentioning that I have written some code of my own that works with this toolbox.
I measured the run time and confirmed that the code really is much slower than it should be.
To find out whether my own code was at fault, I tried an example that doesn't use any of it.
Consider the following example:
#include <cmath>      // M_PI, std::sin, std::pow
#include <iostream>
#include <bayesopt.hpp>

class ExampleMichalewicz: public bayesopt::ContinuousModel
{
public:
    ExampleMichalewicz(bopt_params par);
    double evaluateSample(const vectord& x);
    bool checkReachability(const vectord &query) { return true; }
    void printOptimal();
private:
    double mExp;
};

ExampleMichalewicz::ExampleMichalewicz(bopt_params par):
    ContinuousModel(10,par)
{
    mExp = 10;
}

double ExampleMichalewicz::evaluateSample(const vectord& x)
{
    size_t dim = x.size();
    double sum = 0.0;
    for(size_t i = 0; i<dim; ++i)
    {
        double frac = x(i)*x(i)*(i+1);
        frac /= M_PI;
        sum += std::sin(x(i)) * std::pow(std::sin(frac),2*mExp);
    }
    return -sum;
}

void ExampleMichalewicz::printOptimal()
{
    std::cout << "Solutions: " << std::endl;
    std::cout << "f(x)=-1.8013 (n=2)"<< std::endl;
    std::cout << "f(x)=-4.687658 (n=5)"<< std::endl;
    std::cout << "f(x)=-9.66015 (n=10);" << std::endl;
}

int main(int nargs, char *args[])
{
    bopt_params par = initialize_parameters_to_default();
    par.n_iterations = 20;
    par.n_init_samples = 30;
    par.random_seed = 0;
    par.verbose_level = 1;
    par.noise = 1e-10;
    par.kernel.name = "kMaternARD5";
    par.crit_name = "cBEI";
    par.crit_params[0] = 1;
    par.crit_params[1] = 0.1;
    par.n_crit_params = 2;
    par.epsilon = 0.0;
    par.force_jump = 0.000;
    par.n_iter_relearn = 1; // Number of samples before relearn kernel
    par.init_method = 1; // Sampling method for initial set 1-LHS, 2-Sobol (if available),
    par.l_type = L_MCMC; // Type of learning for the kernel params

    ExampleMichalewicz michalewicz(par);
    vectord result(10);
    michalewicz.optimize(result);
    std::cout << "Result: " << result << "->"
              << michalewicz.evaluateSample(result) << std::endl;
    michalewicz.printOptimal();
    return 0;
}
If I compile this example on its own, the run time is about 23 seconds.
This is the CMake file I use for that build:
PROJECT ( myDemo )
ADD_EXECUTABLE(myDemo ./main.cpp)
find_package( Boost REQUIRED )
if(Boost_FOUND)
include_directories(${Boost_INCLUDE_DIRS})
else(Boost_FOUND)
find_library(Boost boost PATHS /opt/local/lib)
include_directories(${Boost_LIBRARY_PATH})
endif()
include_directories(${PROJECT_SOURCE_DIR}/include)
include_directories("../bayesopt/include")
include_directories("../bayesopt/utils")
set(CMAKE_CXX_FLAGS " -Wall -std=c++11 -lpthread -Wno-unused-local-typedefs -DNDEBUG -DBOOST_UBLAS_NDEBUG")
target_link_libraries(myDemo libbayesopt.a libnlopt.a)
Now consider the same main example, but with additional source files added to the CMake project (without including them in main.cpp). These files are a subset of my own code.
PROJECT ( myDemo )
ADD_EXECUTABLE(myDemo ./iCubSimulator.cpp ./src/DatasetDist.cpp ./src/MeanModelDist.cpp ./src/TGPNode.cpp)
find_package( Boost REQUIRED )
if(Boost_FOUND)
include_directories(${Boost_INCLUDE_DIRS})
else(Boost_FOUND)
find_library(Boost boost PATHS /opt/local/lib)
include_directories(${Boost_LIBRARY_PATH})
endif()
include_directories(${PROJECT_SOURCE_DIR}/include)
include_directories("../bayesopt/include")
include_directories("../bayesopt/utils")
set(CMAKE_CXX_FLAGS " -Wall -std=c++11 -lpthread -Wno-unused-local-typedefs -DNDEBUG -DBOOST_UBLAS_NDEBUG")
target_link_libraries(myDemo libbayesopt.a libnlopt.a)
This time, the run time is about 3 minutes. This is critical for my work, because the slowdown gets much worse as I increase par.n_iterations.
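For anyone who wants to reproduce the two timings, here is a minimal sketch of how the run time of a call can be measured from inside the program. The helper name timeSeconds is hypothetical and not part of BayesOpt; it only uses the standard library.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not part of BayesOpt): runs `work` once and returns
// the elapsed wall-clock time in seconds, measured with a monotonic clock.
inline double timeSeconds(const std::function<void()>& work)
{
    assert(work); // an empty std::function cannot be called
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

In the example above, the optimization call could be wrapped as timeSeconds([&]{ michalewicz.optimize(result); }) and the returned value printed, so that both builds are timed in exactly the same way.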
I then found that if I comment out one line in TGPNode.cpp, namely
utils::cholesky_decompose(K,L);
(note that this line is never executed at run time), the run time goes back to 23 seconds. This function is defined in ublas_cholesky.hpp, which is part of the BayesOpt toolbox.
It is also important to note that the same function is called within the toolbox code itself. That call is not commented out, and it runs during michalewicz.optimize(result);
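For readers not familiar with the operation, the sketch below shows what a Cholesky decomposition computes: a lower-triangular L such that K = L * L^T for a symmetric positive-definite K. This is an illustration only. It is not the toolbox's utils::cholesky_decompose, and the names Matrix and choleskySketch are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Illustrative only; NOT BayesOpt's utils::cholesky_decompose.
// Fills L with the lower-triangular factor such that K = L * L^T.
// Returns false if a non-positive pivot shows K is not positive definite.
inline bool choleskySketch(const Matrix& K, Matrix& L)
{
    const std::size_t n = K.size();
    for (const auto& row : K) assert(row.size() == n); // K must be square

    L.assign(n, std::vector<double>(n, 0.0));
    for (std::size_t j = 0; j < n; ++j) {
        // Diagonal entry: K(j,j) minus the squares already placed in row j.
        double diag = K[j][j];
        for (std::size_t k = 0; k < j; ++k) diag -= L[j][k] * L[j][k];
        if (diag <= 0.0) return false;
        L[j][j] = std::sqrt(diag);

        // Entries below the diagonal in column j.
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = K[i][j];
            for (std::size_t k = 0; k < j; ++k) s -= L[i][k] * L[j][k];
            L[i][j] = s / L[j][j];
        }
    }
    return true;
}
```

The cost grows cubically with the matrix size, which is why the speed of this routine matters when it runs at every iteration of the optimization.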
Does anyone have any idea why this is happening? Any insight on the subject would be a great help.
Greatly appreciated.
Kindly, José Nogueira