I was wondering whether it is possible to import an object into a Matlab Level-2 S-function within Simulink.
In the past, I have applied reinforcement learning to dynamic models in Matlab. As a result, I have created some classes that handle the generation and update of the policy. Now I need to move to Simulink because I have a more complex dynamic system. I am familiar with C S-functions, but since I already have the Matlab code in two classes, I was thinking of using a Matlab S-function that uses these objects.
My workflow is as follows: the main Matlab function, where the policy object is initialized, calls the Simulink file containing the dynamic model. In the S-function, the policy object is called to select an action (which is the output of the control system). The policy object (in fact, its weights) is then updated in the main Matlab function after a number of simulations of the Simulink file.
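To make the workflow concrete, here is a minimal sketch of the loop I have in mind; the model name 'rl_model' and the variables actions, states, epsilon, mu, weights0, n_episodes and new_weights are placeholders, and the learning update itself is omitted:

% Initialise the policy object in the main script:
policy = Policy(actions,states,epsilon,'basis_rbf',mu);
policy = policy.set_weights(weights0);
for episode = 1:n_episodes
% Run the Simulink model; at each step, the S-function should somehow
% use the policy object to select the control action:
sim('rl_model');
% ... learning step producing new_weights ...
% Update the policy weights in the main script:
policy = policy.set_weights(new_weights);
end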
So, what I would need is a way to import the policy object into a Matlab S-function in Simulink. I have tried to pass it as a parameter, but only numeric values are accepted. I cannot keep the object only within the S-function (that is, initialising it in the S-function's initialisation method), because I need to update its weights from the main Matlab script.
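For reference, this is roughly what I tried, based on the standard Level-2 Matlab S-function template (msfuntmpl_basic); the function name, port configuration and output line are just placeholders:

function policy_sfunction(block)
setup(block);

function setup(block)
block.NumInputPorts = 1;
block.NumOutputPorts = 1;
block.SetPreCompInpPortInfoToDynamic;
block.SetPreCompOutPortInfoToDynamic;
block.NumDialogPrms = 1; % this is where I tried to pass the Policy object
block.SampleTimes = [-1 0]; % inherited sample time
block.RegBlockMethod('Outputs', @Outputs);

function Outputs(block)
% The dialog parameter only seems to accept numeric values, so this
% fails when the parameter is a Policy object:
policy = block.DialogPrm(1).Data;
policy = policy.select_action(block.InputPort(1).Data'); % state as a row vector
block.OutputPort(1).Data = policy.actions(policy.a);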
Is this possible? Any suggestions would be greatly appreciated!
An example of the policy class is as follows:
classdef Policy
%% Accessible properties:
properties
a; % selected action index
actions; % actions list
basis; % type of basis function
centres; % list of centres of the RBFs
exploration_rate; % exploration rate
mu; % width of each RBF
nbasis; % no. basis functions overall
states; % list of discrete states
weights; % weights of the linear function approximation
end
%% Protected properties:
properties (Access = protected)
na; % no. actions
ns; % no. discrete states
nrbf; % no. radial basis functions per action
state; % current state
Q; % Q value for each action-state pair
end
%% Accessible methods:
methods
%% Initialization function:
function obj = Policy(actions,states,epsilon,basis,mu)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Input:
% actions: actions list
% states: states list or centres of the RBFs
% epsilon: initial exploration rate
% basis: type of basis functions
% mu: width of each RBF
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
if nargin<4
basis = 'basis_exact';
end
obj.actions = actions;
obj.na = size(actions,1);
obj.exploration_rate = epsilon;
switch basis
case 'basis_exact'
obj.basis = basis;
obj.states = states;
obj.ns = size(states,1);
% One indicator feature per state-action pair:
obj.nbasis = obj.na*obj.ns;
case 'basis_rbf'
obj.basis = basis;
obj.centres = states;
obj.mu = mu;
obj.nrbf = size(states,1);
% One RBF per centre plus a constant term, per action:
obj.nbasis = obj.na*(obj.nrbf+1);
otherwise
error(['Only exact and radial basis functions ',...
'supported']);
end
end
%% Setter function for the features' weights:
function obj = set_weights(obj,weights)
obj.weights = weights;
end
%% Update the exploration rate with the given rate:
function obj = update_epsilon(obj,rate)
obj.exploration_rate = obj.exploration_rate*rate;
end
%% Select an action:
function obj = select_action(obj,state)
% Store the current state:
obj.state = state;
% Compute the state-action values for the current state:
obj = obj.qvalues();
% Get the current action with an epsilon-greedy policy:
obj.a = obj.eGreedy();
end
%% Evaluate the features:
function phi = get_features(obj,state,action)
% Store the current state:
obj.state = state;
% Get the features (obj must be passed explicitly when the
% method is invoked by name via feval):
phi = feval(obj.basis,obj,action);
end
end
%% Protected methods:
methods (Access=protected)
%% Find the discrete state:
function s = discretizeState(obj,x)
% Copy the row vector entries (continuous states) to all rows:
x = repmat(x,obj.ns,1);
% Select the row using the minimum Euclidean distance:
[~,s] = min(sum((obj.states-x).^2,2).^0.5);
end
%% Get the Q-value function for current state and action:
function q = qvalue(obj,action)
phi = feval(obj.basis,obj,action);
q = phi' * obj.weights;
end
%% Get the Q-value functions for the current state:
function obj = qvalues(obj)
% Initialize the Q-values for the current state:
obj.Q = zeros(obj.na,1);
% Calculate the state-action values for the current state:
for a=1:obj.na
obj.Q(a) = obj.qvalue(a);
end
end
%% Get an action with an epsilon-greedy exploration policy:
function a = eGreedy(obj)
% Generate a random number:
r = rand;
% Exploit: select the action that maximises Q(s):
if (r>obj.exploration_rate)
[~,a] = max(obj.Q); % value, action
% Explore: choose a random action:
else
a = randi(obj.na); % uniformly distributed random integer
end
end
%% Find the features for the exact basis functions:
function phi = basis_exact(obj,action)
% Initialize the features:
phi = zeros(obj.nbasis,1);
% Find the current discrete state:
s = obj.discretizeState(obj.state);
% Find the starting position of the block:
base = (action-1) * obj.ns;
% Set the indicator:
phi(base+s) = 1;
end
%% Find the features for the radial basis functions:
function phi = basis_rbf(obj, action)
% Initialize the features:
phi = zeros(obj.nbasis,1);
% Find the starting position of this action's block (the weight
% matrix Theta is flattened into a single vector, so each action
% owns a contiguous block of nbasis/na features):
base = (action-1) * (obj.nbasis/obj.na);
% Compute the RBFs:
for i=1:obj.nrbf
phi(base+i) = exp(-norm(obj.state-obj.centres(i,:))^2/...
(2*obj.mu));
end
% ... and the constant:
phi(base+obj.nrbf+1) = 1;
end
end
end
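For completeness, this is roughly how the object is used from the main script (the numbers are purely illustrative):

actions = (1:3)'; % three actions
centres = rand(10,2); % ten RBF centres in a 2-D state space
policy = Policy(actions,centres,0.1,'basis_rbf',0.5);
policy = policy.set_weights(zeros(3*(10+1),1)); % na*(nrbf+1) weights
policy = policy.select_action([0.2 0.7]); % sets policy.a
policy = policy.update_epsilon(0.99); % decay exploration between episodes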