
min_{\hat{a}} ||y - \hat{a}*x||_2^2 + lambda*||\hat{a}||_1 is the objective function, where \hat{a} is the vector of coefficients to estimate, y denotes the noisy measurements, and x is the unobserved input signal. I am aware of the lasso() function, but I would prefer not to use the inbuilt function since it will not help me understand the steps. Can somebody help me implement the l1-norm optimization?

Mathematically, my model is expressed as a moving average (MA) system: y[k] = a_1*x[k] + a_2*x[k-1] + \ldots + a_{10}*x[k-9] + n[k], where n ~ N(0,\sigma^2) is additive white Gaussian noise, x is a zero-mean Gaussian white process of unit variance, and a_1, a_2, ..., a_{10} are the coefficients of the MA model, which are known to be sparse. However, I don't know the positions of the non-zero coefficients.

In this model only 3 coefficients are non-zero, whereas the rest are all zero or close to zero. One approach to parameter estimation is to construct an inverse filter, which is also known as minimizing the prediction error.

Using the inverse-filtering approach, I can create an inverse filter for the MA model, expressed by: u[k] = x[k] - (\hat{a}_2*x[k-1] + \hat{a}_3*x[k-2] + \hat{a}_4*x[k-3] + \ldots + \hat{a}_{10}*x[k-9]).

Therefore, the objective function becomes: J = min_{\hat{a}} ||y - \hat{a}*x||_2^2 + lambda*||\hat{a}||_1, where y is the observed noisy measurements and \hat{a}*x is the clean (noise-free) output. Let \mathbf{\hat{a}} = {[\hat{a}_1,\ldots,\hat{a}_{10}]}^T represent the estimated coefficient vector.
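In MATLAB terms, the cost I am trying to minimize would look roughly like this (just a sketch of the objective, not a solver; here X would be the N-by-10 matrix of lagged inputs built below and lambda a weight I still have to choose):

% sketch of the penalized objective
% X      - N-by-10 matrix of lagged inputs x[k], x[k-1], ..., x[k-9]
% y      - N-by-1 vector of noisy measurements
% lambda - regularization weight (value to be chosen)
J = @(a_hat) norm(y - X*a_hat)^2 + lambda*norm(a_hat, 1);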

My approach is to split the objective function J into 2 parts: the first part is the OLS estimate, which is then fed into an l1-minimization routine. The output of the l1 minimization gives the sparse coefficients. Is this approach legitimate? If so, what can I use as the l1 optimizer in MATLAB?

The following is the code snippet where I have created the model. But I don't know how to solve the objective function. Please help.

%Generate input
N=500;
x=randn(N,1);        % zero-mean, unit-variance white Gaussian input (column vector, as lagmatrix expects)
L = 10;
Num_lags = 1:L-1;
a = 1+randn(L,1);
a(rand(L,1)<.9)=0;   % ~90% of the coefficients are set to zero (sparse MA coefficients)
%Data preparation into regressors
X1 = lagmatrix(x, [0 Num_lags]);   % N-by-L matrix of x[k], x[k-1], ..., x[k-9] (Econometrics Toolbox)
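For completeness, the noisy measurements y would then be generated along these lines (a sketch; sigma is an assumed noise level, and the first L-1 rows of X1 are dropped because lagmatrix pads them with NaN):

sigma = 0.1;                                     % assumed noise standard deviation
valid = all(~isnan(X1), 2);                      % lagmatrix pads the first L-1 rows with NaN
y = X1(valid,:)*a + sigma*randn(sum(valid),1);   % y[k] = a_1*x[k] + ... + a_10*x[k-9] + n[k]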
Srishti M
  • Emmm, it has been about 2 months since I read the lasso paper, but it seems that solving the lasso is a quadratic program with linear constraints. I can dig out my code for that if you want; does that help? – Hunter Jiang Feb 17 '18 at 06:44

2 Answers


According to the lasso paper, we have:

[The lasso formulation from the paper: (hat(alpha), hat(beta)) = argmin{ sum_i (y_i - alpha - sum_j beta_j*x_ij)^2 } s.t. sum_j |beta_j| <= t.]

It is a quadratic program with linear constraints, and there is a lasso parameter t >= 0 which controls the amount of shrinkage applied to the estimates. In your case, we could assume that t = 0.1*sum(abs(beta0)), where beta0 is the full least-squares estimate (following the discussion on page 3 of the paper).


Edit: assume that we have a lasso problem, Eq (1), with 2 parameters.

  1. Eq (1): argmin{ (y(1)-alpha-beta_1*x(1,1)-beta_2*x(2,1))^2 + (y(2)-alpha-beta_1*x(1,2)-beta_2*x(2,2))^2 } s.t. sum(abs(beta)) <= t
  2. Since hat(alpha) = mean(y), we can rewrite Eq (1) as Eq (1)': argmin{ (hat(y(1))-beta_1*x(1,1)-beta_2*x(2,1))^2 + (hat(y(2))-beta_1*x(1,2)-beta_2*x(2,2))^2 } s.t. sum(abs(beta)) <= t, by defining hat(y) = y - mean(y).
  3. Eq (1)' can be rewritten as argmin{ 0.5*beta'*H*beta + f'*beta + C }, whose solution equals that of argmin{ 0.5*beta'*H*beta + f'*beta }.
  4. The constraint sum(abs(beta)) <= t is equivalent to 2^(length(beta)) linear constraints A*beta <= b, where each row of A is a sign pattern of +-1 (e.g. (-1,-1), (-1,1), (1,-1), (1,1) in the 2-parameter case) and every entry of b equals t.

Then we can solve it with quadprog in MATLAB:

  • calculate the matrices H and f by rewriting the lasso objective as a quadratic form.

  • let delta_i, i = 1, 2, ..., 2^p, be the p-tuples of the form (+-1, +-1, ..., +-1); then we can rewrite the constraint as A*beta <= b (i.e. delta_i'*beta <= t for every i).

  • solve for the lasso estimate by calling the function quadprog(H,f,A,b).
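(For reference, the same H, f, A and b can also be built without loops; this is just a compact sketch equivalent to the loop-based code below, using the same variable names.)

yhat = y - mean(y);                     % center y, since hat(alpha) = mean(y)
H = 2*(x'*x);                           % quadratic term of the expanded squared error
f = -2*(x'*yhat);                       % linear term of the expanded squared error
signs = dec2bin(0:2^Bnum-1, Bnum)-'0';  % all 2^Bnum sign patterns as 0/1 rows
A = 2*signs - 1;                        % map {0,1} -> {-1,+1}: row i is delta_i
b = t*ones(2^Bnum, 1);                  % each constraint reads delta_i'*beta <= t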

Here is my code:

clc; clear;
%Generate input
N=500;
Bnum=10;
x=(randn(N,Bnum)*1000);
true_beta = 1+randn(Bnum,1);
y=x*true_beta;

%find the solution of lm & t
%lm solution beta0
ls_beta=(x'*x)\(x'*y);
%define t
t=0.1*sum(abs(ls_beta));

%%solving the quadratic programming
%calc H
H=zeros(Bnum);
for ii=1:Bnum
    for ij=1:Bnum
      H(ii,ij)=sum(2*x(:,ii).*x(:,ij));
    end
end
H;
%calc f
f=zeros(Bnum,1);
yhat=y-mean(y);
for ii=1:Bnum
  f(ii)=sum(-2*yhat.*x(:,ii));
end
f;
%calc A: row ii is one of the 2^Bnum sign patterns delta_i in {-1,+1}^Bnum
A=zeros(power(2,Bnum),Bnum);
for ii=1:power(2,Bnum)
  v=dec2bin(ii-1,Bnum);                % binary string of length Bnum for pattern ii
  for ij=1:Bnum
    A(ii,ij)=(str2num(v(ij))-0.5)*2;   % map '0'/'1' -> -1/+1
  end
end
%calc b: every one of the 2^Bnum constraints has the same right-hand side t
b=ones(power(2,Bnum),1)*t;
%calc quadprog
lasso_beta=quadprog(H,f,A,b);
[true_beta ls_beta lasso_beta]

We will get an answer ([true_beta ls_beta lasso_beta]) like this:

ans =

    0.5723    0.5723    0.0000
    0.4206    0.4206    0.0000
    1.9260    1.9260    0.1109
    1.0055    1.0055    0.0000
    0.3655    0.3655    0.0000
    1.8583    1.8583    0.1582
    0.5192    0.5192    0.0000
    2.4897    2.4897    0.7242
    0.3706    0.3706   -0.0000
    0.4060    0.4060    0.0000
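If you want to read off the estimated support, you can threshold the lasso solution, for example (tol is an arbitrary cutoff, not from the paper):

tol = 1e-3;                            % assumed tolerance for "effectively zero"
support = find(abs(lasso_beta) > tol)  % indices of the coefficients kept by the lasso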

P.S. All of the emphasis comes from the lasso paper.

Hope it helps!

Hunter Jiang
  • If you have to set 90% of coefficients to 0, try `Best subset selection` instead. – Hunter Jiang Feb 17 '18 at 09:32
  • Thank you for your answer, but I am having some difficulty understanding it; could you please clarify a bit? (1) What do the variables `H,f` stand for? Are the variables `A`,`b` the variables alpha, beta in Eq (1)? (2) I did not understand your comment about `Best subset selection`. I need to use this algorithm when a large number of coefficients are zero. In its current form, it's not clear whether the code works for sparse coefficients. Can you please elaborate on what you mentioned in your comment? – Srishti M Feb 17 '18 at 19:54
  • (4) Which part solves only the `l1` norm? Can I split the objective function into 2 parts? The first part could be solved by OLS and the second part using `l1` (`lambda*norm(\hat{a},1)`), and then the two parts added together? Or am I misinterpreting the objective function, i.e. the `+` sign does not mean adding the two parts, OLS + l1? – Srishti M Feb 17 '18 at 19:54
  • I cannot understand the lines `%calc b b=ones(power(2,Bnum),1)*t;`. Also, is it possible to modify your code so that once the coefficients are found by OLS estimation, I can pass them to the `l1` optimization routine? If so, how can I use only the `l1` optimization, taking the OLS estimate as input and returning the sparse coefficients as output? – Srishti M Feb 17 '18 at 20:01
  • Updated my post; it is really a long story. Note that I also changed my code a little (there was a small bug, sorry). Hope it helps! @SrishtiM – Hunter Jiang Feb 18 '18 at 01:53
  • And about the `best subset selection` part: I mean that the lasso can only control the sparsity ratio by changing the parameter `t`, and `t = 0.1*beta0` is only `roughly` equivalent to 90% of the coefficients being zero. – Hunter Jiang Feb 18 '18 at 01:56
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/165350/discussion-between-hunter-jiang-and-srishti-m). – Hunter Jiang Feb 18 '18 at 01:58

The code below solves the l1-regularized (penalized) form of the problem, argmin{ norm(Y-X*x)^2 + lambda*norm(x,1) }, directly.

Edit: fixed a typo in V (see the comment from SKM).

clc; clear;
%Generate input data
N=500;
Bnum=10;
X=(randn(N,Bnum)*1000);
true_beta = rand(Bnum,1);
Y=X*true_beta+rand(N,1);

%solve lasso using fminunc
lambda=1;                                  % regularization weight
V = @(x) norm(Y-X*x)^2+lambda*norm(x,1);   % lasso objective: squared error + l1 penalty
options=optimoptions('fminunc','Algorithm','quasi-newton','Display','iter');
xopt = fminunc(V,zeros(Bnum,1),options)
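To sanity-check the result, you can compare the estimate with the true coefficients, for example:

[true_beta xopt]   % side-by-side comparison of true and estimated coefficients

Note that the l1 term is not differentiable at zero, so fminunc's quasi-newton iterations only approximate the minimizer.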

However, I still recommend the code in my other answer, which uses the QUADPROG function. It will be much faster and more accurate.


Hunter Jiang
  • (1) ((0,1)-0.5)*2 = (-1,1). (2) No, I set it to 10; it depends on your specific function. (3) Yes. – Hunter Jiang Feb 18 '18 at 11:22
  • Post updated, and here comes the good news: no more `A` or `b` if you use an inline function. However, it may be slower than `quadprog`. @SrishtiM – Hunter Jiang Feb 21 '18 at 05:18
  • Nope. Sorry for the ugly variable names. `x` is `beta` here, so `norm(Y-X*x)^2` is the squared error and `norm(x,1)` is the l1 penalty. – Hunter Jiang Feb 22 '18 at 01:13
  • @HunterJiang: I think there is a typo in your code in the line `V = @(x) norm(Y-X*x)^2+norm(X,1);`. The correct version should be `V = @(x) norm(Y-X*x)^2+norm(x,1);`, since we are solving for small `x` (the sparse `true_beta`), not `X`. Therefore the norm should be over small `x` (`norm(x,1)`) and not `norm(X,1)` as you wrote. – SKM Mar 01 '18 at 01:40
  • @SKM Indeed it is, sorry for it. – Hunter Jiang Mar 01 '18 at 05:28