This matrix is rarely computed explicitly because applying the DWT through it is very inefficient: a dense matrix-vector product costs O(N^2) operations, whereas the fast wavelet transform (FWT) needs only O(N).
For a signal of length 16 and a 3-level Haar transform, I found that this matrix in MATLAB
>> h=[1 1];
>> g=[1 -1];
>> m1=[[ones(1,8) zeros(1,8); ...
zeros(1,8) ones(1,8); ...
1 1 1 1 -1 -1 -1 -1 zeros(1,8); ...
zeros(1,8) 1 1 1 1 -1 -1 -1 -1]/sqrt(8); ...
[1 1 -1 -1 zeros(1,12); ...
zeros(1,4) 1 1 -1 -1 zeros(1,8); ...
zeros(1,8) 1 1 -1 -1 zeros(1,4); ...
zeros(1,12) 1 1 -1 -1]/sqrt(4); ...
[g zeros(1,14); ...
zeros(1,2) g zeros(1,12); ...
zeros(1,4) g zeros(1,10); ...
zeros(1,6) g zeros(1,8); ...
zeros(1,8) g zeros(1,6); ...
zeros(1,10) g zeros(1,4); ...
zeros(1,12) g zeros(1,2); ...
zeros(1,14) g]/sqrt(2)]
m1 =
A A A A A A A A 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 A A A A A A A A
A A A A -A -A -A -A 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 A A A A -A -A -A -A
B B -B -B 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 B B -B -B 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 B B -B -B 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 B B -B -B
C -C 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 C -C 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 C -C 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 C -C 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 C -C 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 C -C 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 C -C 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 C -C
where A=1/sqrt(8), B=1/sqrt(4) and C=1/sqrt(2), corresponds to the FWT. That shows you how to build the matrix from the filters. Start with the bottom half of the matrix: a matrix of zeros in which the filter g is shifted 2 columns further in every row. Then make the filter twice as wide (for Haar, [1 1 -1 -1]) and repeat, now shifting 4 columns at a time. Repeat this until you reach the highest level of decomposition, then finally put in the approximation filter at the same width (here, 8).
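Since the goal is a C++ version, here is a minimal sketch of that bookkeeping for the Haar case only. The names `Matrix` and `haarMatrix` are made up for illustration, and the code assumes the signal length is divisible by 2^levels. It fills the rows in the same order as m1: approximation rows first, then detail rows from the coarsest level down to the finest.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical helper type

// Build the n x n matrix of a `levels`-level Haar DWT.
// Assumption: n is divisible by 2^levels (not checked here).
Matrix haarMatrix(std::size_t n, unsigned levels) {
    Matrix m(n, std::vector<double>(n, 0.0));
    std::size_t row = 0;

    // Approximation rows: a block of ones as wide as the coarsest detail filter.
    const std::size_t widest = std::size_t{1} << levels;
    for (std::size_t start = 0; start < n; start += widest, ++row)
        for (std::size_t k = 0; k < widest; ++k)
            m[row][start + k] = 1.0 / std::sqrt(static_cast<double>(widest));

    // Detail rows, coarsest level first: +1 on the first half of the support,
    // -1 on the second half, shifted by the full filter width in every row.
    for (unsigned level = levels; level >= 1; --level) {
        const std::size_t width = std::size_t{1} << level;
        const double scale = 1.0 / std::sqrt(static_cast<double>(width));
        for (std::size_t start = 0; start < n; start += width, ++row)
            for (std::size_t k = 0; k < width; ++k)
                m[row][start + k] = (k < width / 2) ? scale : -scale;
    }
    return m;
}
```

Calling `haarMatrix(16, 3)` should reproduce the 16x16 pattern of m1. For filters longer than Haar the widened rows are no longer plain blocks of +1 and -1, so this sketch would not carry over unchanged.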
Just as a check:
>> signal=1:16; % ramp
>> [h g]=daubcqf(2); % Haar coefficients from the Rice wavelet toolbox
>> fwt(h,signal,3) % fwt code by Jeffrey Kantor
>> m1*signal' % should produce the same vector
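For a C++ port, the same check can be done against a fast Haar transform. The following is a rough sketch, not Kantor's fwt: `fastHaar` is a hypothetical name, it handles the Haar filters only, and it assumes the signal length is divisible by 2^levels. Its output is laid out like the rows of m1 (approximation, then details from coarsest to finest), so it should agree with the product m1*signal'. Whether the MATLAB fwt uses the same ordering and sign convention is not something this sketch verifies.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Fast Haar transform (pyramid algorithm): O(n) work instead of the O(n^2)
// matrix-vector product. Assumption: x.size() is divisible by 2^levels.
// Output layout: [approximation | coarsest details | ... | finest details].
std::vector<double> fastHaar(std::vector<double> x, unsigned levels) {
    const double s = 1.0 / std::sqrt(2.0);
    std::size_t len = x.size();            // length of the current approximation
    std::vector<double> tmp(x.size());
    for (unsigned level = 0; level < levels; ++level) {
        const std::size_t half = len / 2;
        for (std::size_t i = 0; i < half; ++i) {
            tmp[i]        = (x[2 * i] + x[2 * i + 1]) * s;  // low-pass  [1  1]/sqrt(2)
            tmp[half + i] = (x[2 * i] - x[2 * i + 1]) * s;  // high-pass [1 -1]/sqrt(2)
        }
        for (std::size_t i = 0; i < len; ++i) x[i] = tmp[i];
        len = half;                        // next level works on the approximation only
    }
    return x;
}
```

For the ramp 1:16 with 3 levels, working the sums by hand gives 18/sqrt(2) and 50/sqrt(2) for the two approximation coefficients, -8/sqrt(2) for both level-3 details, -2 for the level-2 details and -1/sqrt(2) for the level-1 details.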
Hope that helps you write it in C++. It is not difficult (just a bit of bookkeeping) but, as said, no one uses it because efficient algorithms do not need it.