The problem with tiff is that it does not support inter-component decorrelation in its baseline. There are some extensions (not very broadly supported) that allow storing other file compression formats (such as a complete JPEG2000 JP2 file, extension 0x8798), but it is not guaranteed that an standard decoder will process it correctly.
If you can use any tool you want, close to optimal coding performance is probably obtained with a good spectral decorrelation transform (the KLT for lossy compression and the RKLT for lossless compression - see http://gici.uab.cat/GiciWebPage/downloads.php#spectral for a JAVA implementation of these transform) and then a good compression algorithm such as JPEG2000. On the other hand, this approach can be a bit complicated to implement and slow due to the KLT/RKLT transforms.
Other simpler approach is to simply use JPEG2000 with the DWT for spectral decorrelation. For instance, if you use the Kakadu implementation (kakadusoftware.com), you just need to use the proper parameters when compressing. Here you have an example invocation extracted from http://kakadusoftware.com/wp-content/uploads/2014/06/Usage_Examples.txt,
Ai) kdu_compress -i catscan.rawl*35@524288 -o catscan.jpx -jpx_layers *
-jpx_space sLUM Creversible=yes Sdims={512,512} Clayers=16
Mcomponents=35 Msigned=no Mprecision=12
Sprecision=12,12,12,12,12,13 Ssigned=no,no,no,no,no,yes
Mvector_size:I4=35 Mvector_coeffs:I4=2048
Mstage_inputs:I25={0,34} Mstage_outputs:I25={0,34}
Mstage_collections:I25={35,35}
Mstage_xforms:I25={DWT,1,4,3,0}
Mnum_stages=1 Mstages=25
-- Compresses a medical volume consisting of 35 slices, each 512x512,
represented in raw little-endian format with 12-bits per sample,
packed into 2 bytes per sample. This example follows example (x)
above, but adds a multi-component transform, which is implemented
using a 3 level DWT, based on the 5/3 reversible kernel (the kernel-id
is 1, which is found in the second field of the `Mstage_xforms' record.
-- To decode the above parameter attributes, note that:
a) There is only one multi-component transform stage, whose instance
index is 25 (this is the I25 suffix found on the descriptive
attributes for this stage). The value 25 is entirely arbitrary. I
picked it to make things interesting. There can, in general, be
any number of transform stages.
b) The single transform stage consists of only one transform block,
defined by the `Mstage_xforms:I25' attribute -- there can be
any number of transform blocks, in general.
c) This block takes 35 input components and produces 35 output
components, as indicated by the `Mstage_collections:I25' attribute.
d) The stage inputs and stage outputs are not permuted in this example;
they are enumerated as 0-34 in each case, as given by the
`Mstage_inputs:I25' and `Mstage_outputs:I25' attributes.
e) The transform block itself is implemented using a DWT, whose kernel
ID is 1 (this is the Part-1 5/3 reversible DWT kernel). Block
outputs are added to the offset vector whose instance index is 4
(as given by `Mvector_size:I4' and `Mvector_coeffs:I4') and the
DWT has 3 levels. The final field in the `Mstage_xforms' record
is set to 0, meaning that the canvas origin for the multi-component
DWT is to be taken as 0.
f) Since a multi-component transform is being used, the precision
and signed/unsigned properties of the final decompressed (or
original compressed) image components are given by `Mprecision'
and `Msigned', while their number is given by `Mcomponents'.
g) The `Sprecision' and `Ssigned' attributes record the precision
and signed/unsigned characteristics of what we call the codestream
components -- i.e., the components which are obtained by block
decoding and spatial inverse wavelet transformation. In this
case, the first 5 are low-pass subband components, at the bottom
of the DWT tree; the next 4 are high-pass subband components
from level 3; then come 9 high-pass components from level 2 of
the DWT; and finally the 17 high-pass components belonging to
the first DWT level. DWT normalization conventions for both
reversible and irreversible multi-component transforms dictate
that all high-pass subbands have a passband gain of 2, while
low-pass subbands have a passband gain of 1. This is why all
but the first 5 `Sprecision' values have an extra bit -- remember
that missing entries in the `Sprecision' and `Ssigned' arrays
are obtained by replicating the last supplied value.