From the probabilities, you assign code lengths to symbols. To reconstruct the code, the receiver needs a list of (bit count, symbol count) tuples, followed by the symbols in the order they are to be allocated code words. Now you can play around with how you encode those.
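As a minimal sketch of that (in Python; the names `counts` and `symbols` are just illustrative, and the lengths are assumed to arrive in increasing order), here is how a receiver might turn that list back into a code:

    def rebuild_code(counts, symbols):
        """Assign code words from (code length, symbol count) pairs plus
        the symbols in allocation order."""
        code = {}          # symbol -> (code length, code value)
        next_code = 0      # next code word at the current length
        prev_len = 0
        i = 0              # index into the symbol list
        for length, count in counts:            # lengths in increasing order
            next_code <<= (length - prev_len)   # lengthen the code word
            prev_len = length
            for _ in range(count):
                code[symbols[i]] = (length, next_code)
                next_code += 1
                i += 1
        return code

    # One symbol of length 1, one of length 2, two of length 3:
    print(rebuild_code([(1, 1), (2, 1), (3, 2)], ['a', 'b', 'c', 'd']))
    # {'a': (1, 0), 'b': (2, 2), 'c': (3, 6), 'd': (3, 7)}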
Encoding the list of symbols can exploit the fact that for every symbol transmitted, the number of bits needed for the following symbols goes down. An option to specify early on that only some subset of the (say) 8-bit symbols is in use can help here. As the code words get longer, it may be handy to encode a run of symbols rather than transmitting each one -- perhaps with a way to express a run less a few symbols, where the "holes" can be expressed in a number of bits which depends on the length of the run -- or as a start symbol, a length and a bit vector (noting that the number of bits to express the length depends on the start symbol and the number of symbols left, and that there is no need to send a bit for the first and last symbols in the range !)
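A sketch of that first point (Python again; the 256-symbol alphabet and the helper name are assumptions, not part of any particular format): send each symbol as its index among the symbols not yet transmitted, using only as many bits as the remaining candidates demand:

    from math import ceil, log2

    def encode_symbol_list(symbols, alphabet_size=256):
        remaining = list(range(alphabet_size))          # candidates not yet sent
        bits = []
        for s in symbols:
            width = max(1, ceil(log2(len(remaining))))  # bits needed right now
            bits.append(format(remaining.index(s), '0{}b'.format(width)))
            remaining.remove(s)                         # one fewer candidate from now on
        return ''.join(bits)

    # With a 256-symbol alphabet the width starts at 8 bits and only begins
    # to shrink once fewer than 129 candidates remain.
    print(encode_symbol_list([10, 200, 3, 42]))

The saving only becomes noticeable once a large fraction of the alphabet has been sent, which is why declaring a smaller subset of symbols up front attacks the same overhead from the other side.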
The encoding of the Huffman code table is a whole game in itself. For short messages the table can be a serious overhead... in which case a (small) number of commonly useful preset tables may give better compression.
You can also mess about with a Huffman encoding for the code length of each symbol, and send those in symbol order. A repeat-count mechanism, with its own Huffman code, can help here, along with a way of skipping runs of unused symbols (ie symbols with a zero code length). You can, of course, add a first level table to specify the encoding for this !
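A sketch of the repeat-count and skip-zeros idea (Python; the token names and run thresholds are made up for illustration, though this is loosely the shape of what DEFLATE does with its code-length alphabet):

    def run_length_code_lengths(lengths):
        """Turn a per-symbol code-length list into (op, value) tokens,
        which could then themselves be Huffman coded."""
        tokens, i = [], 0
        while i < len(lengths):
            run = 1
            while i + run < len(lengths) and lengths[i + run] == lengths[i]:
                run += 1
            if lengths[i] == 0 and run >= 3:
                tokens.append(('skip_zeros', run))        # run of unused symbols
            elif run >= 4:
                tokens.append(('length', lengths[i]))
                tokens.append(('repeat_prev', run - 1))   # repeat the previous length
            else:
                tokens.extend(('length', lengths[i]) for _ in range(run))
            i += run
        return tokens

    # Five unused symbols collapse to a single token:
    print(run_length_code_lengths([0, 0, 0, 0, 0, 3, 3, 3, 3, 3, 2, 0]))
    # [('skip_zeros', 5), ('length', 3), ('repeat_prev', 4), ('length', 2), ('length', 0)]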
Another approach is a set of bit vectors, one for each code word length. Starting with the code word length that has the most symbols, emit the length and a bit vector; then the next most populous code length with a smaller bit vector... and so on. Again, a way to encode runs and ranges can cut down the number of bits required, and again, as you proceed, the bits required for those go down.
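A sketch of that (Python; the function name is illustrative, and the run/range tricks are left out to keep it short): each vector only covers the symbols not yet assigned a length, so later vectors shrink, and symbols never marked in any vector are the unused ones:

    from collections import defaultdict

    def per_length_bit_vectors(lengths):
        """lengths: code length per symbol, 0 meaning the symbol is unused."""
        by_length = defaultdict(list)
        for symbol, length in enumerate(lengths):
            if length:
                by_length[length].append(symbol)

        remaining = list(range(len(lengths)))   # symbols not yet assigned a length
        output = []
        # Most populous code length first: the biggest group is covered while
        # the vector is at its longest, and every later vector is shorter.
        for length in sorted(by_length, key=lambda l: len(by_length[l]), reverse=True):
            members = set(by_length[length])
            output.append((length, ''.join('1' if s in members else '0' for s in remaining)))
            remaining = [s for s in remaining if s not in members]
        return output

    # Two length-2 symbols and four length-3 symbols among 8 possible symbols:
    print(per_length_bit_vectors([0, 3, 3, 2, 3, 3, 0, 2]))
    # [(3, '01101100'), (2, '0101')] -- symbols 0 and 6 never appear, so they are unused.

Note that the last vector is the sort of thing the run and range tricks mentioned above can squeeze further.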
The question is, how sensitive is the comparison to the size of the code table ? Clearly, if it is very sensitive, then investigating what can be done by the application of cunning is important. But the effectiveness of any given scheme is going to depend on how well it fits "typical" data being compressed.