If you would like to get the value on the top of the stack into HL, then you already listed pretty much every available option:
pop hl : push hl ; 10+11 = 21t
or
pop hl : dec sp : dec sp ; 10+6+6 = 22t
You can also use self-modifying code, although it is of little benefit here:
ld (addr1+1),sp
addr1: ld hl,(0) ; 20 + 16 = 36t
Even more awkwardly, you can get SP into HL first:
ld hl,0 : add hl,sp
ld a,(hl) : inc hl : ld h,(hl) : ld l,a ; 10+11 + 7+6+7+4 = 21+24 = 45t
(I mention the last two options only in case your situation somehow lets you benefit from one of them.)
The instruction
ex (sp),hl ; 19t
exchanges the value at the top of the stack with the current value of HL, so the old HL ends up on the stack; it is therefore not exactly what you asked for, even though it is slightly faster. So the only way to speed this up further is to cheat. These are the two ways of cheating that seem most obvious:
If you actually know where SP is pointing, simply reading that address directly is quicker:
ld hl,(wherethestackis) ; 16t
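For instance, a minimal sketch of this trick could look as follows. The label stack_top and the address $C000 are my own assumptions for illustration, and the sketch assumes that exactly one word has been pushed since SP was set:

```z80
stack_top equ $C000          ; hypothetical address, pick whatever suits your memory map

        ld sp,stack_top
        push de              ; low byte lands at stack_top-2, high byte at stack_top-1
        ; ... code that leaves SP where it is ...
        ld hl,(stack_top-2)  ; 16t: HL = the pushed word, SP is untouched
```

This only works while the stack depth at that point is fixed; if it varies, the address in the last line is no longer the top of the stack.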
If you can control exactly where the stack is, or what it actually contains, you can place it on top of the instruction that loads the value into HL, so that the stacked word becomes that instruction's immediate operand and you can simply do
ld hl,thevalueonthestack ; 10t
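One way this could look, strictly as a sketch: the labels load and restore_sp are made up, the code is assumed to run from RAM, and maskable interrupts are assumed to be safe to disable for the duration.

```z80
        di                     ; an interrupt would push its return address over the code below
        ld (restore_sp+1),sp   ; keep the real SP in the operand of the final ld sp
        ld sp,load+3           ; SP points just past the operand of "ld hl,nn"
        push de                ; D goes to load+2, E to load+1: the operand is now DE
load:   ld hl,0                ; 10t: executes as ld hl,<the pushed value>
restore_sp:
        ld sp,0                ; operand was patched above
        ei
```

Here the push and the load sit next to each other only to keep the example short. The set-up costs far more than it saves, so this pays off only when the 10t load runs many times, or when the value is written into place by code outside the time-critical path.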
I understand that both of these options seem extreme, but I know a lot of highly optimized Z80 code that benefits from similar tricks, so please do not dismiss them straightaway.