This is highly CPU-dependent, and there's no surefire way to know the answer before runtime, especially since you're not asking about a specific CPU.
16-bit arithmetic is generally considered inefficient on 64-bit CPUs (on x86-64, for instance, 16-bit operations need an operand-size prefix and can hit partial-register stalls), while 32-bit arithmetic should be at least as fast as 64-bit arithmetic. But like I said, your mileage may vary, especially with future CPUs.
If you don't know the target CPU ahead of time and this is very time-sensitive code, you may want to implement it both ways, have your software run a quick benchmark at startup, then use the path that's faster.
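Here's a rough sketch of what that startup benchmark could look like in C. Everything in it is made up for illustration: the `checksum_u32`/`checksum_u64` kernels stand in for whatever your real hot loop is, and the buffer size and repetition count are arbitrary guesses you'd want to tune. It uses the standard `clock()` for timing, which is coarse, so treat the result as a hint rather than a measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical kernel signature: both variants must return identical results. */
typedef uint32_t (*checksum_fn)(const uint32_t *data, size_t n);

/* Variant 1: accumulate in 32 bits (wraps modulo 2^32). */
static uint32_t checksum_u32(const uint32_t *data, size_t n) {
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = acc * 31u + data[i];
    return acc;
}

/* Variant 2: accumulate in 64 bits, then truncate. The low 32 bits of
 * add/multiply results match the 32-bit version, so the output is the same. */
static uint32_t checksum_u64(const uint32_t *data, size_t n) {
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc = acc * 31u + data[i];
    return (uint32_t)acc;
}

/* Time `reps` calls of fn. The buffer is perturbed on every repetition and the
 * result goes into a volatile sink so the compiler cannot skip the work. */
static clock_t time_fn(checksum_fn fn, uint32_t *data, size_t n, int reps) {
    volatile uint32_t sink = 0;
    clock_t start = clock();
    for (int r = 0; r < reps; r++) {
        data[0] = (uint32_t)r;
        sink ^= fn(data, n);
    }
    (void)sink;
    return clock() - start;
}

/* Run once at startup; returns whichever variant was faster on this machine.
 * N and REPS are arbitrary values chosen for this sketch. */
static checksum_fn pick_checksum(void) {
    enum { N = 4096, REPS = 2000 };
    static uint32_t buf[N];
    for (size_t i = 0; i < N; i++)
        buf[i] = (uint32_t)(i * 2654435761u);

    /* Sanity check: the two paths must be interchangeable. */
    assert(checksum_u32(buf, N) == checksum_u64(buf, N));

    clock_t t32 = time_fn(checksum_u32, buf, N, REPS);
    clock_t t64 = time_fn(checksum_u64, buf, N, REPS);
    return (t64 < t32) ? checksum_u64 : checksum_u32;
}
```

You'd call `pick_checksum()` once during initialization, stash the returned function pointer, and route all later calls through it. Keep in mind the indirect call has its own small cost, and a micro-benchmark like this can be noisy, so it's only worth doing if the two paths differ by a meaningful margin on real hardware.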