Generally, on x86 hardware, you want to use the x86_64 instruction set when possible. Most importantly, it offers more general-purpose CPU registers (16 in 64-bit mode versus 8 in 32-bit mode), which are a scarce resource on x86. This is probably enough of a gain to justify the additional memory use that comes with 64-bit pointers.
Additionally, with x86_64 you can always use the NX (no-execute) bit, which matters for security. With 32-bit, you need a special "PAE" kernel; otherwise that feature has to be emulated in software, at a performance cost. (So, if you do go 32-bit, make sure to use a PAE-enabled kernel.)
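To see what a given machine supports, you can look at the CPU flags the kernel reports. The following is a minimal sketch that assumes Linux, where /proc/cpuinfo lists "lm" (long mode, i.e. 64-bit capable), "pae" and "nx"; the function names are my own and purely illustrative.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags of the first CPU in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line found (e.g. not an x86 Linux system)


def summarize(flags):
    """Map the flags relevant to the 32-bit vs 64-bit decision to booleans."""
    return {
        "64-bit capable (lm)": "lm" in flags,
        "PAE (pae)": "pae" in flags,
        "NX bit (nx)": "nx" in flags,
    }


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # assumption: Linux
    if cpuinfo.exists():
        for name, present in summarize(parse_cpu_flags(cpuinfo.read_text())).items():
            print(f"{name}: {'yes' if present else 'no'}")
    else:
        print("/proc/cpuinfo not found; this sketch assumes Linux")
```

Note that this only tells you what the CPU advertises. Whether NX is actually in use still depends on the kernel you boot.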
That said, the only real way to answer this question is to benchmark with your specific load.
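As a starting point for that kind of benchmark, here is a rough sketch of a timing harness: it runs the same command several times and reports the median wall-clock time, so you can run it once on a 32-bit install and once on a 64-bit install and compare. The `time_command` helper and the example command are placeholders I made up; substitute whatever represents your real workload.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=5):
    """Run cmd `runs` times and return the wall-clock time of each run in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Placeholder workload: replace with the command you actually care about.
    cmd = [sys.executable, "-c", "sum(i * i for i in range(10**6))"]
    timings = time_command(cmd)
    print(f"median: {statistics.median(timings):.3f}s over {len(timings)} runs")
```

Wall-clock time is only one measure. For a server workload you would probably also want to compare memory use, since that is where 64-bit costs you.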