As usual with "which is faster" questions, the first question back is: what have you tried so far? Did you compile it, disassemble it, and see what the compiler produces?
unsigned int mfun ( unsigned int a, unsigned int b, unsigned int c, unsigned int d )
{
    if ( a * b * c * d == 0 ) return(7);
    else return(11);
}

unsigned int ofun ( unsigned int a, unsigned int b, unsigned int c, unsigned int d )
{
    if ( a == 0 || b == 0 || c == 0 || d == 0 ) return(7);
    else return(11);
}
For ARM, one compiler gives this:
00000000 <mfun>:
0: e0010190 mul r1, r0, r1
4: e0020291 mul r2, r1, r2
8: e0110293 muls r1, r3, r2
c: 13a0000b movne r0, #11
10: 03a00007 moveq r0, #7
14: e12fff1e bx lr
00000018 <ofun>:
18: e3500000 cmp r0, #0
1c: 13510000 cmpne r1, #0
20: 0a000004 beq 38 <ofun+0x20>
24: e3520000 cmp r2, #0
28: 13530000 cmpne r3, #0
2c: 13a0000b movne r0, #11
30: 03a00007 moveq r0, #7
34: e12fff1e bx lr
38: e3a00007 mov r0, #7
3c: e12fff1e bx lr
So the equals-and-ors version short-circuits, and the branches that implement the short circuit are themselves costly; its worst-case path takes longer, so its performance is erratic. The multiply version's performance is more deterministic. By inspection, the multiply solution should be faster for the above code.
MIPS gave me this:
00000000 <mfun>:
0: 00a40018 mult a1,a0
4: 00002012 mflo a0
...
10: 00860018 mult a0,a2
14: 00002012 mflo a0
...
20: 00870018 mult a0,a3
24: 00002012 mflo a0
28: 10800003 beqz a0,38 <mfun+0x38>
2c: 00000000 nop
30: 03e00008 jr ra
34: 2402000b li v0,11
38: 03e00008 jr ra
3c: 24020007 li v0,7
00000040 <ofun>:
40: 10800009 beqz a0,68 <ofun+0x28>
44: 00000000 nop
48: 10a00007 beqz a1,68 <ofun+0x28>
4c: 00000000 nop
50: 10c00005 beqz a2,68 <ofun+0x28>
54: 00000000 nop
58: 10e00003 beqz a3,68 <ofun+0x28>
5c: 00000000 nop
60: 03e00008 jr ra
64: 2402000b li v0,11
68: 03e00008 jr ra
6c: 24020007 li v0,7
Unless the branches are too costly, here the equals-and-ors version looks faster.
OpenRISC 32:
00000000 <mfun>:
0: e0 64 1b 06 l.mul r3,r4,r3
4: e0 a3 2b 06 l.mul r5,r3,r5
8: e0 c5 33 06 l.mul r6,r5,r6
c: bc 26 00 00 l.sfnei r6,0x0
10: 0c 00 00 04 l.bnf 20 <mfun+0x20>
14: 9d 60 00 0b l.addi r11,r0,0xb
18: 44 00 48 00 l.jr r9
1c: 15 00 00 00 l.nop 0x0
20: 44 00 48 00 l.jr r9
24: 9d 60 00 07 l.addi r11,r0,0x7
00000028 <ofun>:
28: e0 e0 20 02 l.sub r7,r0,r4
2c: e0 87 20 04 l.or r4,r7,r4
30: bd 64 00 00 l.sfgesi r4,0x0
34: 10 00 00 10 l.bf 74 <ofun+0x4c>
38: e0 80 18 02 l.sub r4,r0,r3
3c: e0 64 18 04 l.or r3,r4,r3
40: bd 63 00 00 l.sfgesi r3,0x0
44: 10 00 00 0c l.bf 74 <ofun+0x4c>
48: e0 60 30 02 l.sub r3,r0,r6
4c: e0 c3 30 04 l.or r6,r3,r6
50: bd 66 00 00 l.sfgesi r6,0x0
54: 10 00 00 08 l.bf 74 <ofun+0x4c>
58: e0 60 28 02 l.sub r3,r0,r5
5c: e0 a3 28 04 l.or r5,r3,r5
60: bd 85 00 00 l.sfltsi r5,0x0
64: 0c 00 00 04 l.bnf 74 <ofun+0x4c>
68: 9d 60 00 0b l.addi r11,r0,0xb
6c: 44 00 48 00 l.jr r9
70: 15 00 00 00 l.nop 0x0
74: 44 00 48 00 l.jr r9
78: 9d 60 00 07 l.addi r11,r0,0x7
This depends on the implementation of multiply; if it is one clock, then the multiplies have it.
If your hardware doesn't support multiply, then a call has to be made to simulate it in software (MSP430 here):
00000000 <mfun>:
0: 0b 12 push r11
2: 0a 12 push r10
4: 09 12 push r9
6: 09 4d mov r13, r9
8: 0b 4c mov r12, r11
a: 0a 4e mov r14, r10
c: 0c 4f mov r15, r12
e: b0 12 00 00 call #0x0000
12: 0a 4e mov r14, r10
14: 0c 49 mov r9, r12
16: b0 12 00 00 call #0x0000
1a: 0a 4e mov r14, r10
1c: 0c 4b mov r11, r12
1e: b0 12 00 00 call #0x0000
22: 0e 93 tst r14
24: 06 24 jz $+14 ;abs 0x32
26: 3f 40 0b 00 mov #11, r15 ;#0x000b
2a: 39 41 pop r9
2c: 3a 41 pop r10
2e: 3b 41 pop r11
30: 30 41 ret
32: 3f 40 07 00 mov #7, r15 ;#0x0007
36: 39 41 pop r9
38: 3a 41 pop r10
3a: 3b 41 pop r11
3c: 30 41 ret
0000003e <ofun>:
3e: 0f 93 tst r15
40: 09 24 jz $+20 ;abs 0x54
42: 0e 93 tst r14
44: 07 24 jz $+16 ;abs 0x54
46: 0d 93 tst r13
48: 05 24 jz $+12 ;abs 0x54
4a: 0c 93 tst r12
4c: 03 24 jz $+8 ;abs 0x54
4e: 3f 40 0b 00 mov #11, r15 ;#0x000b
52: 30 41 ret
54: 3f 40 07 00 mov #7, r15 ;#0x0007
58: 30 41 ret
You would hope that the two are equivalent, and in a pure mathematical sense they should be: for the product of the multiplies to be zero, at least one operand needs to be zero. The problem is that this is software running on a processor: you can easily overflow a multiply and get a zero result from non-zero operands, so to properly implement the code the multiplies have to actually happen.
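For instance, a minimal sketch assuming a 32-bit unsigned int (main here is just a made-up driver around the two functions above):

#include <stdio.h>

unsigned int mfun ( unsigned int a, unsigned int b, unsigned int c, unsigned int d );
unsigned int ofun ( unsigned int a, unsigned int b, unsigned int c, unsigned int d );

int main ( void )
{
    /* 0x10000 * 0x10000 is 0x100000000, which wraps to 0 in a 32-bit unsigned int,
       so the multiply version claims an operand was zero even though none were */
    printf("mfun: %u\n", mfun(0x10000, 0x10000, 1, 1)); /* prints 7  */
    printf("ofun: %u\n", ofun(0x10000, 0x10000, 1, 1)); /* prints 11 */
    return(0);
}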
Because of the cost of mul, and divide in particular, you should avoid them as much as possible in your software. In this case, for your two solutions to actually be equivalent, the multiply solution would require even more code to detect or prevent the overflow cases that can lead to a false positive (a sketch of that is below). Yes, many processors perform mul in one clock, and divide as well; the reason you don't see divide, and sometimes don't see mul, implemented in the instruction set is the chip real estate required, and the expense now is power, heat, the cost of the part, etc. So mul and divide remain expensive. They are not the only long poles in the tent, of course, but they do affect the performance of the part and its clock rate: folks want single-clock operation, not realizing that one instruction may slow the whole chip down, while allowing it to be multi-clock might bring your overall clock rate up. So many things are long poles in the tent that removing mul might not change performance at all; it all depends...
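Just to illustrate how much extra code that is, here is one possible sketch (mfun_safe is a made-up name; it assumes a 32-bit unsigned int and a 64-bit unsigned long long) that keeps the multiplies but gives the same answer as ofun even when the 32-bit product would overflow:

unsigned int mfun_safe ( unsigned int a, unsigned int b, unsigned int c, unsigned int d )
{
    unsigned long long p;

    /* widen to 64 bits so the first multiply cannot wrap */
    p = (unsigned long long)a * b;
    /* once a partial product no longer fits in 32 bits it is certainly non-zero,
       so the true product is zero only if one of the remaining operands is zero */
    if ( p > 0xFFFFFFFFULL ) return((c == 0 || d == 0) ? 7 : 11);
    p *= c;
    if ( p > 0xFFFFFFFFULL ) return((d == 0) ? 7 : 11);
    p *= d;
    if ( p == 0 ) return(7);
    else return(11);
}

It ends up doing zero tests anyway, on top of wider multiplies, which only reinforces the point: the equals-and-ors version already says exactly what you mean without any of this.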