Portably testing for the POPCNT instruction

Question

I'd like my configure script to detect the availability of the POPCNT instruction across a wide variety of Unix-like systems. At the moment I do these tests:

Look for "popcnt" in /proc/cpuinfo. This works on Linux and Cygwin.
Look for "popcnt" in the output of "sysctl -n machdep.cpu.features". This works on Mac OS X and (untested) BSD.
Look for "popcnt" in the output of "isainfo -v -x". This works (untested) on Solaris.

The greps are done case-independently. Can you see any problems with these, and do you know of any other tests?

Tests requiring root privilege are no use.

I suppose using something like the [cpuid](http://www.etallen.com/cpuid.html) command-line tool is out of the question :) — rici, Jan 12 '16 at 07:23
Do you generate POPCNT with a C/C++ intrinsic, inside inline assembly block, or from a standalone assembly file? — Marat Dukhan, Jan 12 '16 at 07:55
@rici : The cpuid tool is only for x86 systems as far as I can tell, and is not part of the default tool set for Ubuntu, for example. I don't want to ask people to install anything. — Brendan McKay, Jan 12 '16 at 08:44
@Marat : I use the __builtin_popcount series in gcc and the _mm_popcnt_u32 series in icc. — Brendan McKay, Jan 12 '16 at 08:45

score 1 · Answer 1 · answered Jan 14 '16 at 04:10

So you have code that enables -mpopcnt and uses __builtin_popcount if that will be fast. Otherwise you use something different, because your custom solution beats gcc's implementation?

Keep in mind that host != target in some cases. Build-time CPU detection is not appropriate for binaries that have to run on other machines, e.g. Linux distros making binaries. Cross-compiling is also a thing, and is commonly done when targeting an embedded system or an old, slow system.

Maybe write a custom C program that returns the result you want.

On x86, you could just use the result of runtime CPU detection: run the CPUID instruction and check if popcnt is supported. It's probably best not to unconditionally run the popcnt instruction, since processes that run an illegal instruction generate a syslog entry on some modern distros (e.g. Ubuntu).
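A minimal sketch of the manual CPUID check, assuming GCC or Clang and their `<cpuid.h>` helper header. The function name `cpuid_reports_popcnt` is made up for illustration. On non-x86 builds it simply reports 0.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* Hypothetical helper: returns 1 if CPUID reports POPCNT, 0 otherwise
 * (including on non-x86 targets, where CPUID does not exist). */
int cpuid_reports_popcnt(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 if the requested leaf is not supported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    /* CPUID leaf 1: ECX bit 23 is the POPCNT feature flag. */
    return (int)((ecx >> 23) & 1u);
#else
    return 0;
#endif
}
```

A configure probe could call this from `main()` and exit with status 0 when it returns 1. That only describes the build host, so it does not help when cross-compiling.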

With recent GNU C extensions, the easiest way to do that is: __builtin_cpu_init() and __builtin_cpu_supports("popcnt"), saving you the trouble of manually decoding the CPUID results.
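Something like the following should do, assuming a GCC or Clang recent enough to provide these builtins on x86. The function name `host_supports_popcnt` is just an illustrative choice, and the non-x86 branch returning 0 is an assumption about what a probe would want there.

```c
/* Hypothetical helper: returns 1 if the CPU running this code supports
 * POPCNT according to the compiler's CPU-detection builtins, else 0. */
int host_supports_popcnt(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* Must run before __builtin_cpu_supports when called outside of
     * normal program start-up ordering (e.g. from a constructor);
     * calling it here is harmless. */
    __builtin_cpu_init();
    return __builtin_cpu_supports("popcnt") ? 1 : 0;
#else
    /* Assumption: on other targets, report "no" and let the caller
     * decide what to do. */
    return 0;
#endif
}
```

This runs without root and never executes popcnt itself, so there is no illegal-instruction risk on CPUs that lack it.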

You could then fall back to a micro-benchmark of __builtin_popcount against your custom macro, and take whichever is faster. That might be useful even on non-x86 architectures where your macros beat gcc's implementation (e.g. an architecture that always has a popcnt instruction available). Then you'd have to handle the case where you should use __builtin_popcount but not build with -mpopcnt.
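A rough sketch of such a micro-benchmark, assuming GCC or Clang. `swar_popcount32` is a hypothetical stand-in for the custom macro (a classic bit-twiddling popcount), and the input-scrambling constant and iteration count are arbitrary. `clock()` is coarse, so a real probe would want a large iteration count and probably several runs.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the custom macro: SWAR popcount on 32 bits. */
unsigned swar_popcount32(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                  /* 2-bit sums  */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);  /* 4-bit sums  */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                  /* 8-bit sums  */
    return (unsigned)((x * 0x01010101u) >> 24);        /* add 4 bytes */
}

/* Returns 1 if n calls of __builtin_popcount took less CPU time than
 * n calls of the fallback, 0 otherwise (ties count as "not faster"). */
int builtin_is_faster(uint32_t n)
{
    volatile unsigned sink = 0;  /* keeps the loops from being optimized out */
    uint32_t i;

    clock_t t0 = clock();
    for (i = 0; i < n; i++)
        sink += (unsigned)__builtin_popcount(i * 2654435761u);
    clock_t t1 = clock();
    for (i = 0; i < n; i++)
        sink += swar_popcount32(i * 2654435761u);
    clock_t t2 = clock();

    return (t1 - t0) < (t2 - t1);
}
```

The result depends on the flags the probe is compiled with: built without -mpopcnt, __builtin_popcount typically expands to a library call or bit-twiddling sequence rather than the instruction, so the probe should be built the same way the real code will be.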
