I am attempting to install clickhouse-cityhash with pip on OSX 10.14.5 but it fails with the following (abridged) output:

src/city.cc:396:5: error: use of undeclared identifier '_mm_crc32_u64'
    CHUNK(1, 1); CHUNK(k0, 0);
    ^
...
fatal error: too many errors emitted, stopping now [-ferror-limit=]
  20 errors generated.
  error: command 'cc' failed with exit status 1

I've also tried compiling via CC=gcc and CC=g++ to no avail.

The command that is run on failure is:

cc -fno-strict-aliasing -fno-common \
   -dynamic -g -Os -pipe -fno-common \
   -fno-strict-aliasing -fwrapv -DENABLE_DTRACE \
   -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes \
   -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall \
   -Wstrict-prototypes -DENABLE_DTRACE -arch i386 \
   -arch x86_64 -pipe -Iinclude \
   -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 \
   -c src/city.cc -o build/temp.macosx-10.14-intel-2.7/src/city.o \
   -O3 -msse4.2 -Wno-unused-value -Wno-unused-function

In my attempt to understand the problem, I looked at the source code and I can see five calls to _mm_crc32_u64 that form part of the CHUNK preprocessor macro mentioned in the error log:

f = _mm_crc32_u64(f, a);                                    \
g = _mm_crc32_u64(g, b);                                    \
h = _mm_crc32_u64(h, c);                                    \
i = _mm_crc32_u64(i, d);                                    \
j = _mm_crc32_u64(j, e);                                    \

I found a reference to _mm_crc32_u64 in the Intel Intrinsics Guide, so my understanding is that it is a compiler intrinsic: a C-callable function that maps to a CRC32 instruction in the SSE4.2 instruction set.
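To make that concrete, here is a minimal sketch of how code can use the intrinsic only when the compiler actually declares it, and otherwise fall back to a bit-by-bit CRC32-C step. The helper names `crc32c_u64` and `crc32c_u64_portable` are hypothetical and are not taken from clickhouse-cityhash. The guard assumes that GCC and Clang define `__SSE4_2__` when SSE4.2 is enabled and `__x86_64__` when targeting 64-bit x86.

```cpp
#include <cstdint>

#if defined(__SSE4_2__) && defined(__x86_64__)
#include <nmmintrin.h>  // SSE4.2 intrinsics, including _mm_crc32_u64
#endif

// Hypothetical helper (not from clickhouse-cityhash): one CRC32-C accumulation
// step over a 64-bit value, processed byte by byte from the low byte upwards,
// using the reflected Castagnoli polynomial 0x82F63B78.
inline std::uint64_t crc32c_u64_portable(std::uint64_t crc, std::uint64_t value) {
    std::uint32_t c = static_cast<std::uint32_t>(crc);
    for (int byte = 0; byte < 8; ++byte) {
        c ^= static_cast<std::uint32_t>((value >> (8 * byte)) & 0xFFu);
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right; XOR in the polynomial when the bit shifted out was 1.
            c = (c >> 1) ^ (0x82F63B78u & (0u - (c & 1u)));
        }
    }
    return c;
}

// Hypothetical dispatcher: uses the intrinsic only when the compiler declares it.
inline std::uint64_t crc32c_u64(std::uint64_t crc, std::uint64_t value) {
#if defined(__SSE4_2__) && defined(__x86_64__)
    return _mm_crc32_u64(crc, value);
#else
    return crc32c_u64_portable(crc, value);
#endif
}
```

With this shape, a 32-bit compile pass never sees the name `_mm_crc32_u64`, so it cannot produce an "undeclared identifier" error; it just takes the slower portable path.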

I figured that my machine does not include the SSE4.2 instruction set, but when I run the following command:

sysctl -a | grep cpu.features

SSE4.2 is included in the list:

machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C

Therefore, should I expect _mm_crc32_u64 to be available, and if so, what is the likely reason for this error?

If not, is there anything I can do to make these instructions available?
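The question mixes two separate things: what the CPU supports (the sysctl list) and what the compiler is targeting for a given compile pass. The sketch below reports both. The struct and function names are hypothetical, and the runtime check assumes a GCC or Clang recent enough to provide `__builtin_cpu_supports`. Only the compile-time side decides whether `_mm_crc32_u64` is declared.

```cpp
// Hypothetical report type for illustration; not part of any library.
struct Crc32TargetReport {
    bool target_is_x86_64;        // this compile pass targets 64-bit x86
    bool compiler_enabled_sse42;  // SSE4.2 enabled via -msse4.2, -march=..., etc.
    bool cpu_has_sse42;           // what the CPU running the binary reports
};

inline Crc32TargetReport crc32_target_report() {
    Crc32TargetReport r{false, false, false};
#if defined(__x86_64__)
    r.target_is_x86_64 = true;
#endif
#if defined(__SSE4_2__)
    r.compiler_enabled_sse42 = true;
#endif
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
    // Runtime CPU feature query (GCC/Clang builtin, x86 only).
    __builtin_cpu_init();
    r.cpu_has_sse42 = __builtin_cpu_supports("sse4.2") != 0;
#endif
    return r;
}

// The 64-bit CRC intrinsic needs a 64-bit target AND SSE4.2 enabled at compile
// time; CPU support alone does not make the compiler declare it.
inline bool crc32_u64_expected(const Crc32TargetReport& r) {
    return r.target_is_x86_64 && r.compiler_enabled_sse42;
}
```

On a machine like the one described, a pass compiled for i386 with -msse4.2 would be expected to report `cpu_has_sse42` as true while `crc32_u64_expected` stays false, which matches the error in the build log.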

Peter Cordes
Darragh Enright
  • 2
    What does your compiler (OS X's clang I assume) do with `-arch i386 -arch x86_64`? If it's generating 32-bit code, 64-bit integer registers aren't available. So the widest CRC in 32-bit code would be `_mm_crc32_u32`. `-msse4.2` (or better `-march=native`) should enable CRC instructions, and they should be declared in `#include <immintrin.h>` – Peter Cordes Jun 11 '19 at 17:53
  • Thanks Peter! That looks like a great lead. Makes sense when you point it out. I'm not setting these flags so I will look into how that is happening and attempt to override them, when running `pip install` if possible. – Darragh Enright Jun 11 '19 at 18:23
  • 1
    With normal clang (not Apple's version), `-arch i386` doesn't do anything. The default for x86 clang is still to build 64-bit code. https://godbolt.org/z/ej1kaK shows that `-msse4.2` is sufficient to get it to define the intrinsic. (Although `-march=native` would still be a better choice, to set tune options for your CPU and enable other stuff like AVX2, FMA, BMI1/2, and popcnt). – Peter Cordes Jun 11 '19 at 18:50
  • @PeterCordes When you said "With normal clang (not Apple's version), -arch i386 doesn't do anything" the answer was suddenly very obvious—don't use Apple's `clang`. I discovered I had another version of clang installed via brew, and when I used that instead it worked perfectly. Thanks a million! I'll post an answer now, or if you want to post an answer I'll gladly mark that the answer? – Darragh Enright Jun 11 '19 at 21:13
  • I don't have a Mac so I wouldn't know the details of what to put in an answer. Feel free to copy any wording you like from my comments into your answer. It's still odd that the default compiler wouldn't work, though. It is making 64-bit code, right? – Peter Cordes Jun 11 '19 at 21:23

1 Answer

Many thanks to @PeterCordes for his very valuable observations in the question comments above!


The failing build command run by pip install clickhouse-cityhash included the -arch i386 flag. Mainline (non-Apple) x86 clang ignores this flag and still builds 64-bit code by default.

However, this does not appear to be Apple clang's behaviour. If 32-bit code is generated, 64-bit integer registers are not available, so _mm_crc32_u32 would be the widest CRC intrinsic available and _mm_crc32_u64 would not be declared.
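As an illustration of why the 32-bit intrinsic is "the widest available" rather than a dead end, the following sketch builds the 64-bit CRC step out of two 32-bit steps, low half first. This is not how clickhouse-cityhash handles the case, and the helper names are hypothetical. It relies on the CRC32 instruction consuming the operand's bytes from least to most significant.

```cpp
#include <cstdint>

#if defined(__SSE4_2__)
#include <nmmintrin.h>  // _mm_crc32_u32 is declared for 32-bit and 64-bit x86
#endif

// Hypothetical helper: one CRC32-C step over a 32-bit value.
inline std::uint32_t crc32c_step_u32(std::uint32_t crc, std::uint32_t value) {
#if defined(__SSE4_2__)
    return _mm_crc32_u32(crc, value);
#else
    // Portable fallback: reflected Castagnoli polynomial, one bit at a time.
    crc ^= value;
    for (int bit = 0; bit < 32; ++bit) {
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
#endif
}

// Hypothetical stand-in for _mm_crc32_u64 that only needs 32-bit operations:
// feed the low 32 bits, then the high 32 bits.
inline std::uint64_t crc32c_u64_via_u32(std::uint64_t crc, std::uint64_t value) {
    std::uint32_t c = static_cast<std::uint32_t>(crc);
    c = crc32c_step_u32(c, static_cast<std::uint32_t>(value));        // low half
    c = crc32c_step_u32(c, static_cast<std::uint32_t>(value >> 32));  // high half
    return c;
}
```

This would only help someone who controls the source being compiled; for a package installed with pip, changing the compiler or the flags remains the practical route.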

Therefore, one solution is not to use Apple clang.

Most developers using OSX will be familiar with the brew package manager and have it installed. You may find that you already have a version of gcc installed via brew as a dependency of another package.

Check with the following:

brew list | grep gcc

If not, install it with:

brew install gcc

The executable should be available on your $PATH (usually in /usr/local/bin) as gcc or similar; mine was available as gcc-8.

To use it, set the CC environment variable to the gcc you want and run pip install, e.g.:

CC=gcc-8 pip install clickhouse-cityhash

Hope this helps :)

Darragh Enright
  • Did you test that compiler command-line without any `-arch` options to see if that's really what's doing it for Apple clang? (Rather than a broken `immintrin.h` or something). e.g. `cd` to where `make` was, then copy/paste the exact command but remove the `-arch` stuff. (And BTW, gcc is not clang. Maybe you misremembered earlier when you said you had *another version of clang installed via brew*. They're both pretty good compilers, but clang unrolls small loops even without profile-guided optimization. But they have totally separate optimizer back-ends, assuming a normal GCC build.) – Peter Cordes Jun 11 '19 at 22:58
  • Thanks for the clarifications. It's pretty late here, so I was probably a bit loose with my description. I'll do an edit and more testing in the morning. – Darragh Enright Jun 11 '19 at 23:12
  • 1
    Anyway, this answer does more or less confirm that the source code you're building does properly `#include` the header that declares the prototype. Which reminds me; you're compiling C++, not C. In C it's not technically an error to call a function with no declaration, you just get a link error (and a warning about implicit declaration at compile time) if you try to use intrinsics that aren't available (because they look like function names). I fixed the tags on your question. – Peter Cordes Jun 11 '19 at 23:19
  • Hi @PeterCordes, thanks for the clarification. I think it's clear this is beyond the scope of my area of knowledge :D I really appreciate the feedback, and please feel free to add or correct anything that you think would help. I haven't had time to return to looking into and clarifying the issue yet; soon, hopefully. Thanks again! – Darragh Enright Jun 13 '19 at 15:38
  • 1
    Looks fine to me; "use a different compiler" works. It doesn't tell us why Apple Clang didn't work, but maybe someone else can answer that. Mainline clang accepts `-msse4.2` so IDK what's wrong. – Peter Cordes Jun 13 '19 at 15:42