
Is it possible for a user program on aarch64 to detect whether the crc32 instructions are available? I have found references to kernel support for such detection, implying that the registers holding the information about which instructions will work in user mode are not themselves available in user mode (!).

Is that the case? Or is there a portable way to determine if the crc32 instructions are available?

Note: By "user program" and "portable" I mean an approach that requires neither privileged instructions nor operating-system-specific calls or files (e.g. /proc/cpuinfo). The code itself needs to be able to detect whether the instructions are available and use them if they are, or fall back to an alternative if they are not. As an example, Intel processors have the cpuid instruction for this purpose.
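For contrast, here is a minimal sketch of the kind of unprivileged check meant here on x86, assuming GCC or Clang (which ship the `<cpuid.h>` helper `__get_cpuid`). It tests the SSE4.2 flag (ECX bit 20 of leaf 1), which covers the x86 `crc32` instruction. Note that x86 `crc32` computes CRC-32C rather than zlib's CRC-32, so this only illustrates the detection mechanism. The function name is hypothetical.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang wrapper around the cpuid instruction */
#endif

/* Returns 1 if the CPU reports SSE4.2 (which includes x86 crc32), else 0.
   cpuid is unprivileged, so no OS call is involved. */
static int x86_has_sse42_crc32(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    /* Leaf 1: processor info and feature bits; returns 0 if unsupported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx & (1u << 20)) != 0; /* ECX bit 20 = SSE4.2 */
#else
    return 0; /* not an x86 target */
#endif
}
```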

Update:

Poking around in ARM architecture descriptions, I found a user-level register, PMCR_EL0, which provides an 8-bit implementer code and an 8-bit ID code for the processor. Perhaps if I could find a list of those codes, I might be closer to what I'm looking for.

Update 2:

However, when I try to read that register, I get an illegal instruction exception. So even EL0 registers require privileged access?
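A sketch of what reading PMCR_EL0 and extracting the two codes looks like, assuming the ARMv8-A layout (IMP in bits [31:24], IDCODE in bits [23:16]). The helper names are hypothetical. As the update above observes, the `mrs` itself traps at EL0 (SIGILL under Linux) unless EL1 software has enabled user access through PMUSERENR_EL0, so only the decoding part is safe to call unconditionally.

```c
#include <stdint.h>

/* PMCR_EL0.IMP: implementer code, bits [31:24]. */
static unsigned pmcr_implementer(uint64_t pmcr) { return (unsigned)((pmcr >> 24) & 0xff); }

/* PMCR_EL0.IDCODE: implementer-defined ID code, bits [23:16]. */
static unsigned pmcr_idcode(uint64_t pmcr) { return (unsigned)((pmcr >> 16) & 0xff); }

#if defined(__aarch64__)
/* Traps at EL0 unless the kernel has set PMUSERENR_EL0.EN. */
static uint64_t read_pmcr_el0(void)
{
    uint64_t v;
    __asm__ volatile("mrs %0, pmcr_el0" : "=r"(v));
    return v;
}
#endif
```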

Mark Adler
  • Yes, PMCR_EL0 can be accessed at EL0, but provided EL0 access has previously been enabled by configuring PMUSERENR_EL0. The bizarre thing is that it seems that running at EL1 is required to do so, see http://zhiyisun.github.io/2016/03/02/How-to-Use-Performance-Monitor-Unit-(PMU)-of-64-bit-ARMv8-A-in-Linux.html. I tried at EL0 and got an illegal instruction error. – Frant Jan 04 '19 at 12:14
  • ARM just will not give me a break here. – Mark Adler Jan 04 '19 at 23:08

3 Answers


Not to the best of my knowledge.

In Chromium's zlib, I implemented it using the available OS functionality: https://cs.chromium.org/chromium/src/third_party/zlib/arm_features.c?l=29

It is also relevant to mention that the crc32 instructions on ARMv8 are an optional extension in ARMv8.0 (separate from the crypto extensions, which cover AES and SHA) and mandatory from ARMv8.1 onward. This means runtime feature detection is necessary; for further details, please check: https://cs.chromium.org/chromium/src/third_party/zlib/BUILD.gn?l=64
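Because the instructions are optional in ARMv8.0, a common pattern is to keep a portable software CRC-32 next to a hardware path, and let a runtime check pick one. The sketch below assumes the ACLE macro `__ARM_FEATURE_CRC32` (defined when compiling with e.g. `-march=armv8-a+crc`) and the ACLE intrinsic `__crc32b` from `<arm_acle.h>`; the function names are hypothetical. Both versions apply zlib's pre- and post-inversion, so they produce the same values.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h> /* ACLE intrinsics such as __crc32b */
#endif

/* Portable bitwise CRC-32 (zlib polynomial, reflected form 0xEDB88320).
   Slow, but correct on every CPU. */
static uint32_t crc32_sw(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;
    while (len--) {
        crc ^= *buf++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

#if defined(__ARM_FEATURE_CRC32)
/* Hardware path: compiled only when the compiler targets the CRC32
   extension. Running it on a CPU without the extension raises an
   illegal-instruction fault, so a runtime check must gate the call. */
static uint32_t crc32_hw(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;
    while (len--)
        crc = __crc32b(crc, *buf++);
    return ~crc;
}
#endif
```

The standard check value holds: `crc32_sw(0, "123456789", 9)` gives `0xCBF43926`.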

I would avoid reading directly from /proc/cpuinfo, as it may not be available in some contexts (and, depending on the Android flavor, it may give a false negative).

In Chromium, zlib runs both in a privileged context (i.e. as part of the network code in the main browser process) and in a sandboxed context (i.e. as part of the RendererProcess in a tab). In the RendererProcess, reading from /proc/cpuinfo should fail.

A sledgehammer approach would be to install a signal handler and execute the instruction with inline asm: if the instruction is not available, it faults, and the handler can catch the fault. Not recommended, though.
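A minimal sketch of that sledgehammer, assuming a POSIX system (`sigaction`, `sigsetjmp`/`siglongjmp`). All names are hypothetical. On aarch64 the probe emits `CRC32B w0, w0, w1` through its raw encoding (`.inst 0x1ac14000`) so the assembler does not need `+crc` enabled. The handler is process-wide, so this is not thread-safe, which is one of the race conditions mentioned below. Two trivial demo helpers let the probe be sanity-checked on any platform.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf probe_env;

static void probe_sigill_handler(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1); /* jump back out of the faulting code */
}

/* Runs fn() under a temporary SIGILL handler.
   Returns 1 if fn completed, 0 if it raised SIGILL. */
static int probe_instruction(void (*fn)(void))
{
    struct sigaction sa, old;
    volatile int ok = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = probe_sigill_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return 0;

    if (sigsetjmp(probe_env, 1) == 0) { /* 1: also save/restore signal mask */
        fn();
        ok = 1;
    }
    sigaction(SIGILL, &old, NULL); /* restore the previous handler */
    return ok;
}

#if defined(__aarch64__)
/* CRC32B w0, w0, w1 (raw encoding); faults if CRC32 is not implemented. */
static void try_crc32b(void)
{
    __asm__ volatile(".inst 0x1ac14000" ::: "x0");
}
#endif

/* Demo helpers for checking the probe itself on any platform. */
static void demo_noop(void) {}
static void demo_raise_sigill(void) { raise(SIGILL); }
```

On aarch64, `probe_instruction(try_crc32b)` would then report whether the instruction executed.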

The aforementioned example (https://github.com/torvalds/linux/blob/master/Documentation/arm64/cpu-feature-registers.txt) worked on one ARM board I tested (MACCHIATObin) but failed on two others (ROCK64 and NanoPi M4).
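A sketch of that kernel-documented approach, assuming aarch64 Linux headers that define `HWCAP_CPUID`: when that hwcap bit is set, the kernel traps and emulates `mrs` reads of ID registers from EL0. The CRC32 field of ID_AA64ISAR0_EL1 is bits [19:16] (0 = not implemented, 1 = implemented). Checking `HWCAP_CPUID` first avoids a SIGILL on kernels that do not provide the emulation; the function names are hypothetical.

```c
#include <stdint.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>  /* getauxval, AT_HWCAP */
#include <asm/hwcap.h> /* HWCAP_CPUID (newer kernels) */
#endif

/* ID_AA64ISAR0_EL1.CRC32, bits [19:16]: nonzero means CRC32* implemented. */
static int isar0_has_crc32(uint64_t isar0)
{
    return ((isar0 >> 16) & 0xf) != 0;
}

/* Returns 1/0 for CRC32 support, or -1 if the kernel does not advertise
   emulated EL0 access to the ID registers (the MRS would then fault). */
static int crc32_via_id_register(void)
{
#if defined(__aarch64__) && defined(__linux__) && defined(HWCAP_CPUID)
    uint64_t isar0;
    if (!(getauxval(AT_HWCAP) & HWCAP_CPUID))
        return -1;
    /* Trapped at EL0 and emulated by the kernel. */
    __asm__ volatile("mrs %0, ID_AA64ISAR0_EL1" : "=r"(isar0));
    return isar0_has_crc32(isar0);
#else
    return -1;
#endif
}
```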

The approach implemented in Chromium works on all those boards (as well as on a few cellphones I've tested).

Another detail about getauxval(): the correct flag depends on whether the code runs as 32-bit or 64-bit. On 64-bit it is HWCAP_CRC32 (queried with AT_HWCAP), while on 32-bit it is HWCAP2_CRC32 (queried with AT_HWCAP2).
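A minimal sketch of the getauxval() check covering both cases, assuming Linux with glibc's `<sys/auxv.h>` and the kernel's `<asm/hwcap.h>`. The `defined(...)` guards make it fall back to "absent" on other platforms or with older headers (e.g. where `HWCAP2_CRC32` is undefined on arm64, as reported in the comments). The function name is hypothetical.

```c
#if defined(__linux__) && (defined(__aarch64__) || defined(__arm__))
#include <sys/auxv.h>  /* getauxval, AT_HWCAP, AT_HWCAP2 */
#include <asm/hwcap.h> /* HWCAP_CRC32 (arm64) / HWCAP2_CRC32 (arm) */
#endif

/* Returns 1 if the Linux kernel reports CRC32 support, else 0. */
static int linux_has_crc32(void)
{
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_CRC32)
    return (getauxval(AT_HWCAP) & HWCAP_CRC32) != 0;   /* 64-bit: AT_HWCAP */
#elif defined(__linux__) && defined(__arm__) && defined(HWCAP2_CRC32)
    return (getauxval(AT_HWCAP2) & HWCAP2_CRC32) != 0; /* 32-bit: AT_HWCAP2 */
#else
    return 0; /* unknown platform or headers too old: assume absent */
#endif
}
```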

About the sledgehammer approach: signals are prone to race conditions, and you would still rely on OS-specific APIs (e.g. to install the signal handler).

Finally, depending on the context, if a given task crashes (even if by design and isolated from the execution context), it will trigger red flags.

This is a point (i.e. feature detection) where life is way easier on x86.

That being said, relying on the OS features may be an acceptable compromise. We have been shipping the linked code in Chromium since release M66 (current stable is M72); it first landed almost a year ago, with no reported problems.

One consideration on Android was that the NDK may internally implement android_getCpuFeatures() using dlopen()/dlsym(), which can add around 500 to 1000 µs at first startup; that is why we cache the result of the CPU feature detection.

Another consideration for multithreaded apps (like Chromium) was the need for a one-time initialization guard (e.g. pthread_once_t) to avoid race conditions while performing the CPU feature detection.
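A sketch of caching the detection result behind `pthread_once`, assuming POSIX threads (link with `-pthread` on older toolchains). `detect_crc32_uncached` is a hypothetical placeholder for the real, possibly slow detection (e.g. a getauxval() or android_getCpuFeatures() call); it counts its invocations so the caching can be observed.

```c
#include <pthread.h>

/* Hypothetical placeholder for the real detection; counts invocations. */
static int detect_calls = 0;
static int detect_crc32_uncached(void)
{
    detect_calls++;
    return 0; /* placeholder result */
}

static pthread_once_t crc32_once = PTHREAD_ONCE_INIT;
static int crc32_available;

static void crc32_init(void)
{
    crc32_available = detect_crc32_uncached();
}

/* Thread-safe cached query: detection runs exactly once, however many
   threads call this concurrently. */
static int has_crc32(void)
{
    pthread_once(&crc32_once, crc32_init);
    return crc32_available;
}
```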

  • Thank you. What are the downsides of the sledgehammer? – Mark Adler Jan 15 '19 at 22:42
  • I would assume this is the signal-based scheme OpenSSL uses to detect whether an ARM instruction is available, by installing a signal handler to trap an illegal instruction; see one of my initial comments above. There are therefore no downsides, except that your requirements would still not be fulfilled: specific support at the libc and operating-system level is still required, which is exactly what you wanted to avoid. – Frant Jan 19 '19 at 02:55

Update: the original answer did not answer the question, since the question's author wanted a universal piece of code running at EL0 that can determine whether the CRC32 feature is present, without any requirements on the operating system or bare-metal environment being used.

My understanding is that such a code would need to access ID_AA64ISAR0_EL1, and because code running at EL0 cannot access it, a switch to a more privileged exception level would be required anyway.

In the same way, trapping an illegal instruction using a 'portable' section of code would require accessing a VBAR_ELx register, which cannot be done from a program running at EL0 that does not rely on an underlying operating system or privileged monitor.

Therefore, my answer to the question "Is that the case?" would be: yes, it is. A portable/universal section of code running at EL0 cannot determine whether the CRC32 feature is available.

That being said, the example code provided in the documentation referenced in the question works fine on an ESPRESSObin running aarch64 Linux 4.14.80, and should be preferred to using getauxval() for the very reasons explained in the kernel documentation.

Frant
  • I linked to that same solution in my question. It is a Linux kernel call, and so, not portable. – Mark Adler Dec 29 '18 at 06:10
  • Interestingly, it's not even portable on Linux. Just tried it for kicks on Debian and Ubuntu ARM64 systems, and both said that `HWCAP2_CRC32` was undefined. – Mark Adler Dec 29 '18 at 06:14
  • I would not agree; this is supposed to be a user-mode solution according to ARM's article. More specifically, a user-mode program calling getauxval(AT_HWCAP2) does compile and execute, even though AT_HWCAP2 is not supported/present in our asm/hwcap.h files. I am currently investigating the issue. – Frant Dec 29 '18 at 19:42
  • Thank you for the well-assembled information. However this does not meet my needs for portability. Neither `getauxval()`, nor `/proc/cpuinfo` exist outside of Linux systems, and so are not portable. `getauxval()` is a user mode call to the kernel, so that the kernel can access the protected register for the user. Ideally I'm looking for a way that can run entirely within a user-mode program, not requiring the assistance of the operating system. This is entirely possible on Intel CPUs. – Mark Adler Dec 29 '18 at 21:22
  • I see. If access to ID_AA64ISAR0_EL1 is required, an application running at EL0 would likely need to use the underlying operating system. Another option would be to execute a CRC32 instruction after having used the signal() function to trap an illegal instruction. Note that while this may work on Linux (OpenSSL used this scheme at one point), it may not work on your other targets if they do not offer an equivalent scheme for trapping an illegal instruction. – Frant Dec 29 '18 at 22:10

This might not be directly accessible, but ARM provides specifications for each processor, so there is a chance to create a chart that can be used to look up CPU features by model name. /proc/cpuinfo is Linux-specific; on Windows the equivalent would be WMI; OSX does not run on ARM (as far as I know). Unless it is a type 1 hypervisor, which bypasses the operating system entirely, there has to be OS-specific code (and the user can also disable VT).
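A sketch of the chart idea, assuming the chart is keyed on the MIDR_EL1 implementer and part-number fields (Implementer [31:24], Variant [23:20], Architecture [19:16], PartNum [15:4], Revision [3:0]). Reading MIDR_EL1 is itself privileged, so the value would still have to come from the OS; the chart contents would have to come from vendor documentation, and none are supplied here. All type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct midr_fields {
    unsigned implementer, variant, partnum, revision;
};

/* Splits a MIDR_EL1 value into its identification fields. */
static struct midr_fields decode_midr(uint32_t midr)
{
    struct midr_fields f;
    f.implementer = (midr >> 24) & 0xff;
    f.variant     = (midr >> 20) & 0xf;
    f.partnum     = (midr >> 4) & 0xfff;
    f.revision    = midr & 0xf;
    return f;
}

/* One row of the chart; rows must be filled in from vendor specs. */
struct cpu_chart_entry {
    unsigned implementer, partnum;
    int has_crc32;
};

/* Returns the chart's answer for this core, or -1 if it is not listed. */
static int chart_lookup_crc32(const struct cpu_chart_entry *chart, size_t n,
                              uint32_t midr)
{
    struct midr_fields f = decode_midr(midr);
    for (size_t i = 0; i < n; i++)
        if (chart[i].implementer == f.implementer && chart[i].partnum == f.partnum)
            return chart[i].has_crc32;
    return -1;
}
```

For example, `0x410FD034` decodes to implementer `0x41` (Arm), part number `0xD03` (Cortex-A53) and revision 4.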

Martin Zeitler
  • See Mark Adler's note above, I think your answer does not fulfill his requirements either. – Frant Jan 02 '19 at 18:59
  • @Frant Because this requirement doesn't make sense unless bypassing the OS, which may or may not be possible depending on the setup, and which does not really make it portable. The model name and a chart with specs would just be the least intrusive option, with the highest chance to tell apart feature support. – Martin Zeitler Jan 02 '19 at 19:41
  • I would not say the requirement does not make sense, I just think it cannot be fulfilled because of the Armv8-a architectural design. What you are proposing would work though. I am just saying this is not what Mark Adler was looking for. – Frant Jan 02 '19 at 20:16
  • @Frant It's not necessarily because of the hardware design; the issue arises as soon as any modern OS has been booted, which protects its resources and only permits access through drivers. Meanwhile one cannot even access memory directly anymore (unless running within an emulation)... There are still a few tools which can flash BIOS/UEFI directly from within the OS, but these are vendor-specific and OS-dependent, too. The only thing truly OS-independent would be an OS. – Martin Zeitler Jan 02 '19 at 20:42
  • ARM even features 8 levels of hardware protection... which a running OS may utilize. – Martin Zeitler Jan 02 '19 at 20:48
  • What I meant is that, by architectural design, in the ARMv8-A architecture ID_AA64ISAR0_EL1 is accessible only from EL1, EL2 and EL3; VBAR_EL1 only from EL1, EL2 and EL3; VBAR_EL2 only from EL2 and EL3; and VBAR_EL3 only from EL3. That is, a generic program running at the user privilege level, i.e. EL0 on Linux, cannot access those registers without support from an OS or a monitor running at least at EL1. – Frant Jan 02 '19 at 21:01
  • iOS runs on ARM. – Mark Adler Jan 03 '19 at 07:42
  • A requirement for a user program to be able to determine what user instructions are and are not available is eminently reasonable. In fact, that requirement is entirely met on Intel processors with the cpuid instruction. That the ARM architecture designers elected to make that information privileged borders on unconscionable. – Mark Adler Jan 03 '19 at 07:46
  • What I am hoping for is a sneaky way, for example, variations in the behavior of other certainly available user instructions to reveal the architecture revision, and by inference the presence of the crc32 instructions. – Mark Adler Jan 03 '19 at 07:49
  • @MarkAdler On Windows, `cpuid` is OS-dependent; and whatever OS is running, the only option is to get the information the OS provides. With `cat /proc/cpuinfo | grep crc32` (which does not require `sudo`) I can get the information per core (the cores are not always identical; e.g. a Tegra CPU has two types). Something like this library would be an option for portability: https://github.com/pytorch/cpuinfo/blob/master/include/cpuinfo.h (it merely implements what I've suggested). – Martin Zeitler Jan 03 '19 at 12:25
  • @Mark Adler Even though this would still not fulfill your requirement: CRC32 is optional in Armv8.0-A, so we are back to square one in that case, but it is always present in Armv8.1-A and later revisions of the architecture; see http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0801g/awi1476352818103.html – Frant Jan 03 '19 at 13:23
  • `cpuid` on Intel is never privileged. It is not OS dependent. – Mark Adler Jan 03 '19 at 16:30
  • I could live with losing the 8.0 boundary case, if I had a way to detect 8.1 or higher. – Mark Adler Jan 03 '19 at 16:31
  • @Mark Adler I hope someone else will be able to contradict me, but I do think ARM decided to prevent all identification registers from being accessed from EL0; see section "4.2.1 AArch64 identification registers" of http://infocenter.arm.com/help/topic/com.arm.doc.ddi0500j/DDI0500J_cortex_a53_trm.pdf for example. All register names end with _EL1, including ID_AA64ISAR0_EL1, which by convention means that running at least at EL1 is required in order to access those registers. – Frant Jan 03 '19 at 18:23
  • Indeed they did. – Mark Adler Jan 03 '19 at 18:40
  • Maybe the `PMCR_EL0` register could help. – Mark Adler Jan 03 '19 at 19:28
  • I was not able to identify any event in PMCEID0_EL0, PMCEID1_EL0 that would give a hint on whether or not the CRC32 feature is implemented on a Cortex-A53 for example. – Frant Jan 03 '19 at 20:19