Questions tagged [armv8]

This tag is for questions specifically about version 8 of the ARM architecture, in either its 32-bit or 64-bit form. Questions about the 64-bit ARM execution state or instruction set should be tagged with [arm64].

Version 8 of the ARM architecture introduced a new 64-bit execution state (AArch64) with a new 64-bit instruction set (A64) as well as retaining and extending the existing 32-bit execution state (AArch32) and its instruction sets A32 ("ARM") and T32 ("Thumb").

245 questions
1 vote, 1 answer

operand must be an int register in armv8

I'm working on a program in ARMv8, and when attempting to compile it using gcc, I get the error message "operand 3 must be an integer register" for the following lines of code: line 67: mul x2, x2, 8; line 100: mul x16, x11, 4; line 117: mul x16, x11, 4; line 140: mul…
Rose Ben Ann
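A64 MUL (an alias of MADD with XZR as the addend) only takes register operands, so an immediate such as 8 in the third position is rejected. The usual rewrites are to move the constant into a spare register first or, when the multiplier is a power of two, to shift instead. The C sketch below is only an illustration: it checks the arithmetic identity the shift rewrite relies on, and x17 in the comment is an arbitrary scratch register that would have to be free in the real code.

```c
#include <stdint.h>

/* Two usual rewrites of "mul x2, x2, 8", which A64 rejects because
 * MUL has no immediate form:
 *
 *   mov  x17, 8          // put the constant in a scratch register
 *   mul  x2, x2, x17     // register * register
 *
 *   lsl  x2, x2, 3       // or, since 8 == 1 << 3, shift left by 3
 */

/* Register-times-register form: low 64 bits of the product, as MUL gives. */
static uint64_t mul_by_reg(uint64_t x, uint64_t m)
{
    return x * m;
}

/* Shift form: equivalent only when the multiplier equals 1 << shift. */
static uint64_t mul_by_shift(uint64_t x, unsigned shift)
{
    return x << shift;
}
```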
1 vote, 1 answer

error: unknown mnemonic in armv8 when compiling with gcc

I am trying to compile a project with multiple C files and an assembly file written in ARMv8 assembly. I have not done this before, so I am having some trouble understanding a few of the error messages I get. I consistently receive the "unknown mnemonic"…
Rose Ben Ann
1 vote, 0 answers

Confusion about the ARMv8 PMU events l3d_cache_refill and ll_cache_miss_rd on Cortex-A78

I am measuring a process's memory bandwidth on Cortex-A78 Linux with perf and got the following output: 18,312,265 ll_cache_miss_rd # 0.507 M/sec (28.64%) 36,006,163 l3_cache_refill # …
wangt13
1 vote, 0 answers

How to measure Linux process's memory bandwidth on ARMv8 CPUs?

I am running performance tests on a Linux system and am wondering whether there is a way to measure a process's memory bandwidth. Currently I use perf to capture ll_cache_miss_rd counts and multiply them by the cache line size to estimate the total memory…
wangt13
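The method described in the excerpt boils down to bandwidth ≈ read misses × line size / elapsed time. A minimal sketch, assuming a 64-byte cache line (the line size on Cortex-A78) and exactly one line fill from memory per counted miss; writebacks and prefetch traffic are not captured, so this is a rough estimate of read traffic rather than a measurement. The function name is illustrative.

```c
#include <stdint.h>

/* Rough read-bandwidth estimate in bytes per second, assuming every
 * counted last-level read miss fetches exactly one cache line from
 * memory. Writebacks and hardware prefetches are not included. */
static double estimate_read_bandwidth(uint64_t ll_read_misses,
                                      unsigned line_bytes,
                                      double elapsed_seconds)
{
    if (elapsed_seconds <= 0.0)
        return 0.0;                 /* no meaningful rate without time */
    /* bytes moved = misses * line size; divide by time for bytes/s */
    return (double)ll_read_misses * (double)line_bytes / elapsed_seconds;
}
```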
1 vote, 1 answer

Adding a 32-bit register to a 64-bit register in ARM

I have code like this: mov x24, 5 mov w25, 5 add x24, x24, w25. I am getting "Missing extend operator at operand 3". I know I could just switch both to 64-bit operands, but I'm wondering if it's still possible to add 32-bit numbers and…
Fattyffat
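The extended-register form of ADD accepts a W register as the last operand once you say how to widen it: add x24, x24, w25, uxtw zero-extends w25, and add x24, x24, w25, sxtw sign-extends it. The C functions below model what the two extend operators compute; they are an illustration of the semantics, not compiler output, and the function names are made up.

```c
#include <stdint.h>

/* Model of: add x24, x24, w25, uxtw
 * The 32-bit value is zero-extended to 64 bits before the add. */
static uint64_t add_uxtw(uint64_t x, uint32_t w)
{
    return x + (uint64_t)w;
}

/* Model of: add x24, x24, w25, sxtw
 * The 32-bit value is sign-extended (bit 31 copied upward) before the add.
 * Written without a narrowing signed cast so it stays portable C. */
static uint64_t add_sxtw(uint64_t x, uint32_t w)
{
    uint64_t ext = (uint64_t)w;
    if (w & 0x80000000u)
        ext |= 0xFFFFFFFF00000000ull;   /* fill bits [63:32] with ones */
    return x + ext;                     /* wraps modulo 2^64, like ADD */
}
```

For the excerpt's values (both registers holding 5) the two forms give the same result, 10; they only differ when bit 31 of the W register is set.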
1 vote, 1 answer

In ARMv8, where is a process's root page table saved?

In ARMv8 Linux, the MMU uses TTBR0_EL1 and TTBR1_EL1 for virtual memory management. So where is a process's PGD saved in ARMv8 Linux? On x86, CR3 holds the root of a process's page table and is switched during process context…
wangt13
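Broadly, Linux on arm64 keeps a process's root table in mm->pgd and loads its physical address into TTBR0_EL1 when switching to that mm (the counterpart of reloading CR3 on x86), while TTBR1_EL1 keeps pointing at the kernel's tables. The sketch below only shows how a TTBRn_EL1 value packs a table base address together with an ASID in bits [63:48]. It assumes 48-bit physical addresses and leaves CnP (bit 0) clear; the helper names are hypothetical, and which TTBR actually supplies the ASID depends on TCR_EL1.A1.

```c
#include <stdint.h>

/* Hypothetical helpers for a TTBRn_EL1 value: ASID in bits [63:48],
 * translation-table base address (BADDR) in bits [47:1] for 48-bit
 * physical addresses, CnP (bit 0) left clear. */
#define TTBR_ASID_SHIFT 48
#define TTBR_BADDR_MASK 0x0000FFFFFFFFFFFEull   /* bits [47:1] */

static uint64_t make_ttbr(uint64_t table_pa, uint16_t asid)
{
    return ((uint64_t)asid << TTBR_ASID_SHIFT) | (table_pa & TTBR_BADDR_MASK);
}

static uint16_t ttbr_asid(uint64_t ttbr)
{
    return (uint16_t)(ttbr >> TTBR_ASID_SHIFT);
}

static uint64_t ttbr_baddr(uint64_t ttbr)
{
    return ttbr & TTBR_BADDR_MASK;
}
```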
1 vote, 2 answers

What does MSL do in the ARM instruction MOVI <Vd>.<T>, #<imm8>, MSL #<amount>?

I am confused about the operation MSL, which is used in a variant of the MOVI and MVNI instructions. There's not a lot of information out there, but I have seen it referred to as "Masking Shift Left". Could anyone give an example of what a masking…
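The "masking shift left" reading matches the instruction's behaviour: MSL shifts the 8-bit immediate left by 8 or 16 and fills the vacated low bits with ones rather than zeros, so MOVI Vd.4S, #0x12, MSL #8 puts 0x000012FF in every 32-bit lane, and MVNI writes the bitwise inverse. A small C model of the per-lane value; the function names are illustrative.

```c
#include <stdint.h>

/* Per-lane value written by MOVI Vd.2S/4S, #imm8, MSL #amount.
 * MSL shifts imm8 left and fills the vacated low bits with ones.
 * amount must be 8 or 16; any other value is not encodable, and this
 * model returns 0 for it (a valid result always has its low bits set). */
static uint32_t movi_msl(uint8_t imm8, unsigned amount)
{
    if (amount != 8 && amount != 16)
        return 0;
    uint32_t ones = (1u << amount) - 1u;   /* 0xFF or 0xFFFF */
    return ((uint32_t)imm8 << amount) | ones;
}

/* MVNI with MSL writes the bitwise NOT of the same value
 * (meaningful only for amount 8 or 16). */
static uint32_t mvni_msl(uint8_t imm8, unsigned amount)
{
    return ~movi_msl(imm8, amount);
}
```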
1 vote, 1 answer

ARMv8 Linux Context Switch

I am studying the Linux context switch on ARMv8. Below is the code: ENTRY(cpu_switch_to) mov x10, #THREAD_CPU_CONTEXT add x8, x0, x10 mov x9, sp stp x19, x20, [x8], #16 // store callee-saved registers stp x21, x22, [x8],…
attila
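Only x19 to x28, the frame pointer, sp and the return address need saving because cpu_switch_to is reached through an ordinary function call, so the AAPCS64 caller-saved registers are already the caller's responsibility. The post-indexed stp instructions walk through a save area that the Linux arm64 sources define as struct cpu_context; the struct below is a sketch mirroring that layout (check arch/arm64/include/asm/processor.h for your kernel version), renamed so it is clearly not the kernel's own definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the save area written by cpu_switch_to, mirroring the layout
 * of Linux's arm64 struct cpu_context. Only AAPCS64 callee-saved state
 * is kept, since the function is entered by an ordinary call. */
struct cpu_context_sketch {
    uint64_t x19, x20, x21, x22, x23, x24, x25, x26, x27, x28;
    uint64_t fp;   /* x29 */
    uint64_t sp;   /* copied via x9: register 31 in an STP data operand is XZR */
    uint64_t pc;   /* lr (x30) is stored here: where the task resumes */
};

/* Each "stp xA, xB, [x8], #16" stores one pair and then advances x8 by 16,
 * so pair k lands at byte offset 16 * k from the start of the struct. */
static size_t pair_offset(unsigned k)
{
    return 16u * k;
}
```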
1 vote, 1 answer

How to vectorize 2D array using neon intrinsics

I am trying to add 50 to every element of a 2D array using NEON intrinsics. Here is my code; is there a better way of doing it or optimizing it? void fun(int height,int width,unsigned char array2D[][width],unsigned char *output){ uint8x16_t…
R1608
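One common restructuring is to treat the array as a single flat buffer: a C 2D array is stored row after row, so height * width bytes can be processed in one loop of 16-byte NEON vectors plus one short scalar tail, instead of handling a tail at the end of every row. A sketch assuming wrapping addition, which is what plain unsigned char arithmetic does; if values should clamp at 255, vqaddq_u8 is the saturating counterpart of vaddq_u8. The NEON path is guarded by __ARM_NEON so the same file still builds, scalar-only, on other targets, and the function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Adds 50 to each of n bytes, wrapping modulo 256. */
static void add50_flat(const uint8_t *in, uint8_t *out, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    const uint8x16_t fifty = vdupq_n_u8(50);
    for (; i + 16 <= n; i += 16)                /* 16 bytes per iteration */
        vst1q_u8(out + i, vaddq_u8(vld1q_u8(in + i), fifty));
#endif
    for (; i < n; i++)                          /* tail, or whole array without NEON */
        out[i] = (uint8_t)(in[i] + 50);
}

/* Same parameters as the question's function: the rows are contiguous,
 * so the 2D array is handed over as one flat run of height * width bytes. */
void add50_2d(int height, int width, unsigned char array2D[][width],
              unsigned char *output)
{
    add50_flat((const uint8_t *)array2D, output,
               (size_t)height * (size_t)width);
}
```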
1 vote, 1 answer

Handling an odd number of elements using NEON intrinsics

I am new to NEON intrinsics. I have two arrays of 99 elements that I am trying to add element-wise using NEON intrinsics. Since 99 is not a multiple of 8, 16, or 32, only 96 elements can be handled with full vectors; how do I handle the remaining 3 elements? Please…
R1608
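Besides a plain scalar loop for the leftovers, a common trick when there is at least one full vector of data is to finish with one more full vector that ends exactly at element n, overlapping elements already computed. Recomputing a + b for those lanes produces the same values, so this is safe as long as the output does not alias either input. A sketch assuming int32_t elements (4 lanes per vector), since the excerpt does not state the element type; the function name is illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* out[i] = a[i] + b[i] for n elements, n not necessarily a multiple of 4.
 * Assumes out does not overlap a or b (the overlapping tail recomputes
 * some elements) and that the sums fit in int32_t. */
static void add_arrays(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
#if defined(__ARM_NEON)
    if (n >= 4) {
        size_t i = 0;
        for (; i + 4 <= n; i += 4)              /* full 4-lane vectors */
            vst1q_s32(out + i, vaddq_s32(vld1q_s32(a + i), vld1q_s32(b + i)));
        if (i < n) {                            /* 1 to 3 elements left over */
            size_t j = n - 4;                   /* last 4 elements, overlapping */
            vst1q_s32(out + j, vaddq_s32(vld1q_s32(a + j), vld1q_s32(b + j)));
        }
        return;
    }
#endif
    for (size_t i = 0; i < n; i++)              /* scalar path */
        out[i] = a[i] + b[i];
}
```

With n = 99, the vector loop covers elements 0 to 95 and the final overlapping vector covers 95 to 98.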
1 vote, 0 answers

Armv8 PMCCNTR_EL0 returns 0 if read several times without unloading the kernel object

I have a CPU with multiple Cortex-A72 cores. I am trying to benchmark an algorithm and want to count the number of core cycles that elapse during the execution of a thread. I've cross-compiled two kernel objects to properly configure the registers in…
JacobB
1 vote, 1 answer

How can I get a microsecond timer on an ARMv8 system?

I am writing part of the kernel code for an ARMv8 RTOS. I am trying to implement a function like gettimeofday() in Linux, which returns the system time in seconds and microseconds, but I have not managed to. The system supports a PL031, which I think runs at a frequency of 1 Hz,…
wangt13
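Besides the PL031, Armv8-A defines the Generic Timer: CNTFRQ_EL0 reports the system counter frequency and CNTVCT_EL0 the current virtual count, which is normally fine-grained enough for microseconds. The sketch converts a tick count to seconds plus microseconds without overflow and, on AArch64 builds only, shows inline-assembly reads of the two registers. It assumes firmware has programmed CNTFRQ_EL0 (the hardware does not set it by itself) and that the code runs at EL1, or at EL0 with access enabled through CNTKCTL_EL1; the struct and function names are hypothetical.

```c
#include <stdint.h>

struct timeval_sketch { uint64_t sec; uint32_t usec; };

/* Converts ticks at freq_hz (nonzero) into seconds + microseconds.
 * Whole seconds are split off first, so the remainder is < freq_hz and
 * remainder * 1000000 cannot overflow 64 bits at any realistic frequency. */
static struct timeval_sketch ticks_to_timeval(uint64_t ticks, uint64_t freq_hz)
{
    struct timeval_sketch tv;
    tv.sec = ticks / freq_hz;
    tv.usec = (uint32_t)((ticks % freq_hz) * 1000000u / freq_hz);
    return tv;
}

#if defined(__aarch64__)
/* Counter frequency in Hz, as programmed by firmware. */
static uint64_t read_cntfrq(void)
{
    uint64_t v;
    __asm__ volatile("mrs %0, cntfrq_el0" : "=r"(v));
    return v;
}

/* Virtual count; the ISB stops the read from being performed early. */
static uint64_t read_cntvct(void)
{
    uint64_t v;
    __asm__ volatile("isb\n\tmrs %0, cntvct_el0" : "=r"(v) : : "memory");
    return v;
}
#endif
```

On AArch64, ticks_to_timeval(read_cntvct(), read_cntfrq()) then gives the time since the counter started, which a gettimeofday()-style function could add to a boot-time epoch taken from the RTC.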
1 vote, 0 answers

What's the difference between data section and aux region in perf_event_open()?

I have recently been studying the Statistical Profiling Extension (SPE) for Armv8. There is little documentation on data sampling and reading, and there is one point I can hardly understand. There is a ring buffer to record sampled data which is…
oleotiger
1 vote, 0 answers

How to get the ARM instruction disassembly of the Global Offset Table (GOT) and PLT

I am trying to generate the disassembly of Dhrystone. The following commands were used: aarch64-none-linux-gnu-gcc -O0 -mtune=cortex-a77 -mcpu=cortex-a77 --static -c -DHZ=60 -O2 -fno-inline -fno-pie dhry_1.c aarch64-none-linux-gnu-gcc -O0…
Tom Jose
1 vote, 1 answer

Is a data race safe on ARMv8?

As we know, accesses to aligned fundamental data types are atomic on the Intel x86 architecture. What about ARMv8? I have tried to find the answer in the Arm Architecture Reference Manual for A-profile architecture (Armv8), and I did find something related to…
Hankin
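Armv8-A does make naturally aligned loads and stores of up to 64 bits single-copy atomic, but at the C or C++ level an unsynchronized concurrent access is still a data race and therefore undefined behaviour, whatever the hardware does. The portable way to rely on the hardware guarantee is an atomic type with relaxed ordering. A sketch using C11 stdatomic.h; the variable and function names are illustrative, and relaxed ordering gives atomicity only, not ordering relative to other data (release/acquire would be needed for that).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Value shared between threads. Declaring it _Atomic is what makes
 * concurrent access well-defined in C; on AArch64, relaxed loads and
 * stores of an aligned 64-bit object usually compile to plain LDR/STR. */
static _Atomic uint64_t shared_value;

static void publish(uint64_t v)
{
    atomic_store_explicit(&shared_value, v, memory_order_relaxed);
}

static uint64_t observe(void)
{
    return atomic_load_explicit(&shared_value, memory_order_relaxed);
}

/* Read-modify-write is different: a separate load and store can lose
 * updates between threads, so use an atomic RMW, which the compiler
 * lowers to an LDXR/STXR loop or an Armv8.1 LSE instruction such as LDADD. */
static uint64_t bump(void)
{
    return atomic_fetch_add_explicit(&shared_value, 1, memory_order_relaxed) + 1;
}
```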