I'm reading some C code with a bit of embedded assembly. I understand that __asm__ is a statement for running assembly code, but what does __asm__ do in the following code? According to the output (i.e., r = 16), it seems that __asm__ does not affect the variable r. Is that right?

#include <stdio.h>
static void foo()
{
    static volatile unsigned int r __asm__ ("0x0019");
    r |= 1 << 4;

    printf("foo: %u\n", r);
}

Platform: Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn) on OSX Yosemite

Keith Thompson
ZLW
  • What compiler and platform? – interjay Dec 27 '14 at 20:24
  • Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn) on OSX Yosemite – ZLW Dec 27 '14 at 20:28
  • It might mean that `r` is the value of the memory word at address `0x0019` – Basile Starynkevitch Dec 27 '14 at 20:29
  • The syntax looks a bit like [gcc's global variable registers](https://gcc.gnu.org/onlinedocs/gcc/Global-Reg-Vars.html), although I can only guess about the meaning of `0x0019`. – NPE Dec 27 '14 at 20:36
  • @Zilong: Out of interest, what's the context here? Is that user space code? Does it appear *exactly* as shown in your question? To the best of your understanding, what does it do? – NPE Dec 27 '14 at 20:40
  • From experimenting with this a bit, and looking at the generated assembly, it appears that: (1) `0x0019` can be any alphanumeric string; (2) the code that accesses `r` uses `%rip`-relative addressing. Not sure what to make of this though. – NPE Dec 27 '14 at 20:54
  • I've updated your question with information about the platform. That kind of information needs to be in the question, not just in a comment. – Keith Thompson Dec 27 '14 at 21:05
  • @specializt: There's no particular reason to avoid using assembly language on OSX. It's not "hardcore digital electronics". The compiler generates assembly code; `__asm__` just lets you specify the assembly code more precisely. Yes, you can shoot yourself in the foot, but that's true on any platform. – Keith Thompson Dec 27 '14 at 21:06
  • One approach (if reading the manual doesn't help) is to compile with `-S` and examine the generated assembly code. Try with and without the `__asm__` and compare to see just what effect the `__asm__` has. Where did this code come from? Does the context (which you haven't shared with us) tell you what it's supposed to do, or even whether it does anything? – Keith Thompson Dec 27 '14 at 21:10
  • @NPE: I'm trying to understand how TinyOS works. TinyOS is designed for microcontrollers. Basically, one writes a program in a C dialect called NesC. TinyOS then automatically translates the program into C, which includes those __asm__ declarations. Finally, TinyOS compiles the resulting C code into an executable, which runs on a microcontroller such as the Texas Instruments MSP430. Because the memory of a microcontroller is limited, there is no notion of user space; you can operate on the whole memory. You might be right. I'm reading the docs to see if 0x0019 is a register. – ZLW Dec 27 '14 at 21:13
  • Another possible clue: on my Linux system, `gcc -c` (gcc 4.8.2) complains about a "Missing symbol name in directive". `clang -c` (clang version 3.5) doesn't complain. Wild guess: perhaps there's an error that clang doesn't diagnose? – Keith Thompson Dec 27 '14 at 21:14
  • What happens if you comment out the `__asm__` (or replace it with `r=16;`)? Wait, how is TinyOS the same as Apple OS X and clang? – Elliott Frisch Dec 27 '14 at 21:18
  • Commenting it out gave me the same output (i.e., r=16). I ran TinyOS on Ubuntu. I compiled a NesC program on Ubuntu and got the C file and the final executable. This C file includes those __asm__ declarations, which I don't understand. So I created the above sample C code and ran it on a Mac. BTW, a special compiler tailored for microcontrollers, called msp430-gcc, is used to compile the C code into the executable. I'm not sure whether this __asm__ declaration is specific to msp430-gcc or generally applicable to other compilers. – ZLW Dec 27 '14 at 21:34
  • @KeithThompson Not only "can" you shoot yourself in the foot - you _will_ do so quite a few times. Operating systems like Windows (and OSX for that matter) use a _proprietary_ kernel which secures its own hardware access in various, secret ways - manipulating the processor at ASM level will guarantee unexpected behaviour and even hardware damage in very extreme cases. Another thing: "wrong", assembler is and always has been on par with digital electronics; in fact electrical engineers need to learn ASM dialects during the first few semesters. I recommend doing research on the topic. – specializt Dec 27 '14 at 23:01
  • @specializt: I don't believe that's correct. It's entirely possible to write assembly code that simply manipulates data owned by the program. You *can* do unsafe things in assembly (more easily than in C or higher-level languages) -- but since C is translated to assembly language, anything you can do in C can be done in assembly. The nature of the kernel is irrelevant as long as you don't interact with the kernel. – Keith Thompson Dec 29 '14 at 01:54

1 Answer

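Strictly speaking, your "asm" snippet is not a statement that runs assembly code. Attached to a declaration, __asm__ ("...") tells the compiler which name to use for the variable in the generated assembly, so every access to r is emitted using the literal text 0x0019.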

Here's a 32-bit example:

#include <stdio.h>
static void foo()
{
    static volatile unsigned int r __asm__ ("0x0019");
    static volatile unsigned int s __asm__ ("0x1122");
    static volatile unsigned int t = 0x3344;
    printf("foo: %u %u %u\n", r, s, t);
}

gcc -O0 -S x.c

cat x.s
        .file   "x.c"
        .data
        .align 4
        .type   t.1781, @object
        .size   t.1781, 4
t.1781:
        .long   13124  # Note: 13124 decimal == 0x3344 hex
        .local  0x1122
        .comm   0x1122,4,4
        .local  0x0019
        .comm   0x0019,4,4
        .section        .rodata
.LC0:
        .string "foo: %u %u %u\n"
        .text
        .type   foo, @function
foo:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $24, %esp
        movl    t.1781, %eax
        movl    0x1122, %edx
        movl    0x0019, %ecx
        movl    %eax, 12(%esp)
        movl    %edx, 8(%esp)
        movl    %ecx, 4(%esp)
        movl    $.LC0, (%esp)
        call    printf
        leave
        ret

PS: The "asm" syntax is applicable to all gcc-based compilers.

PPS: I absolutely encourage you to experiment with assembly anywhere you please: embedded systems, Ubuntu, Mac OSX - whatever pleases you.

Here is an excellent book. It's about Linux, but it's also very largely applicable to your OSX:

Programming from the Ground Up, Jonathan Bartlett

Also:

https://www.hackerschool.com/blog/7-understanding-c-by-learning-assembly

http://fabiensanglard.net/macosxassembly/

PPPS: x86 assembly syntax comes in two variants: "Intel" and "AT&T". GCC uses AT&T syntax by default. The GNU assembler (gas) is also used for the other architectures GCC supports (MIPS, PPC, etc.), so its directives and conventions carry over. I encourage you to start off with AT&T syntax ("gcc/gas"), rather than Intel ("nasm").

FoggyDay
  • Thanks a lot. But I still don't understand why gcc couldn't successfully compile it into an executable while clang could. I will read your suggested articles and try to understand the assembly code that clang produced. – ZLW Dec 27 '14 at 22:13
  • Hi - Glad it helped. For whatever it's worth, I actually compiled the snippet above (cut/pasted from your code) on two versions of GCC: 32-bit CentOS 5.5 and 64-bit CentOS 6.4. It compiled ("assembled" ;)) fine, but I didn't try linking or executing either version. – FoggyDay Dec 28 '14 at 01:23