Yes, all of it is assembly language. Note that assembly language is defined by the tool, not the target, so gnu assembler (gas) for risc-v may differ from some other assembler for risc-v. Most of the differences will be in the other stuff, the stuff other than the instructions, but there will be times when the instructions change from one tool to another as well. The only thing cast in stone is the machine code, and as long as the assembler generates the proper machine code, the assembly language could just as easily look like this:
add banana, orange
But anyway.
unsigned int fun ( unsigned int a, unsigned int b )
{
    return(a+b+7);
}
The gcc I have at the moment for this target generates this from that code:
.file "so.c"
.option nopic
.attribute arch, "rv32i2p0_m2p0_a2p0_f2p0_d2p0_c2p0"
.attribute unaligned_access, 0
.attribute stack_align, 16
.text
.align 1
.globl fun
.type fun, @function
fun:
addi a1,a1,7
add a0,a1,a0
ret
.size fun, .-fun
.ident "GCC: (GNU) 10.2.0"
Some of these things are somewhat obvious, some not so much. Very little of it is actually necessary to make a useful object for this target and tool:
.globl fun
fun:
addi a1,a1,7
add a0,a1,a0
ret
fun: is simply a label, which means an address. When the linker puts the objects together to make a usable program, it sorts out the labels and turns them into addresses, fixed or relative as needed. This is a labor-saving device. Compare these two, where the only change is the number of nops between the branch and its destination:
00000000 <skip-0x8>:
0: c501 beqz x10,8 <skip>
2: 0001 nop
4: 0001 nop
6: 0001 nop
00000008 <skip>:
8: 00c58533 add x10,x11,x12
00000000 <skip-0xc>:
0: c511 beqz x10,c <skip>
2: 0001 nop
4: 0001 nop
6: 0001 nop
8: 0001 nop
a: 0001 nop
0000000c <skip>:
c: 00c58533 add x10,x11,x12
The encoding of the instruction, which you can look up yourself, essentially includes the distance to the destination (c501 in the first case, c511 in the second). Without labels we would have to count the instructions ourselves and somehow indicate in the assembly language: jump forward 3 halfwords, jump backward 10 halfwords.
Gnu assembler has a syntax for this, where the dot means the current address:
nop
nop
nop
beqz x10,.+4
nop
nop
nop
nop
nop
nop
00000000 <.text>:
0: 0001 nop
2: 0001 nop
4: 0001 nop
6: c111 beqz x10,a <.text+0xa>
8: 0001 nop
a: 0001 nop
c: 0001 nop
e: 0001 nop
10: 0001 nop
12: 0001 nop
But you really do not normally want to do that: if you change the number of instructions/bytes between the branch instruction and the destination, you have to keep adjusting some or many of your offsets.
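To make "the encoding includes the distance" concrete, here is a rough sketch in C that pulls the offset back out of the 16-bit compressed branch encodings shown above (c501, c511, c111). The function name is made up for illustration, and the bit layout is my reading of the compressed branch format in the risc-v spec, so check it against the spec before relying on it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of any toolchain): recover the signed byte
 * offset stored in a 16-bit compressed branch (c.beqz / c.bnez). */
int cb_branch_offset(uint16_t insn)
{
    /* Only the compressed branch format is handled:
     * op = 01 in bits 1:0, funct3 = 110 or 111 in bits 15:13. */
    assert((insn & 0x3) == 0x1 && (insn >> 14) == 0x3);

    int off = 0;
    off |= ((insn >> 3) & 0x3) << 1;   /* insn[4:3]   -> offset[2:1] */
    off |= ((insn >> 10) & 0x3) << 3;  /* insn[11:10] -> offset[4:3] */
    off |= ((insn >> 2) & 0x1) << 5;   /* insn[2]     -> offset[5]   */
    off |= ((insn >> 5) & 0x3) << 6;   /* insn[6:5]   -> offset[7:6] */
    if ((insn >> 12) & 0x1)            /* insn[12] is offset[8], the sign bit */
        off -= 256;
    return off;
}
```

Feeding it 0xc501, 0xc511 and 0xc111 gives 8, 12 and 4, the same distances the disassembler printed, which is exactly the bookkeeping you would be doing by hand without labels.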
So labels are addresses. In gnu assembler they end in a colon and cannot be reserved words. Some assembly languages do not use the colon.
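The labor the tools save you can be sketched in a few lines of C. This is only a toy model, not how gas is actually written: the instructions are reduced to an array of sizes in bytes, and the function name and parameters are invented for the example.

```c
#include <stddef.h>

/* Hypothetical model: each entry is the size in bytes of one instruction.
 * Returns the byte distance from instruction `from` to instruction `to`,
 * which is what a pc-relative branch sitting at `from` would have to encode. */
long branch_distance(const unsigned char *sizes, size_t from, size_t to)
{
    long dist = 0;
    if (to >= from) {
        /* forward: add up everything from the branch up to the target */
        for (size_t i = from; i < to; i++)
            dist += sizes[i];
    } else {
        /* backward: subtract everything from the target up to the branch */
        for (size_t i = to; i < from; i++)
            dist -= sizes[i];
    }
    return dist;
}
```

With the sizes {2,2,2,2,4} (a compressed beqz, three compressed nops, then the add) the distance from index 0 to index 4 is 8. Add two more nops and it becomes 12, with no change to the source line that holds the branch. That recomputation is what a label buys you.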
Segments:
int mybss;
int mydata=5;
int text ( void )
{
    mybss=3;
    return(++mydata);
}
Disassembly of section .text:
00000000 <text>:
0: 000007b7 lui x15,0x0
4: 0007a503 lw x10,0(x15) # 0 <text>
8: 00000737 lui x14,0x0
c: 468d li x13,3
e: 0505 addi x10,x10,1
10: 00d72023 sw x13,0(x14) # 0 <text>
14: 00a7a023 sw x10,0(x15)
18: 8082 ret
Disassembly of section .sbss:
00000000 <mybss>:
0: 0000 unimp
...
Disassembly of section .sdata:
00000000 <mydata>:
0: 0005 c.nop 1
...
Hmm, that is interesting (.sbss and .sdata rather than .bss and .data), and that is what gnu gcc created. Anyway. Traditionally, for reasons you can google, the code (instructions, basically) is in a segment called .text, the pre-initialized data is in .data, and the uninitialized data is in .bss. Well, gnu puts the dot in front of the names: text, data, bss.
Gnu assembler has some shortcuts:
.text
nop
.data
.word 1,2,3
But the full syntax would be:
.section .text
.section .data
And you can make up whatever you want, I assume if it is not reserved in some way:
.section .hello
nop
add x11,x12,x13
j .
.word 0xA,0xBBBB
.section .world
mystuff: .word 1,2,3,4
Disassembly of section .hello:
00000000 <.hello>:
0: 0001 nop
2: 00d605b3 add x11,x12,x13
6: a001 j 6 <.hello+0x6>
8: 000a c.slli x0,0x2
a: 0000 unimp
c: 0000bbbb 0xbbbb
Disassembly of section .world:
00000000 <mystuff>:
0: 0001 nop
2: 0000 unimp
4: 0002 c.slli64 x0
6: 0000 unimp
8: 00000003 lb x0,0(x0) # 0 <mystuff>
c: 0004 0x4
...
These are object dumps. Notice how each segment has its own chunk of data, for this output starting at offset zero. Also note that this is the disassembler, so it tries to disassemble the data as instructions, which is confusing.
The idea here is that you isolate these different data/information types so that you can control where they go. For example, a microcontroller may have flash at one address, 0x00000000, and sram at another, 0x20000000, so you want to separate the read-only code and read-only data from the read/write data so that you can tell the linker where to put each.
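If you are wondering why the disassembler chopped .word 1,2,3,4 into those odd pieces, the following C sketch mimics the one rule it is following: look at the low two bits of the next halfword, 11 means a 32-bit instruction and anything else means 16-bit. The function names are made up, and it ignores the longer-than-32-bit encodings the spec reserves, so treat it as an approximation of what objdump does rather than a description of its code.

```c
#include <stddef.h>
#include <stdint.h>

/* Length in bytes assumed from the first halfword: low two bits 11 means a
 * 32-bit instruction, anything else is a 16-bit compressed instruction.
 * (Sketch only: longer encodings are ignored.) */
int insn_length(uint16_t first_halfword)
{
    return (first_halfword & 0x3) == 0x3 ? 4 : 2;
}

/* Walk little-endian .word data as if it were an instruction stream and
 * record the size of each chunk; returns the number of chunks in out[]. */
size_t chunk_words(const uint32_t *words, size_t nwords, int *out, size_t max_out)
{
    size_t nhalf = nwords * 2, i = 0, n = 0;
    while (i < nhalf && n < max_out) {
        /* halfword i: the low half of each word comes first in memory */
        uint16_t h = (uint16_t)(words[i / 2] >> ((i % 2) * 16));
        int len = insn_length(h);
        out[n++] = len;
        i += (size_t)len / 2;
    }
    return n;
}
```

For the words 1,2,3,4 this produces chunk sizes 2,2,2,2,4,2,2, which lines up with the .world dump: four halfwords, then 00000003 taken as one 32-bit instruction because 3 ends in binary 11, then 0004. The same rule explains why 0xBBBB in .hello shows up as a single 0000bbbb.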
int mybss;
int mydata=5;
const int myrodata = 25;
int text ( void )
{
    mybss=3;
    return(++mydata);
}
You connect the dots for the linker either on the linker command line or with a linker script, which is specific to the tool (the linker) and not generic to a target. As with assembly language, there is no expectation that it will port verbatim to other toolchains.
MEMORY
{
    bob : ORIGIN = 0x00000000, LENGTH = 0x1000
    ted : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
    .text   : { *(.text*)    } > bob
    .rodata : { *(.srodata*) } > bob
    .data   : { *(.sdata*)   } > ted
    .bss    : { *(.sbss*)    } > ted
}
Obviously not an actual usable binary, but the tools do not know that; they simply did what I asked:
Disassembly of section .text:
00000000 <text>:
0: 200007b7 lui x15,0x20000
4: 0007a503 lw x10,0(x15) # 20000000 <mydata>
8: 20000737 lui x14,0x20000
c: 468d li x13,3
e: 0505 addi x10,x10,1
10: 00d72223 sw x13,4(x14) # 20000004 <mybss>
14: 00a7a023 sw x10,0(x15)
18: 8082 ret
Disassembly of section .rodata:
0000001c <myrodata>:
1c: 0019 c.nop 6
...
Disassembly of section .data:
20000000 <mydata>:
20000000: 0005
...
Disassembly of section .bss:
20000004 <mybss>:
20000004: 0000
...
Where I have bob and ted, most folks will put rom and ram or other more useful names. On the left, where I have .text and .data and such, you can make stuff up too; those names only need to match whatever other tools will read this binary and look for certain key words. With gnu linker you can make those up. The names in the middle, though, need to match the names in the objects. I do not know why it is .sdata instead of .data, that is a new one for me today, but either way I simply look at the object, match those names in the linker script, and then control where they go.
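As a mental model of what the linker did with that script, here is a small C sketch that hands out addresses the same way: each memory region keeps a running "next free address", and each output section is dropped at the next suitably aligned spot. The struct, the function name, and the sizes and alignments used below are my assumptions read off the dump above, not anything the gnu linker exposes.

```c
#include <stdint.h>

/* Hypothetical stand-in for a MEMORY region: the next free address in it. */
struct region { uint32_t next; };

/* Place a section of `size` bytes at the next `align`-aligned address in
 * the region (align must be a power of two) and advance the region. */
uint32_t place(struct region *r, uint32_t size, uint32_t align)
{
    uint32_t addr = (r->next + align - 1) & ~(align - 1);
    r->next = addr + size;
    return addr;
}
```

Start bob at 0x00000000 and ted at 0x20000000. Placing 0x1a bytes of .text in bob gives 0x0, then 4 bytes of .rodata with 4-byte alignment lands at 0x1c. In ted, .data gets 0x20000000 and .bss gets 0x20000004, the same addresses the disassembly shows for myrodata, mydata and mybss.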
Lastly, main: again a label. You have seen that for compiled functions the function name is basically an address, represented as a label: an entry point to that function/subroutine. Your C programs do not enter at main(); there is bootstrap code in the binary that runs first, and that code calls main. When you use gcc hello_world.c -o hello_world there is pre-processing, compiling to assembly, assembling to an object, and linking with a default linker and C library bootstrap to make a target (operating system) specific binary, so that you can then run it with ./hello_world.
The code you came across may have been intended (even though what you posted won't work) to be linked and run in such a way. Or the author is simply used to having the word main, as most of us who write C programs are. Even with C code compiled to assembly language, as far as the tools are concerned it is just another label. The bootstrap for C will specifically make an external call to it, so when everything is linked there needs to be a main() in one of the objects. But you can build binaries without a function called main() and have them work if you master the tools, at least with gnu. Other tools may require that function name for some reason.