Assembly what are data, .main and arr?

Question

Today I learnt assembly language and many instructions like J, JAL, BNE and so on. But all of a sudden I saw the following:

data
arr:    .word 2, 4, 6, 8
n:      .word 9
.text
main:   add     t0, x0, x0
        addi    t1, x0, 1
        la      t3, n
        lw      t3, 0(t3)
fib:    beq     t3, x0, finish
        add     t2, t1, t0
        mv      t0, t1
        mv      t1, t2
        addi    t3, t3, -1
        j       fib

May someone kindly explain what those data, arr, n and .text are? Are those part of assembly language? And why do we need main? This isn't C.

Your assembly tutorial should explain this in a later chapter. For now, don't worry too much about that. — fuz, Nov 16 '20 at 15:33
everything in the file is part of assembly language. note that assembly language is specific to a tool not a target (the specific assembler, gnu assembler for example, not the target risc-v/rv32) — old_timer, Nov 16 '20 at 15:39
@fuz the tutorial ends here, its goal is to teach the basics. Can you give me a general idea? — , Nov 16 '20 at 15:45
These things like `arr:` and `n:` are *labels*. These give names to locations in your program. `.data` and `.text` are directives that switch between the *data* and *text sections*. If your tutorial ends there, it is quite deficient as this is very important foundational knowledge. Consider following a different one or buy a book on assembly programming. If you don't find anything on RISC-V, it may be a good idea to learn a different architecture and later switch to RISC-V. — fuz, Nov 16 '20 at 15:51
Does this answer your question? [Assembler has:](https://stackoverflow.com/a/64472323/471129) — Erik Eidt, Nov 16 '20 at 15:56

score 1 · Accepted Answer · edited Nov 17 '20 at 08:50

Yes, all of it is assembly language. Note that assembly language is defined by the tool, not the target, so gnu assembler (gas) for risc-v may differ from some other assembler for risc-v. Most of the differences will be in the other stuff, the things other than the instructions, but at times the instruction syntax changes from one tool to another as well. The only thing cast in stone is the machine code, and as long as the assembler generates the proper machine code, the assembly language could just as easily look like this

add banana, orange

But anyway. Take this C function:

unsigned int fun ( unsigned int a, unsigned int b )
{
    return(a+b+7);
}

The gcc compiler I have at the moment for this target generates this from that code.

    .file   "so.c"
    .option nopic
    .attribute arch, "rv32i2p0_m2p0_a2p0_f2p0_d2p0_c2p0"
    .attribute unaligned_access, 0
    .attribute stack_align, 16
    .text
    .align  1
    .globl  fun
    .type   fun, @function
fun:
    addi    a1,a1,7
    add a0,a1,a0
    ret
    .size   fun, .-fun
    .ident  "GCC: (GNU) 10.2.0"

Some of these things are somewhat obvious, some not so much. Very little of it is actually necessary to make a useful object for this target and tool:

    .globl  fun
fun:
    addi    a1,a1,7
    add a0,a1,a0
    ret

fun: is simply a label, which means an address. When the linker puts the objects together to make a usable program, it sorts out the labels and turns them into addresses, fixed or relative as needed. This is a labor saving device. Without labels you would have to work out distances like these by hand:

00000000 <skip-0x8>:
   0:   c501                    beqz    x10,8 <skip>
   2:   0001                    nop
   4:   0001                    nop
   6:   0001                    nop

00000008 <skip>:
   8:   00c58533            add x10,x11,x12


00000000 <skip-0xc>:
   0:   c511                    beqz    x10,c <skip>
   2:   0001                    nop
   4:   0001                    nop
   6:   0001                    nop
   8:   0001                    nop
   a:   0001                    nop

0000000c <skip>:
   c:   00c58533            add x10,x11,x12

The encoding of the instruction, which you can look up yourself, essentially includes the distance to the destination. Without labels we would have to count the number of instructions ourselves and somehow indicate in the assembly language jump forward 3 half words, jump backward 10 half words.

Gnu assembler has a syntax for this, where . stands for the address of the current instruction.

nop
nop
nop
beqz x10,.+4
nop
nop
nop
nop
nop
nop


00000000 <.text>:
   0:   0001                    nop
   2:   0001                    nop
   4:   0001                    nop
   6:   c111                    beqz    x10,a <.text+0xa>
   8:   0001                    nop
   a:   0001                    nop
   c:   0001                    nop
   e:   0001                    nop
  10:   0001                    nop
  12:   0001                    nop

But you really do not normally want to have to do that, because if you change the number of instructions/bytes between the branch instruction and the destination you have to keep adjusting some or many of your offsets.

So labels are addresses. In gnu assembler they end in a colon and cannot be reserved words. Some assembly languages do not use the colon.
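To see that the distance really is stored in the instruction, here is a minimal C sketch that pulls the byte offset back out of the 16-bit compressed branch encodings in the dumps above (c501, c511, c111). The function name cb_branch_offset is made up for illustration, and the bit layout assumed is the CB format of the RISC-V compressed extension, so check it against the specification before relying on it.

```c
#include <stdint.h>

/* Hypothetical helper: extract the signed byte offset from a 16-bit
 * compressed branch (c.beqz / c.bnez). Assumed CB-format layout:
 *   bit  12     -> offset[8]
 *   bits 11:10  -> offset[4:3]
 *   bits 6:5    -> offset[7:6]
 *   bits 4:3    -> offset[2:1]
 *   bit  2      -> offset[5]
 */
int32_t cb_branch_offset(uint16_t insn)
{
    uint32_t imm = 0;

    imm |= ((insn >> 12) & 0x1u) << 8;  /* offset[8]   */
    imm |= ((insn >> 10) & 0x3u) << 3;  /* offset[4:3] */
    imm |= ((insn >> 5)  & 0x3u) << 6;  /* offset[7:6] */
    imm |= ((insn >> 3)  & 0x3u) << 1;  /* offset[2:1] */
    imm |= ((insn >> 2)  & 0x1u) << 5;  /* offset[5]   */

    /* offset[8] is the sign bit of the 9-bit offset: sign-extend it */
    if (imm & 0x100u)
        return (int32_t)imm - 0x200;
    return (int32_t)imm;
}
```

Fed the three encodings from the dumps, cb_branch_offset(0xc501) gives 8, cb_branch_offset(0xc511) gives 12 and cb_branch_offset(0xc111) gives 4, which matches the destinations the disassembler printed. This is the arithmetic a label spares you from redoing every time the code between branch and destination changes.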

Segments:

int mybss;
int mydata=5;
int text ( void )
{
    mybss=3;
    return(++mydata);
}


Disassembly of section .text:

00000000 <text>:
   0:   000007b7            lui x15,0x0
   4:   0007a503            lw  x10,0(x15) # 0 <text>
   8:   00000737            lui x14,0x0
   c:   468d                    li  x13,3
   e:   0505                    addi    x10,x10,1
  10:   00d72023            sw  x13,0(x14) # 0 <text>
  14:   00a7a023            sw  x10,0(x15)
  18:   8082                    ret

Disassembly of section .sbss:

00000000 <mybss>:
   0:   0000                    unimp
    ...

Disassembly of section .sdata:

00000000 <mydata>:
   0:   0005                    c.nop   1
    ...

Hmm, that is interesting, the .sbss and .sdata names are what gnu gcc created. Anyway. Traditionally, for reasons you can google, the code (instructions basically) is in a segment called .text, the pre-initialized data is in .data and the uninitialized data is in .bss. Well, gnu uses the dot in front of the names: text, data, bss.

Gnu assembler has some shortcuts

.text
nop
.data
.word 1,2,3

But the full syntax would be

.section .text

.section .data

And you can make up whatever name you want, I assume as long as it is not reserved in some way:

.section .hello
    nop
    add x11,x12,x13
    j .
.word 0xA,0xBBBB

.section .world

mystuff: .word 1,2,3,4

Disassembly of section .hello:

00000000 <.hello>:
   0:   0001                    nop
   2:   00d605b3            add x11,x12,x13
   6:   a001                    j   6 <.hello+0x6>
   8:   000a                    c.slli  x0,0x2
   a:   0000                    unimp
   c:   0000bbbb            0xbbbb

Disassembly of section .world:

00000000 <mystuff>:
   0:   0001                    nop
   2:   0000                    unimp
   4:   0002                    c.slli64    x0
   6:   0000                    unimp
   8:   00000003            lb  x0,0(x0) # 0 <mystuff>
   c:   0004                    0x4
    ...

These are object dumps. Notice how each segment has its own chunk of data, for this output starting at offset zero. Also note that this is the disassembler, so it is trying to disassemble data as instructions, which is confusing.
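The same idea is reachable from C. As a rough sketch, assuming gcc or clang on an ELF target, the section attribute (a compiler extension, not standard C) puts an object into a section with a made-up name, here .world to mirror the assembly above. The section names in the comments for the ordinary objects are what these tools typically choose and are an assumption. The function sum_mystuff is invented purely so there is something to call.

```c
/* GCC/Clang extension (ELF target assumed): place this array in a
 * section called .world instead of the default data section. */
int mystuff[4] __attribute__((section(".world"))) = {1, 2, 3, 4};

/* Ordinary objects for comparison. The section names below are the
 * usual defaults and may differ per target (for example .sdata). */
int mydata = 5;            /* pre-initialized data, typically .data   */
int mybss;                 /* uninitialized data, typically .bss      */
const int myrodata = 25;   /* read only data, typically .rodata       */

/* Hypothetical helper: the code does not care which section holds
 * mystuff, it only uses the address behind the label. */
int sum_mystuff(void)
{
    int sum = 0;
    for (int i = 0; i < 4; i++)
        sum += mystuff[i];
    return sum;
}
```

Compiling this to an object and listing its sections (for example with objdump -h) should show .world alongside the usual ones, in the same way the assembly version did.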

The idea here is that you isolate these different data/information types so that you can control where they go. For example, in a microcontroller you may have flash at one address, 0x00000000, and sram at another, 0x20000000, so you want to isolate the read only code and read only data from the read/write data so that you can tell the linker where to put each of them.

int mybss;
int mydata=5;
const int myrodata = 25;
int text ( void )
{
    mybss=3;
    return(++mydata);
}

You connect the dots for the linker either on the linker command line or using a linker script, which is specific to the tool (the linker) and not generic to a target. As with assembly language, there is no expectation that a linker script will port verbatim to other toolchains.

MEMORY
{
    bob : ORIGIN = 0x00000000, LENGTH = 0x1000
    ted : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
    .text   : { *(.text*)       } > bob
    .rodata : { *(.srodata*)    } > bob
    .data   : { *(.sdata*)      } > ted
    .bss    : { *(.sbss*)       } > ted
}

Obviously not an actual usable binary, but the tools do not know that, they simply did what I asked

Disassembly of section .text:

00000000 <text>:
   0:   200007b7            lui x15,0x20000
   4:   0007a503            lw  x10,0(x15) # 20000000 <mydata>
   8:   20000737            lui x14,0x20000
   c:   468d                    li  x13,3
   e:   0505                    addi    x10,x10,1
  10:   00d72223            sw  x13,4(x14) # 20000004 <mybss>
  14:   00a7a023            sw  x10,0(x15)
  18:   8082                    ret

Disassembly of section .rodata:

0000001c <myrodata>:
  1c:   0019                    c.nop   6
    ...

Disassembly of section .data:

20000000 <mydata>:
20000000:   0005
    ...

Disassembly of section .bss:

20000004 <mybss>:
20000004:   0000
    ...

Where I have bob and ted most folks will put rom and ram or other more useful names. On the left, where I have .text and .data and such, you can make stuff up too; those names only need to match what any tool that later reads this binary wants to see. With gnu linker itself you can make those up. The names in the middle, though, need to match the section names in the objects. I do not know why it is .sdata instead of .data, that is a new one for me today, but either way I simply look at the object, match the names in the linker script, and then control where they go.
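The two dumps of text() differ only in the fields the linker filled in: lui x15,0x0 became lui x15,0x20000, and sw x13,0(x14) became sw x13,4(x14). A minimal C sketch of how such a lui plus load/store pair forms the final address, using a made-up helper name lui_pair_address:

```c
#include <stdint.h>

/* Hypothetical helper: address reached by
 *     lui  rd, imm20
 *     lw/sw  reg, off12(rd)
 * lui places its 20-bit immediate in bits 31:12 of rd, and the
 * load/store then adds its signed 12-bit offset to that value. */
uint32_t lui_pair_address(uint32_t imm20, int32_t off12)
{
    uint32_t upper = (imm20 & 0xFFFFFu) << 12;  /* value left in rd by lui */
    return upper + (uint32_t)off12;             /* wraps like the hardware add */
}
```

With the linked values, lui_pair_address(0x20000, 0) is 0x20000000, where mydata landed, and lui_pair_address(0x20000, 4) is 0x20000004, where mybss landed. In the unlinked object both fields were still zero because the addresses behind the labels were not known yet.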

Lastly main: again a label. You have seen that for compiled functions the function name is basically an address, which is represented as a label: an entry point to that function/subroutine. Your C programs do not enter at main(); there is bootstrap code in the binary that runs first, and that code calls main. When you use gcc hello_world.c -o hello_world there is pre-processing, compiling to assembly, assembling to an object, and linking with a default linker script and C library bootstrap to make a target (operating system) specific binary so that you can then run it with ./hello_world.

The code you came across may have been intended (even though what you posted won't work) to be linked and run in such a way. Or the author is simply used to having the word main, as most of us who write C programs are. Even with C code compiled to assembly language, as far as the tools are concerned it is just another label. The bootstrap for C will specifically make an external call to it, so when everything is linked there needs to be a main() in one of the objects, but you can build binaries without a function called main() and have them work if you master the tools. At least with gnu. Other tools may require that function name for some reason.
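To tie this back to the code in the question, here is a rough C rendering of it. The labels arr and n become named objects holding pre-initialized data, and the instructions after .text become the body of a function. The finish label is not shown in the question, so returning t0 at the end is an assumption, and the function name fib_loop is invented for illustration.

```c
/* Pre-initialized data, what the question has before .text */
unsigned int arr[] = {2, 4, 6, 8};   /* arr: .word 2, 4, 6, 8 (unused by the loop) */
unsigned int n = 9;                  /* n:   .word 9 */

/* Hypothetical C equivalent of the instructions after .text */
unsigned int fib_loop(void)
{
    unsigned int t0 = 0;             /* main: add  t0, x0, x0        */
    unsigned int t1 = 1;             /*       addi t1, x0, 1         */
    unsigned int t3 = n;             /*       la t3, n ; lw t3, 0(t3) */

    while (t3 != 0) {                /* fib:  beq  t3, x0, finish    */
        unsigned int t2 = t1 + t0;   /*       add  t2, t1, t0        */
        t0 = t1;                     /*       mv   t0, t1            */
        t1 = t2;                     /*       mv   t1, t2            */
        t3 = t3 - 1;                 /*       addi t3, t3, -1        */
    }                                /*       j    fib               */

    return t0;                       /* assumption: finish uses t0 */
}
```

With n set to 9 the loop runs nine times and leaves 34 in t0. The names main and fib do nothing special here, they only mark the addresses that the branch and jump refer to.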

Please take care with your spellings here, especially words containing apostrophes. Here are some correct spellings, plus the number of misspellings in your post history: don't (765), doesn't (487), can't (304), won't (217), let's (161), isn't (147). There is some tolerance here for people who don't have English as a first language, but stylistic and deliberate misspelling works against the aims of the site, and is a great deal of work being deliberately made for volunteer editors. Please use the spell-checker feature in your browser. — halfer, Nov 17 '20 at 08:49
@halfer there was one wont in there, I have been checking, I can stop checking and go back to the old way if you prefer. You do not have to edit these answers, I fear you may be changing the meaning of certain statements. I will have to go back and review everything. It sometimes takes hours to write these answers...to get the exact meaning down. If this is how life is going to be going forward I would rather just delete them. — old_timer, Nov 17 '20 at 09:09
The general Stack Overflow reader would recognise that you offer a good deal of value in your posts, and on balance would rather you just acceded to a reasonable request. You gave a perfectly good solution in another post, and pinged me to show it: don't use contractions at all. It's a bit weird not wanting to spell contractions correctly, but I can go with the solution. — halfer, Nov 17 '20 at 09:16
I am quite obviously not changing the meaning of anything - that is a spurious allegation. I suggest you post on _Meta Stack Overflow_ to ask whether it is OK to misspell things for stylistic reasons (there is surely no point in my writing such a post, since I think I already know what the answer will be). — halfer, Nov 17 '20 at 09:17
As pointed out elsewhere, getting rid of the apostrophe has been a topic of discussion for a very, very long time. That is how I was taught and very much prefer it, looks better, reads better, etc. Other misspellings are not intentional. I have had folks mess up my technical documents in the past by trying to correct grammar, etc. Been going through my answers this week and looking for and fixing the donts and wonts, etc...You caught me missing one, but not the many others I fixed to your satisfaction. — old_timer, Nov 17 '20 at 14:53
The language has been constantly evolving, upstairs, watchdog, thru, cuz, ur, and will continue. The apostrophe will not only not be missed, but will be a joy to be rid of...possessives well thats another story. — old_timer, Nov 17 '20 at 14:56
I am open to conversation on the evolution of language, and I find those debates interesting. However, I am not sure that txtspk counts as evolution, and such chat-room formulations will usually be corrected with haste in any format that leans towards technical writing. I've not ever heard that the apostrophe has been seriously questioned, and as a writer and editor I think I would have heard about that, if serious parties had proposed it. Do you have any sources for that? I would be genuinely intrigued to learn more. — halfer, Nov 17 '20 at 16:04
If you have made corrections on old posts, then thank you - I appreciate that. I agree that it is possible to make errors when correcting - and I apologise in advance if I do that. I have pretty good eyes for an error, but I'm fallible, and will endeavour to make corrections to my corrections quickly if they are raised. — halfer, Nov 17 '20 at 16:05
(Ah, I [found this](https://en.wikipedia.org/wiki/Apostrophe#Criticism), which perhaps could be expected to represent the best criticisms of contractions in English. However, while those thoughts are interesting, they do not amount to policy, and no editor worth their salt is going to misspell things in the meantime in order to help institute them). — halfer, Nov 17 '20 at 16:13
