How does DB work? Aren't bytes of the assembler output code?

Question

What I've studied so far is that we write a program in assembly language, pass it to an assembler, which generates machine code, and then pass that machine code to a ROM burner, which burns it into the ROM of the microcontroller.

Now my question is related to directives like DB, used to define data bytes.

How will this directive not generate any machine code? To place a byte in memory some code will have to be there ...it cannot happen magically

This directive will have to generate something that will later tell the ROM burner that this data byte is to be placed at this address.

Please help, I'm all confused.

"How will this directive not generate any machine code? To place a byte in memory some code will have to be there ...it cannot happen magically" -- `db` does not generate machine code, unless the data you specify with it happens to be machine code. Like an instruction, the `db` data is written to the output file. — ecm, Aug 08 '20 at 07:30
The db does not generate instructions to write to memory at runtime. It puts bytes into the 'instruction stream' instead. A program is loaded (e.g. by an operating system) into memory at startup, and the db data is loaded along with it. Either you encode real instructions with db, or you make sure that the data generated with db is never executed, e.g. by putting the data at the end of the program or putting a jump instruction before the db — Sebastian, Aug 15 '20 at 02:05
The trick is that the CPU can't tell the difference between data and instructions. If you `jmp` to a sequence of `db` statements, the CPU will attempt to execute your data as code (usually this causes a crash or buggy behavior, since not every byte sequence is a valid instruction). For most of my interrupt handlers on embedded hardware, if the vector table is read-only I'll point it to uninitialized RAM and write the CPU's "return from interrupt" machine code at that address at startup — puppydrum64, Dec 07 '22 at 18:36

score 4 · Answer 1 · answered Aug 08 '20 at 05:17

You're right, a DB pseudo-instruction is not fundamentally different from other instructions. Both just emit some bytes into the output at the current position.

DB is a convenient way to output bytes that your program doesn't execute as code. We call this "data". You do that by putting the data at an address that execution will never reach (e.g. because you don't jump or fall through to there).

You can use DB to manually encode instruction bytes if you want.
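To make that concrete, here is a minimal sketch of a toy assembler in Python. It is not how any real assembler is implemented: it handles only `db` plus three x86 instructions with fixed encodings (`nop` = 90, `ret` = C3, `mov al, imm8` = B0 followed by the immediate), and the `assemble` function name is made up for illustration. The point is that every branch does the same thing, which is to append bytes at the current output position.

```python
# Toy assembler for illustration only: three x86 instructions plus db.
def assemble(lines):
    out = bytearray()
    for line in lines:
        mnemonic, _, operands = line.strip().partition(" ")
        args = [a.strip() for a in operands.split(",") if a.strip()]
        if mnemonic == "db":
            # db: append each operand as one raw byte
            out.extend(int(a, 0) for a in args)
        elif mnemonic == "nop":
            out.append(0x90)
        elif mnemonic == "ret":
            out.append(0xC3)
        elif mnemonic == "mov" and len(args) == 2 and args[0] == "al":
            # mov al, imm8 encodes as B0 followed by the immediate byte
            out.extend([0xB0, int(args[1], 0)])
        else:
            raise ValueError(f"unsupported line: {line!r}")
    return bytes(out)


program = ["mov al, 0x01", "ret", "db 0x48, 0x69, 0x00"]
image = assemble(program)
print(image.hex(" "))  # b0 01 c3 48 69 00
```

With this sketch, `assemble(["db 0xB0, 0x01"])` returns the same two bytes as `assemble(["mov al, 1"])`, which is all that manually encoding an instruction with DB amounts to. The `db` bytes after the `ret` are data only because execution never reaches them.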

In a von Neumann architecture (stored-program machine where instruction bytes are fetched from the same memory that data load/store instructions access), there is no fundamental difference between code and data. The difference is just where you put it, unlike a pure Harvard architecture where code goes in a different address-space than data.

(In real life, Harvard machines like AVR microcontrollers typically copy some program memory (ROM) to RAM on startup, to initialize read-write static data variables, and even have a "load program memory" instruction so you can have constant lookup tables in ROM. So you would still have some db data near code.)
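A rough way to see the von Neumann point is a toy single-memory machine in Python. The three opcodes (0x01 LOAD, 0x02 ADD, 0xFF HALT) and the `run` function are invented for this sketch and do not correspond to any real ISA. The same memory image gives different results depending only on where execution starts.

```python
# Toy stored-program machine: code and data share one memory.
# The opcodes are hypothetical, chosen only for this illustration.
def run(memory, pc=0):
    acc = 0
    while True:
        opcode = memory[pc]
        if opcode == 0x01:      # LOAD addr: acc = memory[addr]
            acc = memory[memory[pc + 1]]
            pc += 2
        elif opcode == 0x02:    # ADD imm: acc = acc + imm (8-bit wraparound)
            acc = (acc + memory[pc + 1]) & 0xFF
            pc += 2
        elif opcode == 0xFF:    # HALT: stop and return the accumulator
            return acc
        else:
            raise ValueError(f"invalid opcode {opcode:#04x} at address {pc}")


memory = bytes([
    0x01, 0x05,        # address 0: LOAD from address 5
    0x02, 0x01,        # address 2: ADD 1
    0xFF,              # address 4: HALT
    0x02, 0x07, 0xFF,  # address 5: "data" bytes, as db 0x02, 0x07, 0xFF would emit
])

print(run(memory))        # 3: the byte at address 5 was read as data
print(run(memory, pc=5))  # 7: the same bytes were executed as ADD 7, HALT
```

Nothing in `memory` marks addresses 5 to 7 as data. They behave as data in the first run only because the program counter never gets there.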

Related Q&As that mention manually encoding instructions with DB (for x86, but the concept is the same for any ISA):

score 2 · Answer 2 · edited Jan 02 '21 at 00:05

All programs require data:

- small constants, like 1, can usually be embedded within the machine code instructions that use them
- large constants sometimes don't fit the machine code form, so they go into data and are referenced by the code
- programs often use string literals for file & path names, prompts, etc.
- storage buffers, as space to read user input or file contents into
- global variables, initialized to zero or some other value
- floating point constants go in memory, as they are usually too large to fit as an immediate within a machine code instruction

As mentioned above, in some cases the data can be embedded within machine code instructions, as what are called immediates, a short term for an immediate addressing mode. But in many other cases, constants are done as data that is referenced by the machine code rather than embedded within the machine code — the address of the data is embedded within the machine code (using some addressing mode).

In short, we need to be able to declare data in assembly language just like we need to be able to declare data in all other languages. There must also then be a way for program files to capture that code and its data.
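As one example of how a program file can capture bytes together with their addresses, here is a sketch that builds Intel HEX records in Python. Intel HEX is a common input format for ROM programmers, but it is an assumption that the asker's toolchain uses it, and the `ihex_record` helper is a made-up name. Each record carries a byte count, a 16-bit address, a record type, the data bytes, and a checksum (the two's complement of the low byte of the sum of all preceding bytes).

```python
# Build one Intel HEX record: ":" + count + address + type + data + checksum.
def ihex_record(address, data, record_type=0x00):
    body = bytes([
        len(data),
        (address >> 8) & 0xFF,  # address high byte
        address & 0xFF,         # address low byte
        record_type,            # 0x00 = data, 0x01 = end of file
    ]) + bytes(data)
    checksum = (-sum(body)) & 0xFF
    return ":" + (body + bytes([checksum])).hex().upper()


# The bytes that `db 'H', 'i', 0` would emit, placed at address 0x0010.
record = ihex_record(0x0010, b"Hi\x00")
eof = ihex_record(0x0000, b"", record_type=0x01)
print(record)  # :030010004869003C
print(eof)     # :00000001FF
```

A record like this says nothing about whether its bytes are instructions or data. The programmer just writes them to the stated address.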

If you label the data then you can use (make a reference to) that label from within your code & data.
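The difference between an immediate and labelled data can be sketched in Python with two 16-bit x86 encodings: `mov al, imm8` (B0 followed by the value) and `mov al, [addr]` (A0 followed by a 16-bit little-endian address). The `ORIGIN` load address, the label name `answer`, and the `build_with_label` helper are assumptions made for this illustration.

```python
import struct

ORIGIN = 0x0100  # hypothetical address where the first byte is loaded

def build_with_label(value):
    ret = b"\xC3"
    code_size = 3 + len(ret)  # mov al, [addr] takes 3 bytes in 16-bit mode
    # The label is just the address of the first byte after the code.
    labels = {"answer": ORIGIN + code_size}
    mov = b"\xA0" + struct.pack("<H", labels["answer"])
    data = bytes([value])     # what `answer: db 42` would emit
    return mov + ret + data, labels


# Immediate form: the constant lives inside the instruction itself.
immediate_image = b"\xB0" + bytes([42]) + b"\xC3"

# Label form: the instruction holds the address, the constant lives in data.
label_image, labels = build_with_label(42)

print(immediate_image.hex(" "))  # b0 2a c3
print(hex(labels["answer"]))     # 0x104
print(label_image.hex(" "))      # a0 04 01 c3 2a
```

In the second image the value 42 (2A) appears after the `ret`, and the instruction contains only its address, 04 01.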

Most assemblers will also have a notion of separate code & data sections. A .data directive (or whichever is appropriate for this assembler) will tell the assembler to collect subsequent data declarations together into the data section of the assembler & linker output. Usually in the assembly source code, we can switch back and forth between code and data sections, so as to keep data related to code nearby in the source, but possibly collected separately in the constructed program file according to the way program files are defined.
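The section mechanism can be sketched in Python as follows. This is a simplification under stated assumptions: the source is pre-parsed into hypothetical `("section", name)` and `("bytes", payload)` items, only `.text` and `.data` exist, and real object file formats record far more than this.

```python
# Collect interleaved source items into separate .text and .data sections.
def assemble_sections(source):
    sections = {".text": bytearray(), ".data": bytearray()}
    current = ".text"
    for kind, payload in source:
        if kind == "section":
            current = payload                   # switch the active section
        elif kind == "bytes":
            sections[current].extend(payload)   # append to the active section
        else:
            raise ValueError(f"unknown item kind: {kind!r}")
    return bytes(sections[".text"]), bytes(sections[".data"])


# Code and data are interleaved in the source...
source = [
    ("section", ".data"), ("bytes", b"Hi\x00"),    # db 'H', 'i', 0
    ("section", ".text"), ("bytes", b"\xB0\x01"),  # mov al, 1
    ("section", ".data"), ("bytes", b"\x2A"),      # db 42
    ("section", ".text"), ("bytes", b"\xC3"),      # ret
]

# ...but end up grouped by section in the output.
text, data = assemble_sections(source)
print(text.hex(" "))  # b0 01 c3
print(data.hex(" "))  # 48 69 00 2a
```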
