How do I make an array of strings in assembler and work with it?

I tried:

arrayOfWords BYTE "BICYCLE", "CANOE", "SCATEBOARD", "OFFSIDE", "TENNIS"

Then I want to print the second word, but it doesn't work:

    mov edx, offset arrayOfWords[2]
    call WriteString

Instead, it prints all the words.

SakaSerbia

2 Answers

arrayOfWords BYTE "BICYCLE", "CANOE", "SCATEBOARD", "OFFSIDE", "TENNIS"

is just another way to write

arrayOfWords BYTE "BICYCLECANOESCATEBOARDOFFSIDETENNIS"

and this is far from being an array.
Furthermore, `mov edx, offset arrayOfWords[2]` is not array indexing.
Brackets in assembly denote an addressing mode, not array indexing.
That's why I can't stress enough NOT to use the syntax `<symbol>[<displacement>]` (your `arrayOfWords[2]`) [1]: it is a very silly and confusing way to write `[<symbol> + <displacement>]` (in your case `[arrayOfWords + 2]`).

You can see that `mov edx, OFFSET [arrayOfWords + 2]` (which, in my opinion, is more clearly written as `mov edx, OFFSET arrayOfWords + 2`, since the instruction does not access any memory) just loads `edx` with the address of the C character in BICYCLE (the third character of the big string).
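
As a rough cross-check, the same behaviour can be modelled in C, where adjacent string literals are likewise fused into one buffer. This is only a sketch: the names `words` and `skip_two` are hypothetical and not part of the original program, and C appends a terminating 0 that the MASM definition in the question does not have.

```c
#include <assert.h>
#include <string.h>

/* Adjacent literals are concatenated by the compiler, much like the
   comma-separated BYTE operands: one 35-character run of letters.
   (Assumption of this model: C adds a trailing 0; the MASM data has none.) */
static const char words[] = "BICYCLE" "CANOE" "SCATEBOARD" "OFFSIDE" "TENNIS";

/* Model of "OFFSET arrayOfWords + 2": an address two bytes past the start,
   not the second or third word. */
static const char *skip_two(void)
{
    assert(sizeof words > 2);   /* the buffer must be long enough */
    return words + 2;
}
```

Printing the result of `skip_two()` would start at the `C` of BICYCLE and run on through every remaining word, which mirrors what the asker observed.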

MASM has a lot of high-level machinery that I never bothered learning, but after a quick glance at the manual linked in the footnotes, it seems that it has no high-level support for arrays.
That's a good thing: we can use cleaner assembly.

An array of strings is not a contiguous block of strings; it is a contiguous block of pointers to strings.
The strings can be anywhere.

arrayOfWords  DWORD  OFFSET strBicycle, 
                     OFFSET strCanoe,
                     OFFSET strSkateboard,
                     OFFSET strOffside,
                     OFFSET strTennis

strBicycle    BYTE "BICYCLE",0
strCanoe      BYTE "CANOE", 0
strSkateboard BYTE "SKATEBOARD", 0
strOffside    BYTE "OFFSIDE", 0
strTennis     BYTE "TENNIS", 0

Remember: the nice feature of arrays is constant access time. If the strings were all packed together we'd get a more compact data structure but lose constant access time, since the only way to find where a string starts would be to scan the whole thing.
With pointers we keep constant access time. In general, we require all the elements of an array to be homogeneous (the same size), as the pointers are.

To load the address of the i-th [2] string in the array we simply read the i-th pointer.
Suppose i is in `ecx`; then

mov edx, DWORD PTR [arrayOfWords + ecx*4]
call WriteString

since each pointer is four bytes.

To read byte j of string i, supposing j is in `ebx` and i is in `ecx`:

mov esi, DWORD PTR [arrayOfWords + ecx*4]
mov al, BYTE PTR [esi + ebx]

The registers used are arbitrary.
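
For comparison, here is a minimal C model of the same pointer table. It is a sketch, not a translation of the program: `array_of_words`, `entry_offset`, `word_at` and `char_at` are hypothetical names introduced for illustration, and the pointer size is whatever the target uses (4 bytes in the 32-bit code above).

```c
#include <assert.h>
#include <stddef.h>

/* Contiguous table of pointers; the strings themselves may live anywhere. */
static const char *const array_of_words[] = {
    "BICYCLE", "CANOE", "SKATEBOARD", "OFFSIDE", "TENNIS"
};
enum { WORD_COUNT = sizeof array_of_words / sizeof array_of_words[0] };

/* Byte offset of entry i from the start of the table. The assembly writes
   this scaling explicitly as ecx*4 because its pointers are 4 bytes wide. */
static size_t entry_offset(size_t i)
{
    return i * sizeof array_of_words[0];
}

/* Model of: mov edx, DWORD PTR [arrayOfWords + ecx*4] */
static const char *word_at(size_t i)
{
    assert(i < WORD_COUNT);     /* the assembly performs no such check */
    return array_of_words[i];
}

/* Model of: mov esi, [arrayOfWords + ecx*4] / mov al, [esi + ebx] */
static char char_at(size_t i, size_t j)
{
    return word_at(i)[j];       /* j is not bounds-checked, as in the assembly */
}
```

With this table, `word_at(1)` yields the address of "CANOE" in a single load, regardless of how long the other strings are.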


[1] Despite what Microsoft writes in its MASM 6.1 manual:

Referencing Arrays
Each element in an array is referenced with an index number, beginning with zero. The array index appears in brackets after the array name, as in

array[9]

Assembly-language indexes differ from indexes in high-level languages, where the index number always corresponds to the element’s position. In C, for example, array[9] references the array’s tenth element, regardless of whether each element is 1 byte or 8 bytes in size. In assembly language, an element’s index refers to the number of bytes between the element and the start of the array.

[2] Counting from zero.

Margaret Bloom
  • Thank you very much. This is the best explanation I have read. :) Now I will try to solve my problem this way. I am making a hangman game for my university project; it is 30% of my mark. – SakaSerbia Jun 03 '17 at 10:15

arrayOfWords is not an array, not even a variable. It's just a label that tells the assembler where it can find something, in this case a bunch of characters. Irvine's WriteString expects a null-terminated bunch of characters as a string. There are two ways to treat that bunch of characters as a string array.

  1. Search memory for the address of the desired string. A new string begins after every null terminator.

    INCLUDE Irvine32.inc
    
    .DATA
    manyWords BYTE "BICYCLE", 0
        BYTE "CANOE", 0
        BYTE "SCATEBOARD", 0
        BYTE "OFFSIDE", 0
        BYTE "TENNIS", 0
        BYTE 0                              ; End of list
    len equ $ - manyWords
    
    .CODE
    main PROC
    
        mov edx, 2                          ; Index
        call find_str                       ; Returns EDI = pointer to string
    
        mov edx, edi
        call WriteString                    ; Irvine32: Write a string pointed to by EDX
    
        exit                                ; Irvine32: ExitProcess
    main ENDP
    
    find_str PROC                           ; ARG: EDX = index
    
        lea edi, manyWords                  ; Address of string list
    
        mov ecx, len                        ; Maximal number of bytes to scan
        xor al, al                          ; Scan for 0
    
        @@:
        sub edx, 1
        jc done                             ; No index left to scan = string found
        repne scasb                         ; Scan for AL
        jmp @B                              ; Next string
    
        done:
        ret
    find_str ENDP                           ; RESULT: EDI pointer to string[edx]
    
    END main
    
  2. Build an array of pointers to the strings:

    INCLUDE Irvine32.inc
    
    .DATA
    wrd0 BYTE "BICYCLE", 0
    wrd1 BYTE "CANOE", 0
    wrd2 BYTE "SCATEBOARD", 0
    wrd3 BYTE "OFFSIDE", 0
    wrd4 BYTE "TENNIS", 0
    
    pointers DWORD OFFSET wrd0, OFFSET wrd1, OFFSET wrd2, OFFSET wrd3, OFFSET wrd4
    
    .CODE
    main PROC
    
        mov ecx, 2                          ; Index
        lea edx, [pointers + ecx * 4]       ; Address of pointers[index]
        mov edx, [edx]                      ; Address of string
        call WriteString
    
        exit                                ; Irvine32: ExitProcess
    main ENDP
    
    END main
    

BTW: As in other languages, the index starts at 0. The second string would be index = 1, the third index = 2.
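
The first method (scanning for null terminators) can also be sketched in C to make the linear search explicit. This is an illustrative model only: `many_words` and this C `find_str` are hypothetical stand-ins, and unlike the assembly routine the sketch returns NULL when the index runs past the end of the list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Strings packed back to back. The implicit terminator after the last
   "\0" acts as an empty string marking the end of the list, like the
   final "BYTE 0" in manyWords. */
static const char many_words[] =
    "BICYCLE\0" "CANOE\0" "SCATEBOARD\0" "OFFSIDE\0" "TENNIS\0";

/* Returns the string at position index, or NULL if the list is shorter.
   Each step skips one whole string, so the cost grows with the index. */
static const char *find_str(size_t index)
{
    /* The list must end with an empty string for the loop to stop. */
    assert(many_words[sizeof many_words - 1] == '\0');

    const char *p = many_words;
    while (*p != '\0') {            /* empty string = end of list */
        if (index == 0)
            return p;
        p += strlen(p) + 1;         /* skip the string and its terminator */
        index--;
    }
    return NULL;
}
```

Here `find_str(2)` walks past "BICYCLE" and "CANOE" before returning "SCATEBOARD", whereas the pointer table of the second method reaches any entry with one indexed load.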

rkhb
  • `lea edi, manyWords` --> how would I do this in nasm or fasm? – Jodimoro Jun 04 '17 at 07:10
  • @Jodimoro: NASM: `lea edi, [manyWords]` or `mov edi, manyWords`. I guess it is identical in FASM. I'm unsure whether and how `irvine32.lib` works in NASM or FASM. It doesn't work in Linux anyway. If you ask a new question, I'll work on it ;-) – rkhb Jun 04 '17 at 07:34
  • @Jodimoro: Why not what? I don't understand the question. – rkhb Jun 04 '17 at 10:23
  • @Jodimoro: After the operation, `EDI` should contain the address of `manyWords`. In NASM, `mov edi, [manyWords]` would load the value, not the address; `mov edi, manyWords` loads the address. Casually speaking, `lea edi, [manyWords]` calculates the address of the value `[manyWords]`. MASM is different: `manyWords` without brackets is the value, and `OFFSET manyWords` expresses the address. I prefer `LEA` because it can also calculate the address of local variables at run time, but that isn't relevant in this case. – rkhb Jun 04 '17 at 11:33