
I'm programming the Mandelbrot set in assembly using SSE. I use this interrupt call:

    mov ax,0x4F02
    mov bx,0x107
    int 0x10

to set the video mode to 1280x1024 pixels with 256 colors. Then I enable the A20 gate, switch to 32-bit protected mode (which works correctly), and enable the FPU and SSE in CR0 and CR4. I tested some simple SSE instructions and they worked without exceptions. But when I run my Mandelbrot code, nothing is rendered (drawing pixels and lines works fine). I also stepped through the code with a debugger but didn't find any mistake. Could someone please look at my code? Thanks for any help. (Assembling with NASM, running on FreeDOS.)

    mov edx,0xA0000
    xorps xmm7,xmm7
repeat:
    movupd xmm0,xmm7
    movupd xmm5,[centerimage]
    subpd xmm0,xmm5
    movupd xmm5,[zoom]
    divpd xmm0,xmm5
    movupd xmm6,xmm0
    xorps xmm0,xmm0
    xor ecx,ecx
nextiteration:

    movupd xmm1,xmm0
    mulpd xmm1,xmm1
    hsubpd xmm1,xmm1
    movupd xmm2,xmm0
    shufpd xmm2,xmm2,0x1
    mulpd xmm0,xmm2
    haddpd xmm0,xmm0
    movsd xmm0,xmm1

    addpd xmm0,xmm6
    movupd xmm1,xmm0
    mulpd xmm1,xmm1
    haddpd xmm1,xmm1

    ucomisd xmm1,[double4]
    ja getcolor
    inc ecx
    cmp ecx,0xFF
    jb nextiteration

    xor al,al
    jmp drawpixel
getcolor:
    mov al,cl
drawpixel:

    movupd xmm0,xmm7
    movupd xmm5,[double1double1280]
    mulpd xmm0,xmm5
    haddpd xmm0,xmm0
    cvtsd2si edi,xmm0
    mov [edx+edi],al

    movupd xmm5,[double1double0]
    addpd xmm7,xmm5
    ucomisd xmm7,[screenX]
    jb repeat
    movsd xmm7,[double0]

    movupd xmm5,[double0double1]
    addpd xmm7,xmm5
    movupd xmm0,xmm7
    shufpd xmm0,xmm0,1
    ucomisd xmm0,[screenY]
    jb repeat
    ;movhpd xmm7,[double0]

    cli
infloop:
    hlt
    jmp infloop

centerimage dq 640.0,512.0
zoom dq 50.0,50.0
double1double1280 dq 1.0,1280.0
double1double0 dq 1.0,0.0
double0double1 dq 0.0,1.0
screenX dq 1280.0
screenY dq 1024.0
double0 dq 0.0
double4 dq 4.0
Segy
  • The online help explains how to format your posts. You should have a look at it. You also need to narrow this down. For instance, you say, *nothing is rendered when I start it (drawing pixels/lines is working fine)*, which doesn't make sense to me. You're saying nothing is rendered, but also that the drawing is working fine. You should simplify the problem and focus on getting a piece of it working, like the rendering, with simple test cases. Then move on to the actual Mandelbrot calculations. – lurker Jun 26 '18 at 14:37
  • I mean that before coding the Mandelbrot set I tried drawing lines, and that worked. – Segy Jun 26 '18 at 14:39
  • Debug the part where you try to draw a pixel. Look at the address you calculated. Is it correct? What about the pixel value you are storing? – Raymond Chen Jun 26 '18 at 14:42
  • I would recommend using intrinsics instead of going straight to assembly. They are much easier to read, debug, and ask for help with. Plus they lend themselves better to upgrading to AVX2 in the future, if you decide to. – ChipK Jun 26 '18 at 14:55
  • Use your debugger to verify you actually hit the `getcolor` and that `cl` is nonzero at that point. Otherwise you might just be drawing black pixels. Note that `ecx` counts up to `0xff` but is then never reset so the `jb nextiteration` will no longer fire and you might fall through into the `xor al, al` if the `ja getcolor` doesn't branch earlier. – Jester Jun 26 '18 at 14:58
  • You said that you went through it with a debugger and found no problem. When debugging, did you go step by step up to the point of writing a pixel? When it hit the step of writing a pixel, was a pixel displayed? If not, then a problem was found during debugging. – lurker Jun 26 '18 at 14:58
  • Even without the bugs, this looks like a very inefficient implementation. Don't use horizontal operations inside inner loops. Instead, compute *two* pixels in parallel, using one vector of the real parts and another vector of the imaginary parts. Use compare / mask to stop incrementing a vector counter for the element that already has `|m^2| > 4.0` while the other one isn't done yet. (Your loop condition will involve `movmskpd` / `cmp` or `test` / `jne`.) Having only 8 vector regs sucks, but maybe you can unroll and hide the FP latency by doing another pair in parallel. – Peter Cordes Jun 26 '18 at 16:30
  • You don't need to load `0.0` from memory. Its bit-pattern is all-zeros, so you should create it in a register with `xorps xmm7, xmm7`. You also don't need `double1double0`. Simply `movsd` from a `1.0` which you already have as part of another vector constant. (`movsd` zero-extends when loading; only the reg-reg form is a merge/blend.) But you do need it if you want to use it as a memory source operand, like `addpd xmm7, [double1double0]` (which you should use because it's more efficient than a separate load, unless you can reuse the value in the register.) Use `ALIGN 16`. – Peter Cordes Jun 26 '18 at 22:05
  • @lurker He said exactly that: drawing pixels works fine (apparently from other code), and the routine above does not. There's no need to be so patronizing, here or in your other comments. – Wojciech Kaczmarek Jun 28 '18 at 14:39
  • @WojciechKaczmarek I do not mean to be patronizing. Maybe a little tongue-in-cheek. Unfortunately, such intent is not clear when writing terse comments on forums like this. My point was that, even though in some other context drawing pixels works fine, it obviously does not work in this context, which means that a debug process must then necessarily come to a point where it is not doing what is expected. The OP's comment that running through with the debugger found no mistake means to me that the debugging process didn't traverse into the problem area evidently. – lurker Jun 28 '18 at 14:53
  • A few months ago I found the mistake: `movsd` with a memory source clears the upper 64 bits, so it loops forever: [https://x86.puri.sm/html/file_module_x86_id_204.html](https://x86.puri.sm/html/file_module_x86_id_204.html) – Segy Nov 14 '18 at 18:07
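
The `movsd` problem from the last comment can be illustrated with the row-advance part of the loop. This is only a minimal sketch, assuming the rest of the code stays as posted in the question. It reuses the question's labels (`repeat`, `double0`, `double0double1`, `screenY`) and swaps the `movsd` load for `movlpd`, which is one possible replacement and not necessarily what the OP ended up using.

```nasm
    ; End of a row: X (low lane of xmm7) has reached screenX.
    ; Goal: set X back to 0.0 and add 1.0 to Y (high lane of xmm7).

    ; The original used:  movsd xmm7,[double0]
    ; The memory-source form of movsd zeroes bits 127:64, so it wiped Y too.

    movlpd  xmm7,[double0]         ; load 0.0 into the low lane only; Y is kept
    movupd  xmm5,[double0double1]  ; xmm5 = (0.0, 1.0)
    addpd   xmm7,xmm5              ; Y += 1.0, X stays 0.0
    movupd  xmm0,xmm7
    shufpd  xmm0,xmm0,1            ; swap lanes so Y sits in the low lane
    ucomisd xmm0,[screenY]         ; compare Y with 1024.0
    jb      repeat                 ; next row while Y < screenY
```

With the original `movsd xmm7,[double0]`, Y is cleared to 0.0 at the end of every row and becomes 1.0 after the `addpd`, so it never reaches `screenY` and the loop never exits. An alternative with the same effect is `xorps xmm5,xmm5` followed by `movsd xmm7,xmm5`, since the register-to-register form of `movsd` only replaces the low lane, as Peter Cordes notes above.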
