
So I've been working on a Chip-8 emulator as a final project for my CompSci class, and have encountered an issue that seems to extend beyond my code. A large amount of the demos I've downloaded (and I'm sure they're genuine Chip-8 programs, not SuperChip or anything like that) contain machine instructions that don't fit the format of any Chip-8 opcode.

http://mattmik.com/files/chip8/mastering/chip8.html

At the bottom of the page is a list of all the opcodes, each 2 bytes long, and what each nibble of data in them represents. However, a fair number of the programs have instructions that don't match the format of any instruction. For example, here's a hex dump from one of them. I'll point out a few individual cases within it:

0000000 6a 00 6b 04 6c 01 6d 00 6e 02 23 26 23 20 60 30
0000010 61 01 f0 15 f0 07 f1 18 30 00 12 14 22 42 23 20
0000020 7d 01 23 20 60 08 e0 a1 23 0a 4a 00 12 3e a3 62
0000030 d8 91 79 01 d8 91 4f 01 12 f4 49 18 12 e4 22 b2
0000040 12 1e 4c 01 22 6c 4c 02 22 7a 4c 03 22 88 4c 04
0000050 22 96 4c 05 22 a4 a3 59 d6 72 44 00 00 ee a3 57
0000060 d4 52 42 00 00 ee a3 5b d2 32 00 ee 66 28 67 09
0000070 64 00 65 00 62 00 63 00 00 ee 66 28 67 0e 64 28
0000080 65 14 62 00 63 00 00 ee 66 28 67 07 64 28 65 0c
0000090 62 16 63 11 00 ee 66 28 67 07 64 28 65 0e 62 16
00000a0 63 14 00 ee 66 28 67 05 64 28 65 10 62 16 63 0b
00000b0 00 ee a3 59 d6 72 76 fe d6 72 44 00 00 ee a3 57
00000c0 d4 52 74 02 44 44 74 c0 d4 52 42 00 00 ee a3 5b
00000d0 d2 32 72 02 4c 04 72 02 4c 05 72 02 42 44 72 c0
00000e0 d2 32 00 ee 7c 01 6d 00 6e 02 00 e0 4c 06 6c 01
00000f0 6a 00 12 0a 60 06 f0 18 7b ff 4b 00 13 08 6d 00
0000100 6e 02 00 e0 6a 00 12 0a 13 08 4a 01 00 ee 60 02
0000110 f0 18 6a 01 88 d0 78 01 89 e0 79 01 d8 91 00 ee
0000120 a3 54 dd e2 00 ee 64 19 63 00 a3 56 d3 41 73 08
0000130 33 40 13 2c 63 1e 64 1b fc 29 d3 45 4b 04 a3 5f
0000140 4b 03 a3 60 4b 02 a3 61 4b 01 a3 62 63 01 74 02
0000150 d3 41 00 ee 80 f8 ff 80 e0 10 70 88 ee 11 77 aa
0000160 a8 a0 80 00                                    
0000164

At 0x154, there's

80 f8

but no instruction beginning with 8 can end with an 8: the only legal instructions beginning with 8 must end with 0, 1, 2, 3, 4, 5, 6, 7, or E. Another one, at 0x158,

e0 10

No instruction matches this format either. The second byte of any instruction starting with E has to be 9E or A1.

This is only a small number of the errors. There are several more of these 'impossible' instructions throughout the code.

Am I doing something drastically wrong? How should I deal with these instructions? Just skip over them? Is the page I'm using as my Chip-8 resource somehow incomplete? Any advice on how to deal with this is greatly appreciated. Thanks!

ajb
Hstuart

2 Answers


Bear in mind that I'm totally unfamiliar with chip-8 specifically; just low-level computing in general.

That's very probably data; the graphics and sound that make up the game. You don't have to "deal with it"; if the program is written correctly the instruction pointer will never point to that area.

If it DOES end up pointing there, that's an error just like dividing by zero; "deal with it" however you want, presumably by showing the user a message saying "you tried to execute an invalid instruction you numpty."

That's what programmers mean when they say "undefined behavior;" there literally is no definition for what is supposed to happen when the instruction pointer points to something that isn't an instruction. You may do whatever you want, because a correctly made program is never supposed to do it (in real life they do anyway, all the time in fact, but they really shouldn't.)
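A minimal sketch of that idea in Python, for what it's worth. The names `InvalidOpcode`, `is_valid_opcode` and `fetch` are made up for illustration, the format checks follow the opcode table on the page linked in the question, and treating every 0NNN word other than 00E0 and 00EE as invalid is an assumption of this sketch, not something the table requires:

```python
class InvalidOpcode(Exception):
    """Raised when the program counter lands on a word that is not an instruction."""


def is_valid_opcode(op: int) -> bool:
    """Return True if the 16-bit word matches a standard CHIP-8 instruction format."""
    top = op >> 12            # first nibble selects the instruction family
    low_nibble = op & 0x000F
    low_byte = op & 0x00FF
    if top == 0x0:
        # Assumption: only CLS (00E0) and RET (00EE) are supported;
        # other 0NNN machine-code calls are rejected.
        return op in (0x00E0, 0x00EE)
    if top in (0x5, 0x9):
        return low_nibble == 0x0
    if top == 0x8:
        return low_nibble in (0x0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0xE)
    if top == 0xE:
        return low_byte in (0x9E, 0xA1)
    if top == 0xF:
        return low_byte in (0x07, 0x0A, 0x15, 0x18, 0x1E, 0x29, 0x33, 0x55, 0x65)
    return True               # 1, 2, 3, 4, 6, 7, A, B, C, D accept any operand bits


def fetch(memory, pc: int) -> int:
    """Read the opcode at pc, failing loudly if it is not a real instruction."""
    op = (memory[pc] << 8) | memory[pc + 1]
    if not is_valid_opcode(op):
        raise InvalidOpcode(f"invalid opcode {op:04X} at address {pc:03X}")
    return op
```

With this, the two words from the question (`80F8` and `E010`) are rejected, but only if the program counter ever actually reaches them.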

Schilcote
  • that makes a lot of sense! Probably the most likely truth. The only reason the pointer has been hitting that area is because I've built a program to translate the hex instructions into human-readable commands before I've actually started the emulator - if it was actually executing correctly, you're probably right, it might just never touch that spot. Thanks for pointing this out to me! – Hstuart May 21 '16 at 22:37
  • There are Chip-8 disassemblers easily findable by web search. Using one of them to disassemble the hex codes would test this hypothesis, which seems likely to me too. – Simon May 21 '16 at 22:46

Are you assuming that every byte pair in the binary is an instruction? That would be a bad assumption. When you follow the chip's rules for an entry point and follow the possible code paths, do these impossible instructions show up?

Whether the instruction set is fixed or variable length, and whatever the architecture (ARM, MIPS, x86, etc.), you will find data in the binary that is not instructions; that is just how it works. Disassemble a full-sized ARM program (fixed-length 32-bit instructions) and you will find undefined instructions, because they are not instructions but data: addresses needed to reach far distances, ASCII strings, etc. With a fixed-length set you can go from zero to the end and disassemble (assuming there is no Thumb code in there), but you have to allow for or ignore the illegal bit patterns. Same here, if that is what you are hitting. It is not always perfect, but to eliminate some of them you should follow the possible execution paths (which you pretty much have to do for variable-length instruction sets).

Now, if you are emulating to get to these, and emulating correctly, then you are already following execution paths, and we probably can't help you. Do you have your endianness right? Are you interpreting the byte pairs correctly? Perhaps you are lucky for a while and then hit an undefined?
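On the endianness point, a small sketch of what "interpreting the byte pairs correctly" looks like in Python. The function name `fetch_opcode` is hypothetical; the byte order is taken from the hex dump in the question, where the bytes `6a 00` have to decode as `6A00`:

```python
def fetch_opcode(memory, pc: int) -> int:
    """Combine two consecutive bytes into one opcode, most significant byte first."""
    return (memory[pc] << 8) | memory[pc + 1]


first_bytes = bytes.fromhex("6a006b04")      # start of the hex dump in the question

right = fetch_opcode(first_bytes, 0)          # 0x6A00: a load-immediate into register 10
wrong = int.from_bytes(first_bytes[0:2], "little")   # 0x006A: bytes swapped by mistake
```

If a decoder reads the pair little-endian, nearly every word in the file turns into something that looks invalid, so this is worth ruling out first.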

EDIT:

This is as far as I got with your binary, since it doesn't describe what is in the 0x300s:

0000: 6A00
mov r10,0x00
0002: 6B04
mov r11,0x04
0004: 6C01
mov r12,0x01
0006: 6D00
mov r13,0x00
0008: 6E02
mov r14,0x02
000A: 2326
call 326
0326: 0000
UNDEFINED

Worse than that, I see emulators and other docs say the PC starts at 0x200, for which your binary doesn't have any data.
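The mismatch goes away if the file is loaded at 0x200 rather than at 0, so that file offset 0x126 ends up at address 0x326. A rough sketch, assuming the usual 4096-byte Chip-8 address space (the names `load_rom` and `file_offset_to_address` are made up here):

```python
MEMORY_SIZE = 4096      # assumed size of the Chip-8 address space
LOAD_ADDRESS = 0x200    # programs are placed here, and the PC starts here


def load_rom(rom: bytes) -> bytearray:
    """Return a zeroed memory image with the program copied in at 0x200."""
    if len(rom) > MEMORY_SIZE - LOAD_ADDRESS:
        raise ValueError("ROM does not fit in memory")
    memory = bytearray(MEMORY_SIZE)
    memory[LOAD_ADDRESS:LOAD_ADDRESS + len(rom)] = rom
    return memory


def file_offset_to_address(offset: int) -> int:
    """Translate an offset in the hex dump to the address the program sees."""
    return LOAD_ADDRESS + offset
```

So the `80 f8` at file offset 0x154 lives at address 0x354 once loaded, and `call 326` lands on file offset 0x126.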

Okay, so I banged out a Chip-8 simulator just now, and so far your program doesn't hit any undefined instructions. It is waiting for keystrokes and other things I have not hand decoded yet.

Will try a disassembler instead.

EDIT2:

So I banged out a disassembler and it didn't hit those addresses; it ends at

034E : 0x7402  add v4,0x02
0350 : 0xD341  drw v3,v4,1
0352 : 0x00EE  ret

You can bang one out yourself, follow all the code paths, and hopefully you get the same results.
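Roughly, following the code paths can be sketched like this in Python. This is not the disassembler I used, just an illustration of the technique; `reachable_code` is a made-up name, it assumes every call returns, and it stops at anything it cannot follow statically:

```python
def reachable_code(memory, entry: int = 0x200) -> set:
    """Return the addresses reached by following every statically known code path."""
    seen = set()
    work = [entry]                       # addresses still to be decoded
    while work:
        pc = work.pop()
        if pc in seen or pc + 1 >= len(memory):
            continue
        seen.add(pc)
        op = (memory[pc] << 8) | memory[pc + 1]
        top, nnn = op >> 12, op & 0x0FFF
        if op == 0x00EE:                 # ret: this path ends here
            continue
        if top == 0x0 and op != 0x00E0:  # unsupported 0NNN word: stop following
            continue
        if top == 0x1:                   # jp nnn: only the target follows
            work.append(nnn)
        elif top == 0x2:                 # call nnn: the target and the return point
            work += [nnn, pc + 2]
        elif top == 0xB:                 # jp v0,nnn: target is only known at run time
            continue
        elif top in (0x3, 0x4, 0x5, 0x9, 0xE):
            work += [pc + 2, pc + 4]     # skip instructions: next word or the one after
        else:
            work.append(pc + 2)          # everything else falls through
    return seen


# Tiny hypothetical program: call 206 / jp 202 / data 80F8 / ret
memory = bytearray(4096)
memory[0x200:0x208] = bytes.fromhex("2206120280f800ee")
code = reachable_code(memory)            # 0x204 (the data word) is never visited
```

Anything in the file that never shows up in the returned set is a candidate for data rather than code.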

The 0xBnnn instruction is I think the only one that can trip you up as it is data dependent at execution time, so you have to emulate into it (with all the possible combinations that could really happen) to see where it can take you. Basically, if you come across one, you to some extent have to hand examine the possible landing places and go from there.

I did not find that in this code.

My disassembly, see how it compares to yours:

0200 : 0x6A00  ld v10,0x00
0202 : 0x6B04  ld v11,0x04
0204 : 0x6C01  ld v12,0x01
0206 : 0x6D00  ld v13,0x00
0208 : 0x6E02  ld v14,0x02
020A : 0x2326  call 326
020C : 0x2320  call 320
020E : 0x6030  ld v0,0x30
0210 : 0x6101  ld v1,0x01
0212 : 0xF015  ld dt,v0
0214 : 0xF007  ld v0,dt
0216 : 0xF118  ld st,v1
0218 : 0x3000  se v0,0x00
021A : 0x1214  jp 214
021C : 0x2242  call 242
021E : 0x2320  call 320
0220 : 0x7D01  add v13,0x01
0222 : 0x2320  call 320
0224 : 0x6008  ld v0,0x08
0226 : 0xE0A1  sknp v0
0228 : 0x230A  call 30A
022A : 0x4A00  sne v10,0x00
022C : 0x123E  jp 23E
022E : 0xA362  ld i,362
0230 : 0xD891  drw v8,v9,1
0232 : 0x7901  add v9,0x01
0234 : 0xD891  drw v8,v9,1
0236 : 0x4F01  sne v15,0x01
0238 : 0x12F4  jp 2F4
023A : 0x4918  sne v9,0x18
023C : 0x12E4  jp 2E4
023E : 0x22B2  call 2B2
0240 : 0x121E  jp 21E
0242 : 0x4C01  sne v12,0x01
0244 : 0x226C  call 26C
0246 : 0x4C02  sne v12,0x02
0248 : 0x227A  call 27A
024A : 0x4C03  sne v12,0x03
024C : 0x2288  call 288
024E : 0x4C04  sne v12,0x04
0250 : 0x2296  call 296
0252 : 0x4C05  sne v12,0x05
0254 : 0x22A4  call 2A4
0256 : 0xA359  ld i,359
0258 : 0xD672  drw v6,v7,2
025A : 0x4400  sne v4,0x00
025C : 0x00EE  ret
025E : 0xA357  ld i,357
0260 : 0xD452  drw v4,v5,2
0262 : 0x4200  sne v2,0x00
0264 : 0x00EE  ret
0266 : 0xA35B  ld i,35B
0268 : 0xD232  drw v2,v3,2
026A : 0x00EE  ret
026C : 0x6628  ld v6,0x28
026E : 0x6709  ld v7,0x09
0270 : 0x6400  ld v4,0x00
0272 : 0x6500  ld v5,0x00
0274 : 0x6200  ld v2,0x00
0276 : 0x6300  ld v3,0x00
0278 : 0x00EE  ret
027A : 0x6628  ld v6,0x28
027C : 0x670E  ld v7,0x0E
027E : 0x6428  ld v4,0x28
0280 : 0x6514  ld v5,0x14
0282 : 0x6200  ld v2,0x00
0284 : 0x6300  ld v3,0x00
0286 : 0x00EE  ret
0288 : 0x6628  ld v6,0x28
028A : 0x6707  ld v7,0x07
028C : 0x6428  ld v4,0x28
028E : 0x650C  ld v5,0x0C
0290 : 0x6216  ld v2,0x16
0292 : 0x6311  ld v3,0x11
0294 : 0x00EE  ret
0296 : 0x6628  ld v6,0x28
0298 : 0x6707  ld v7,0x07
029A : 0x6428  ld v4,0x28
029C : 0x650E  ld v5,0x0E
029E : 0x6216  ld v2,0x16
02A0 : 0x6314  ld v3,0x14
02A2 : 0x00EE  ret
02A4 : 0x6628  ld v6,0x28
02A6 : 0x6705  ld v7,0x05
02A8 : 0x6428  ld v4,0x28
02AA : 0x6510  ld v5,0x10
02AC : 0x6216  ld v2,0x16
02AE : 0x630B  ld v3,0x0B
02B0 : 0x00EE  ret
02B2 : 0xA359  ld i,359
02B4 : 0xD672  drw v6,v7,2
02B6 : 0x76FE  add v6,0xFE
02B8 : 0xD672  drw v6,v7,2
02BA : 0x4400  sne v4,0x00
02BC : 0x00EE  ret
02BE : 0xA357  ld i,357
02C0 : 0xD452  drw v4,v5,2
02C2 : 0x7402  add v4,0x02
02C4 : 0x4444  sne v4,0x44
02C6 : 0x74C0  add v4,0xC0
02C8 : 0xD452  drw v4,v5,2
02CA : 0x4200  sne v2,0x00
02CC : 0x00EE  ret
02CE : 0xA35B  ld i,35B
02D0 : 0xD232  drw v2,v3,2
02D2 : 0x7202  add v2,0x02
02D4 : 0x4C04  sne v12,0x04
02D6 : 0x7202  add v2,0x02
02D8 : 0x4C05  sne v12,0x05
02DA : 0x7202  add v2,0x02
02DC : 0x4244  sne v2,0x44
02DE : 0x72C0  add v2,0xC0
02E0 : 0xD232  drw v2,v3,2
02E2 : 0x00EE  ret
02E4 : 0x7C01  add v12,0x01
02E6 : 0x6D00  ld v13,0x00
02E8 : 0x6E02  ld v14,0x02
02EA : 0x00E0  cls
02EC : 0x4C06  sne v12,0x06
02EE : 0x6C01  ld v12,0x01
02F0 : 0x6A00  ld v10,0x00
02F2 : 0x120A  jp 20A
02F4 : 0x6006  ld v0,0x06
02F6 : 0xF018  ld st,v0
02F8 : 0x7BFF  add v11,0xFF
02FA : 0x4B00  sne v11,0x00
02FC : 0x1308  jp 308
02FE : 0x6D00  ld v13,0x00
0300 : 0x6E02  ld v14,0x02
0302 : 0x00E0  cls
0304 : 0x6A00  ld v10,0x00
0306 : 0x120A  jp 20A
0308 : 0x1308  jp 308
030A : 0x4A01  sne v10,0x01
030C : 0x00EE  ret
030E : 0x6002  ld v0,0x02
0310 : 0xF018  ld st,v0
0312 : 0x6A01  ld v10,0x01
0314 : 0x88D0  ld v8,v13
0316 : 0x7801  add v8,0x01
0318 : 0x89E0  ld v9,v14
031A : 0x7901  add v9,0x01
031C : 0xD891  drw v8,v9,1
031E : 0x00EE  ret
0320 : 0xA354  ld i,354
0322 : 0xDDE2  drw v13,v14,2
0324 : 0x00EE  ret
0326 : 0x6419  ld v4,0x19
0328 : 0x6300  ld v3,0x00
032A : 0xA356  ld i,356
032C : 0xD341  drw v3,v4,1
032E : 0x7308  add v3,0x08
0330 : 0x3340  se v3,0x40
0332 : 0x132C  jp 32C
0334 : 0x631E  ld v3,0x1E
0336 : 0x641B  ld v4,0x1B
0338 : 0xFC29  ld f,v12
033A : 0xD345  drw v3,v4,5
033C : 0x4B04  sne v11,0x04
033E : 0xA35F  ld i,35F
0340 : 0x4B03  sne v11,0x03
0342 : 0xA360  ld i,360
0344 : 0x4B02  sne v11,0x02
0346 : 0xA361  ld i,361
0348 : 0x4B01  sne v11,0x01
034A : 0xA362  ld i,362
034C : 0x6301  ld v3,0x01
034E : 0x7402  add v4,0x02
0350 : 0xD341  drw v3,v4,1
0352 : 0x00EE  ret

There are a number of instructions that load i with an address at or after 0x354, so I would assume what you find there is data used by the program, not instructions. The largest such address is 0x362. Of the data in your hexdump, the largest address is 0x363, but that byte is zero, so it is either intentional padding or I would have to look at how the code uses i.

022E : 0xA362  ld i,362
0256 : 0xA359  ld i,359
025E : 0xA357  ld i,357
0266 : 0xA35B  ld i,35B
02B2 : 0xA359  ld i,359
02BE : 0xA357  ld i,357
02CE : 0xA35B  ld i,35B
0320 : 0xA354  ld i,354
032A : 0xA356  ld i,356
033E : 0xA35F  ld i,35F
0342 : 0xA360  ld i,360
0346 : 0xA361  ld i,361
034A : 0xA362  ld i,362
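A list like the one above can be pulled out mechanically. A short Python sketch, assuming you already have the set of addresses that are known to be code (the name `i_load_targets` is hypothetical):

```python
def i_load_targets(memory, code_addresses) -> list:
    """Collect the nnn operand of every Annn (ld i,nnn) found at the given code addresses."""
    targets = set()
    for addr in code_addresses:
        op = (memory[addr] << 8) | memory[addr + 1]
        if op >> 12 == 0xA:              # Annn: load i with address nnn
            targets.add(op & 0x0FFF)
    return sorted(targets)


# Tiny hypothetical program: ld i,354 / ld i,362 / ld i,354 / ret
memory = bytearray(4096)
memory[0x200:0x208] = bytes.fromhex("a354a362a35400ee")
targets = i_load_targets(memory, [0x200, 0x202, 0x204, 0x206])
```

The lowest target gives a reasonable guess at where the sprite data starts, though a program can also adjust i at run time, so this is only a lower bound on what is data.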
old_timer