I am looking to create a simple RISC-V disassembler in C++.
The goal is to be able to take a .bin file, composed of separate bytes in hexadecimal, and parse those bytes into readable, formatted RISC-V instructions.
I suppose it could be done using just switch statements, but what is a more robust way of approaching this problem?

I'm a CS student, so I am approaching this as a learning exercise and want to familiarize myself with the tools of C++ with this project.

Yunnosch
EthanR

1 Answer

Assuming that instructions are always the same size, i.e. they take the same number of bytes including addressing-mode selections and parameters, lookup tables are probably a helpful tool.
If instructions vary in size (which I would consider atypical for a RISC design), the lookup tables would probably have to be replaced by state machines.
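
As a rough illustration of the lookup-table idea, here is a minimal sketch for fixed 32-bit RV32I words. It indexes a 128-entry table by the 7-bit major opcode (bits 6..0) and, for the OP-IMM group only, a second 8-entry table by `funct3`. The names `OpcodeInfo`, `make_opcode_table` and `disassemble` are made up for this example, and only a small part of the base instruction set is actually formatted; the rest is reported by group name.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Hypothetical types and names, chosen for this sketch only.
struct OpcodeInfo {
    const char* group = nullptr;  // nullptr marks an opcode this sketch does not know
};

// Build a table indexed by the 7-bit major opcode of an RV32I instruction.
std::array<OpcodeInfo, 128> make_opcode_table() {
    std::array<OpcodeInfo, 128> t{};
    t[0x37].group = "lui";
    t[0x17].group = "auipc";
    t[0x6F].group = "jal";
    t[0x67].group = "jalr";
    t[0x63].group = "branch";
    t[0x03].group = "load";
    t[0x23].group = "store";
    t[0x13].group = "op-imm";
    t[0x33].group = "op";
    t[0x0F].group = "misc-mem";
    t[0x73].group = "system";
    return t;
}

// Disassemble one 32-bit word. Only OP-IMM instructions are fully formatted;
// every other known opcode is reported by its group name.
std::string disassemble(std::uint32_t insn) {
    static const std::array<OpcodeInfo, 128> table = make_opcode_table();
    // Second-level table: OP-IMM mnemonics indexed by funct3.
    static const std::array<const char*, 8> op_imm_names = {
        "addi", "slli", "slti", "sltiu", "xori", "srli", "ori", "andi"};

    const std::uint32_t opcode = insn & 0x7F;
    const std::uint32_t rd     = (insn >> 7) & 0x1F;
    const std::uint32_t funct3 = (insn >> 12) & 0x7;
    const std::uint32_t rs1    = (insn >> 15) & 0x1F;

    const OpcodeInfo& info = table[opcode];
    if (info.group == nullptr) {
        return "unknown";
    }
    if (opcode != 0x13) {
        return info.group;  // operand formatting left out of this sketch
    }

    // I-type immediate: bits 31..20, sign-extended from 12 bits.
    std::int32_t imm = static_cast<std::int32_t>(insn >> 20);
    if (imm & 0x800) {
        imm -= 0x1000;
    }

    const char* name = op_imm_names[funct3];
    if (funct3 == 1 || funct3 == 5) {
        // Shifts: bit 30 distinguishes srai from srli; the low 5 bits are the shift amount.
        if (funct3 == 5 && (insn & 0x40000000u)) {
            name = "srai";
        }
        imm &= 0x1F;
    }
    return std::string(name) + " x" + std::to_string(rd) + ", x" +
           std::to_string(rs1) + ", " + std::to_string(imm);
}
```

Calling `disassemble(0x00A00093)` on the encoding of `addi x1, x0, 10` should give that text back. The other instruction formats (R, S, B, U, J) could be handled the same way, with one small table per group instead of nested switch statements.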

Yunnosch
  • I believe each instruction is 32 bits, so yes, they are the same size. How would I go about implementing a lookup table in C++? – EthanR Jul 30 '20 at 12:21
  • I'd look at [Michael Clark's](https://github.com/michaeljclark/riscv-disassembler) RISC-V disassembler code, to see how he did it. – Eljay Jul 30 '20 at 12:25
  • @Eljay A reasonable idea. But in this case probably not matching OP's learning goal. – Yunnosch Jul 30 '20 at 12:27
  • @Yunnosch Possibly; there are several categories for how people learn. I'm a monkey-see-monkey-do learner, so seeing how someone else has solved the problem can be very educational for that kind of learner. (For example, Linus Torvalds was educated and inspired by reading the source code of Andrew Tanenbaum's Minix. Note: from the interviews I've seen, Andrew is happy for Linus, and Linux has his blessing. Minix was intended to be educational.) – Eljay Jul 30 '20 at 12:46
  • @Eljay Fair enough. However, a user who thinks of switch statements for making a disassembler gives me the impression of not needing the techniques applied in an established implementation. Without being familiar with what you linked (I admit), I assume that they are non-trivial. So I answer at what I hope is an appropriate level of detail (matching the question) and complexity (just above the level of understanding demonstrated by OP). – Yunnosch Jul 30 '20 at 12:50
  • EthanR, please ask a separate new question on how to implement a few lookup tables for some examples of input you expect. Or think about how you would output the name of a day of the week if you got a number, starting with 0 for "Monday". In short, an array of string literals. You might need [ask] for that kind of question. – Yunnosch Jul 30 '20 at 12:55
  • RISC-V supports instructions whose sizes are multiples of 16 bits. There is currently only one non-32-bit extension, but it is popular: the "C" extension for compressed instructions, which are all 16 bits. – Erik Eidt Jul 30 '20 at 13:52
  • Yes, sorry, I was confusing RISC-V with MIPS for a bit there. – EthanR Jul 30 '20 at 15:00
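
Following up on Erik Eidt's comment about the "C" extension: a disassembler that wants to cope with compressed code can read the length of each instruction from the lowest bits of its first 16-bit parcel before doing any table lookup. In the RISC-V encoding, a parcel whose two lowest bits are not `11` is a 16-bit instruction, and one with bits 1..0 equal to `11` but bits 4..2 not equal to `111` is a 32-bit instruction. The sketch below assumes the input is a little-endian byte stream and simply stops at longer encodings; `instruction_length` and `split_instructions` are hypothetical names for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Length in bytes of the instruction starting with this 16-bit parcel.
// Returns 0 for encodings longer than 32 bits, which this sketch does not handle.
std::size_t instruction_length(std::uint16_t low_parcel) {
    if ((low_parcel & 0x3) != 0x3) {
        return 2;  // compressed ("C" extension) instruction
    }
    if ((low_parcel & 0x1C) != 0x1C) {
        return 4;  // standard 32-bit instruction
    }
    return 0;
}

// Split a little-endian byte stream into raw instruction words.
// 16-bit instructions are stored in the low half of the returned word.
std::vector<std::uint32_t> split_instructions(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint32_t> out;
    std::size_t pos = 0;
    while (pos + 2 <= bytes.size()) {
        const std::uint16_t low = static_cast<std::uint16_t>(
            bytes[pos] | (static_cast<unsigned>(bytes[pos + 1]) << 8));
        const std::size_t len = instruction_length(low);
        if (len == 0 || pos + len > bytes.size()) {
            break;  // unsupported length or truncated input
        }
        std::uint32_t insn = low;
        if (len == 4) {
            insn |= static_cast<std::uint32_t>(bytes[pos + 2]) << 16;
            insn |= static_cast<std::uint32_t>(bytes[pos + 3]) << 24;
        }
        out.push_back(insn);
        pos += len;
    }
    return out;
}
```

Each word produced this way can then be handed to whichever decoder matches its length, so the table-based decoding of 32-bit instructions does not need to change.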