EDIT: hoping to make it more readable.
The hardware does not look at the memory as a long list of unorganized bytes. All processors, fixed or variable word length, have a specific boot method. Usually a known address in the processors memory/address space with either an address to the first instruction of the boot code or the first instruction itself. From there and for every instruction the address of the current instruction is where to start decoding.
For an x86 for example it has to look at the first byte. Depending on the decoding of that byte it may need to read more opcode bytes. If the instruction required an address, offset or otherwise some form of immediate value those bytes are there as well. Very quickly the processor knows exactly how many bytes are in this instruction. If the decoding shows the instruction contains 5 bytes and it started at address 0x10 then the next instruction is at 0x10+5 or 0x15. This continues forever. Unconditional branches, which depending on the processor can come in various flavors, you dont assume the bytes that follow the instruction are another instruction. Branches, conditional or unconditional give you a clue where another instruction or series of instructions start in memory.
Note the X86 today definitely does not fetch one byte at a time when it decodes an instruction, sensible sized reads occur, probably 64 bits at a time, and the processor will pull the bytes out of that as needed. When reading a single byte from a modern processor the memory bus still does a full sized read and either presents all of those bits on the bus where the memory controller only pulls the bits it was after or it may go so far as to keep that data. You will see some processors where you may have two 32 bit read instructions at back to back addresses but only one 64 bit read happens on the memory interface.
I highly recommend you write a disassembler and/or an emulator. For fixed length instructions it is pretty easy, you just start at the beginning and decode as you go through memory. A fixed word length disassembler may help learning about decoding instructions, which is part of this process, but it wont help your understanding of following variable word length instructions and how to separate them without getting out of alignment.
The MSP430 is a good choice as a first disassembler. There are gnu tools asm and C, etc (and llvm for that matter). Start with assembler then C or take some pre-made binaries. They key is you have to walk the code like the processor, start with the reset vector and walk your way through. When you decode one instruction you know its length and know where the next instruction is until you hit an unconditional branch. Unless the programmer has intentionally left a trap to fool the disassembler, assume that all branches conditional or unconditional point at valid instructions. An afternoon or evening is all that is required to bang out the whole thing or at least get the concept. You dont necessarily need to completely decode the instruction, dont have to make this a full blown disassembler, only need to decode enough to determine the length of the instruction and determine if it is a branch and if so where. Being a 16 bit instruction you can, if you choose, one time build a table of all possible instruction bit combinations and what their lengths are, that may save some time. You still have to decode your way through branches.
Some folks might use recursion, instead I use a memory map showing me what bytes are the start of an instruction, what bytes/words are part of an instruction but not the first byte/word and what bytes I have not decoded yet. I start by taking the interrupt and
reset vectors and using those to mark the starting point for instructions. and then go into
a loop that decodes the instructions looking for more starting points. If a pass happens
with no other starting points then I have finished that phase. If at any point I find an instruction starting point that lands in the middle of an instruction there is a problem that will require human intervention to solve. Disassembling old video game roms for example you are likely to see this, hand written assembler. Compiler generated instructions
tend to be very clean and predicable. If I get through this with a clean memory map of instructions and what is left over, (assume data) I can make one pass knowing where instructions are and decode and print those out. What a disassembler for variable word length instructions sets can never do is find every instruction. If the instruction set has for example a jump table or otherwise some sort of runtime computed address for executing, you wont find all of those without actually executing the code.
There are a number of existing emulators and disassemblers out there, if you want to try to follow along instead of writing your own, I have a few myself http://github.com/dwelch67.
There are pros and cons for and against variable and fixed word length. Fixed has advantages sure, easy to read easy to decode, everything is nice and proper, but think about ram, the cache in particular, you can cram a lot more x86 instructions in the same cache as an ARM. On the other hand an ARM can decode much easier, far less logic, power, etc more bang for your buck. Historically memory was expensive logic was expensive, and a byte as you go was how it worked. a single byte opcode limited you to 256 instructions, so that expanded into some opcodes needing more bytes not to mention the immediates and addresses that made it variable word length anyway. Keep reverse compatibility for decades and you end up where you are now.
To add to all of this confusion ARM for example now has a variable word length instruction set. Thumb had a single variable word instruction, the branch, but you can easily decode this as fixed length. But they created thumb2 which really does resemble a variable word length instruction set. Also many/most of the processors that support the 32 bit ARM istructions also support 16 bit thumb instructions, so even with an ARM processor you cannot simply align the data by words and decode as you go, you have to use variable word length. What is worse the ARM to/from thumb transitions are decoded by executing, you normally cannot simply disassemble and figure out arm from thumb. A mixed mode compiler generated branch often involves loading a register with the address to branch to then using a bx instruction to branch to it, so the disassembler would need to look at the bx, look backwards in execution for the register used in the branch and hope you find a load there and hope that is the .text segment it is loading from.