After a bit of research it has become apparent that trying to recreate modern ALUs and instructions would be very complicated and time consuming, and I should definitely make my own simplistic architecture and if I really want to compile C code for it I could probably just interpret x86 or AVR assembly from gcc.
I would also love some feedback on my design, I came up with a really weird ISA last night that is focused mainly on being easy to engineer the hardware.
There are two registers in the ALU, all other registers perform functions based off those two numbers all at the same time. For instance, there is a register that holds the added result of A and B, one that holds the result of A shifted right B times, a "jump if A > B" branch, and so on.
And so to add a number, it would take 3 clock cycles, you would move two values from ram into A and B, then copy the data back to ram afterwards. It would look like this:
setA addressInRam1 (6-bit opcode, 18-bit address/value)
setB addressInRam2
copyAddedResult addressInRam1
And program code is executed directly from EEPROM memory. I don't know if I should think of it as having two general purpose registers or it having 2^18 registers. Either way, it makes it much easier and simpler to build when you're executing instructions one at a time like that. Again any advice is welcome, I am somewhat of a noob in this field, thank you!
Oh and then an additional C register to hold a value to be stored in RAM the next clock cycle specified in the set register. This is what the Fibonacci sequence would look like:
1: setC 1; // setting C reg to 1
2: set 0; // setting address 0 in ram to the C register
3: setA 0; // copying value in address 0 of ram into A reg
// repeat for B reg
4: set 1; // setting this to the same as the other
5: setB 1;
6: jumpIf> 9; // jump to line 9 if A > B
7: getSum 0; // put sum of A and B into address 0 of ram
8: setA 0; // set the A register to address 0 of ram
9: getSum 1; // "else" put the sum into the second variable
10: setB 1;
11: jump 6; // loop back to line 6 forever
I made a C++ equivalent and put it through compiler explorer and despite the many drawbacks of this architecture it uses the same amount of clock cycles as x64 in the loop and two more in total. But I think this function in particular works pretty well with it as I don't have to reassign A and B often.