MIPS is essentially a processor architecture, like ARM, x86, PowerPC, 68K, PDP-11 and many, many others.
We know that Intel pushed, or helped push, the idea of CISC, and eventually had to resort to what is basically microcode running on something else underneath, like a RISC or VLIW core.
MIPS, on the other hand, pushed or helped push the idea of RISC. The founders basically wrote a textbook. They pushed the ideas of pipelining, caching, etc., things that everything made or re-architected since tends to use, even microcontrollers. It is as good as any architecture to learn for educational purposes, and the patent situation is such that you can make a usable processor from scratch while avoiding the patents (you can with ARM as well, but there isn't a line of textbooks that goes with that). Look at opencores.org for example. Electrical and computer engineering students can learn the logic design side of computers using this architecture, or others, but this tends to be a useful one. Computer engineering and computer science students can learn assembly language to understand, at least a little better, what is going on behind their languages. Assembly language is a language in itself, although there are hundreds or thousands of languages that go by that same name, "assembly language", and are completely incompatible with each other. Once you learn one, though, the others usually become just a matter of syntax; likewise it helps make higher-level programming languages just a matter of syntax.
MIPS could have been what we use instead of ARM. For every non-ARM-based thing you own, you probably have at least one ARM-based thing; even the x86 machine you are possibly reading this on has one or more ARM cores in it. You would have to ask MIPS marketing and sales why they didn't win. Likewise it was the 68K, which was a chip and not a core, that dominated before ARM took over (for every x86 you had back then, you likely had one or two 68K-based things).
A compiler can absolutely do the assembly step itself and output machine code instead of assembly language, but think of it from the perspective of testing, debugging and maintenance. Any new processor starts off with an assembler first, maybe a linker at the same time, and a compiler later. That assembler is used to develop the core in the first place, so there is always an assembler available. It is also much easier for a human writing a compiler to "see" assembly language output than machine code; otherwise they would need a disassembler just to debug their own program. JIT compilers are a different case: it is not strictly required, but it greatly helps to go straight from intermediate code to machine code without calling another program or two, so we see a little of that. Likewise it makes sense to compile to objects and link the objects together rather than do everything in one shot. So while it is possible and folks have done it, the overwhelming trend is to have a compiler that compiles to assembly language, an assembler that assembles the assembly language into machine code objects, and a linker that links those into a final binary. This is how major toolchains such as GCC work.
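To make the assembler step a bit more concrete, here is a minimal sketch in Python of what an assembler does for a handful of MIPS R-type instructions: it turns one line of text into one 32-bit machine word. The field layout (opcode 0, then rs, rt, rd, shamt, funct) and the funct codes are the standard MIPS32 ones; the function name `assemble`, the small register table and the restriction to three-register instructions are my own simplifications, not how a real assembler is organized.

```python
# Toy assembler sketch: only three-register MIPS R-type instructions.
# Register numbers and funct codes follow the standard MIPS32 encoding;
# the table is deliberately incomplete (assumption: just enough to demo).
REGISTERS = {"$zero": 0, "$t0": 8, "$t1": 9, "$t2": 10, "$s0": 16, "$s1": 17}
FUNCT = {"add": 0x20, "sub": 0x22, "and": 0x24, "or": 0x25}


def assemble(line):
    """Assemble e.g. 'add $t0, $t1, $t2' into a 32-bit integer."""
    mnemonic, operands = line.split(None, 1)
    # Assembly syntax is 'op rd, rs, rt', but the word stores rs, rt, rd.
    rd, rs, rt = (REGISTERS[name.strip()] for name in operands.split(","))
    # opcode (bits 31-26) and shamt (bits 10-6) are both 0 for these ops.
    return (rs << 21) | (rt << 16) | (rd << 11) | FUNCT[mnemonic]


if __name__ == "__main__":
    for source_line in ["add $t0, $t1, $t2", "sub $s0, $s1, $t0"]:
        print(f"{assemble(source_line):08x}  {source_line}")
```

The readable text on the right of each printed line is what a compiler writer wants to look at; the hex word on the left is what the processor actually fetches.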
Java is not only a high-level language; it also has its own platform-independent machine language. The compiler compiles the high-level source into this machine language. You then need a JVM, written in some other language, for the Java to run on, or some other way to run it. Some compilers have been made that go to assembly language or convert the Java into some other machine code, but the normal way of using Java is to have this compile step and then a runtime step.
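The compile-then-run split can be sketched with a toy stack machine. This is not the JVM instruction set; the opcodes `PUSH`, `ADD`, `MUL` and the functions `compile_postfix` and `run` are hypothetical names made up for illustration. The point is only that the "compiler" emits instructions for a machine that does not physically exist, and a separate program then executes them.

```python
# Hypothetical opcodes for a made-up stack machine (not real JVM bytecode).
PUSH, ADD, MUL = 0, 1, 2


def compile_postfix(source):
    """Compile step: turn a postfix expression like '2 3 + 4 *' into code."""
    code = []
    for token in source.split():
        if token == "+":
            code.append((ADD, None))
        elif token == "*":
            code.append((MUL, None))
        else:
            code.append((PUSH, int(token)))
    return code


def run(code):
    """Runtime step: interpret the code on an operand stack."""
    stack = []
    for opcode, argument in code:
        if opcode == PUSH:
            stack.append(argument)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if opcode == ADD else left * right)
    return stack.pop()


if __name__ == "__main__":
    program = compile_postfix("2 3 + 4 *")
    print(program)
    print(run(program))
```

The list returned by `compile_postfix` plays the role of the class file, and `run` plays the role of the virtual machine; port `run` to a new processor and every already-compiled program works there unchanged.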
Python, Pascal and others can have, and have had, a platform-independent machine code of their own, which then requires an interpreter to execute.
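You can look at this yourself in CPython, which compiles functions to bytecode before interpreting them. A small sketch using the standard `dis` module follows; the function `add_and_scale` is just an example I made up, and the exact instruction names printed vary between CPython versions, so treat the listing as illustrative rather than fixed.

```python
import dis


def add_and_scale(a, b):
    """Example function whose compiled bytecode we want to inspect."""
    return (a + b) * 2


def opcode_names(function):
    """Return the names of the bytecode instructions CPython generated."""
    return [instruction.opname for instruction in dis.get_instructions(function)]


if __name__ == "__main__":
    # The raw bytes the interpreter loop executes, independent of the CPU.
    print(add_and_scale.__code__.co_code.hex())
    # A human-readable listing, analogous to a disassembly.
    dis.dis(add_and_scale)
```

Note that `dis` here plays the same role for the virtual machine that a disassembler plays for a real processor.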
MIPS and ARM make IP: logic that a chip vendor (someone who makes chips, or has them made, since most are fabless) buys and integrates into their own chip product. With PowerPC, Intel x86, Motorola 68K and many others, traditionally the same folks who own or design the processor core IP also make the chips, and they generally don't sell the IP for those cores to other chip vendors. Everyone defends their IP in court, which makes it very difficult to make clones, or even to make a new instruction set that doesn't use patented ideas from someone else.
Why learn this in school? Perhaps to find out just how things work, how inefficient high-level languages can be, or how important optimizers and other technology are, among many other reasons.