You might want to ask a similar question on StatsExchange. However, here is an attempt for a high level "build some intuition" answer (disclaimer: I am a Computer Scientist and not a Statistician. Head over to StatsExchange for a more formal discussion).
Bayesian Inference:
In the most basic sense we follow Bayes rule: p(Θ|y)=p(y|Θ)p(Θ)/p(y). Here p(Θ|y) is called the 'posterior' and this is what you are trying to compute. p(y|Θ) is called the 'data likelihood' and is typically given by your model or your generative description of the data. p(Θ) is called the 'prior' and it captures your belief about the plausible values of the parameters before observing the data. p(y) is called the 'marginal likelihood' and using the law of total probability can be expressed as ∫ p(y|Θ)p(Θ) dΘ. That looks really neat but in reality the p(y) is often intractable to compute analytically and in high dimensions (i.e. when Θ has many dimensions) numerical integration is imprecise and computationally intractable. There are certain cases when the conjugate structure of the problem allows you to compute this analytically, but in many useful models this is simply not possible. Therefore, we turn to approximating the posterior.
There are two ways (that I know of) to approximate the posterior: Monte Carlo and Variational Inference. Since you asked about MCMC, I'll stick to that.
Monte Carlo (and Markov Chain Monte Carlo):
Many problems in Statistics deal with taking expectations of functions under probability distributions. From the Law of Large Numbers, an expectation can be efficiently approximated by a Monte Carlo estimator. Therefore, if we can draw samples from a distribution (even if we don't know the distribution itself) then we can compute a Monte Carlo estimate of the expectation in question. The key is that we don't need to have an expression for the distribution: If we just have samples then we can compute the expectations that we are interested in. But there is a catch... How to draw the samples??
There has been a lot of work which developed ways of drawing samples from unknown distributions. These include 'rejection', 'importance' and 'slice' sampling. These were all great innovations and were useful in many applications but they all suffered by scaling poorly to high dimensions. For example, rejection sampling draws samples from a known 'proposal' distribution and then accepts or rejects that sample based on a probability that needs to evaluate the likelihood function and the proposal function. This is wonderful in 1 dimension but as the dimensionality grows, the probability mass that a given sample gets rejected increases dramatically.
Markov Chain Monte Carlo was an innovation that has some super nice theoretical guarantees attached to it. The key idea was to not randomly draw samples from a proposal distribution but rather to use a known sample (with the hope that the sample is in an area of high probability mass) and then make a small random step under a draw from a proposal distribution. Ideally, if the first draw was in an area of high probability mass then the second draw is also likely to be accepted. Therefore, you end up accepting many more samples and you don't waste time drawing samples that are to be rejected. The amazing thing is that if you run the Markov Chain long enough (i.e. to infinity) and under specific conditions (the chain must be finite, aperiodic, irreducible and ergodic) then your samples will be drawn from the true posterior of your model. That's amazing! The MCMC technique is to draw dependent samples so it scales to a higher dimensionality than previous methods, but under the right conditions, even though the samples are dependent, they are as if they are drawn IID from the desired distribution (which is the posterior in Bayesian Inference).
Tying it together (and hopefully answering your question):
MCMC can be seen as a tool that enables Bayesian Inference (just as analytical calculation from conjugate structure, Variational Inference and Monte Carlo are alternatives). Apart from an analytical solution, all of the other tools are approximating the true posterior. Our goal is then to make the approximation as good as possible and to do this as cheaply as possible (in both computation cost and the cost of computing a bunch of messy algebra). Pervious sampling methods did not scale to high dimensions (which are typical of any real world problem) and therefore Bayesian Inference became computationally very expensive and impractical in many instances. However, MCMC opened the door to a new way to efficiently draw samples from a high dimensional posterior, to do this with good theoretical guarantees and to do this (comparatively) easily and computationally cheaply.
It is worth mentioning that Metropolis itself has problems: it struggles with highly correlated latent parameter space, it requires a user-specified proposal distribution and the correlation between samples can be high leading to biased results. Therefore more modern and sometimes more useful MCMC tools have been proposed to try combat this. See 'Hamiltonian Monte Carlo' and the 'No U-Turn Sampler' for the state of the art. Nonetheless, Metropolis was a huge innovation that suddenly made real world problems computationally tractable.
A last note: See this discussion by MacKay for a really good overview of these topics.