Hello dear StackOverFlow.
I have a few conceptual questions regarding the intersection between probability theory and statistics.
I already know a few basic conceptual connections between probability and statistics. For example, I know that in statistics we use a sample to infer the parameters of a population probability distribution, which we can subsequently use to assess the probabilities of future events. In other words, statistics takes the "backward" view and uses collected data to infer the population parameters, which are then used for the "forward" view of the probabilities of future outcomes.
And I know that inferring the population parameters from a sample can involve, e.g., maximum likelihood estimation. For example, in linear regression we would assume normally distributed errors, so the probability density function (pdf) of a normal distribution has the regression equation (b0 + b1*x) as its mean and an error variance as its second parameter. Plugging candidate values of b0, b1, and the variance into that pdf for every observation and multiplying the results gives the likelihood of the sample. The final parameter estimates are the values that maximize this likelihood.
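To make sure I am describing this correctly, here is a minimal sketch of how I picture it, using only the Python standard library. The data are simulated, the "true" values 1.0, 2.0, and 0.5 are made up for illustration, and the function names `log_likelihood` and `fit_mle` are my own. It assumes independent, normally distributed errors with constant variance, and it uses the closed-form maximizer rather than a numerical optimizer.

```python
import math
import random


def log_likelihood(b0, b1, sigma2, xs, ys):
    """Sum of log normal densities of each y_i, with mean b0 + b1*x_i and variance sigma2."""
    n = len(xs)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return -0.5 * n * math.log(2 * math.pi * sigma2) - sse / (2 * sigma2)


def fit_mle(xs, ys):
    """Closed-form values of b0, b1, sigma2 that maximize the log-likelihood above."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # The ML estimate of the variance divides by n, not by n - 2.
    sigma2 = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / n
    return b0, b1, sigma2


# Simulated sample (illustrative values only).
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 0.5) for x in xs]

b0_hat, b1_hat, sigma2_hat = fit_mle(xs, ys)
best = log_likelihood(b0_hat, b1_hat, sigma2_hat, xs, ys)

# Moving any parameter away from its estimate should lower the log-likelihood.
worse = log_likelihood(b0_hat + 0.1, b1_hat, sigma2_hat, xs, ys)
print(b0_hat, b1_hat, sigma2_hat, best > worse)
```

In this sketch the estimates of b0 and b1 coincide with ordinary least squares, which, as far as I understand, is what happens under the normal-errors assumption.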
The big-picture view I have is that we take concepts from probability theory, such as probability density functions, and apply them in statistics so that we have a data-based reason for assigning specific parameter values to the probability distribution we use. Given these parameter estimates, we then assess the probabilities of future outcomes.
However, and here come the questions, I am having trouble connecting further aspects of probability theory with concepts used in statistics. For example, I know that joint probability distributions are used to derive marginal and conditional probabilities, with conditional probabilities being used in, e.g., linear regression. But I lack a solid understanding of how the two are connected.
Are joint probability distributions what underlie covariance matrices, for example? How are joint probability distributions and marginal distributions relevant in statistics (I know how conditional probability distributions are relevant)?
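To make the covariance question concrete, here is a small sketch of how I currently picture the relationship, which may well be incomplete. It uses a hypothetical joint probability table for two binary variables X and Y (the probabilities are made up), and derives the marginals, one conditional distribution, and the covariance matrix from that single table. The helper names `marginal` and `expectation` are my own.

```python
# Hypothetical joint pmf P(X = x, Y = y); the numbers are made up and sum to 1.
joint = {
    (0, 0): 0.30, (0, 1): 0.10,
    (1, 0): 0.20, (1, 1): 0.40,
}


def marginal(joint, axis):
    """Sum the joint pmf over the other variable (axis 0 gives P(X), axis 1 gives P(Y))."""
    out = {}
    for outcome, p in joint.items():
        out[outcome[axis]] = out.get(outcome[axis], 0.0) + p
    return out


def expectation(joint, f):
    """Expected value of f(X, Y) under the joint pmf."""
    return sum(p * f(x, y) for (x, y), p in joint.items())


p_x = marginal(joint, 0)
p_y = marginal(joint, 1)

mean_x = expectation(joint, lambda x, y: x)
mean_y = expectation(joint, lambda x, y: y)

# Variances only need the marginals; the covariance needs the joint distribution.
var_x = expectation(joint, lambda x, y: (x - mean_x) ** 2)
var_y = expectation(joint, lambda x, y: (y - mean_y) ** 2)
cov_xy = expectation(joint, lambda x, y: (x - mean_x) * (y - mean_y))

cov_matrix = [[var_x, cov_xy],
              [cov_xy, var_y]]

# Conditional distribution P(Y | X = 1): joint probability divided by the marginal of X.
p_y_given_x1 = {y: joint[(1, y)] / p_x[1] for y in p_y}

print(p_x, p_y)
print(cov_matrix)
print(p_y_given_x1)
```

If this picture is right, the diagonal of the covariance matrix depends only on the marginal distributions, while the off-diagonal entry is a summary of the joint distribution that the marginals alone cannot provide. A sample covariance matrix would then be an estimate of this population quantity.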
Or am I looking in an entirely wrongheaded direction? Am I asking the wrong questions here? In what other ways are probability theory and statistics connected? I am looking for depth of understanding rather than a surface-level summary. If anyone knows of literature that is not excessively math-heavy and focuses on a conceptual overview of the connections between probability theory and statistical analysis, I would be very happy.