Your question suggests you are just starting out with both statistics and R. You'll benefit from first learning the basics, and there are many resources available online; take a look, for example, at Modern Statistics with R.
With that advice out of the way, let's look at your questions.
You provided just three rows of data, which is not enough to estimate the parameters of the proposed regression, so I've added three rows of fake data.
library("tidyverse")
reg_spé_amb <-
  tibble::tribble(
    ~spécialité, ~`nombre de praticiens`, ~`activité en valeur`, ~année,
    "anesthésie", 10L, 500L, 2019L,
    "chirurgie", 11L, 600L, 2019L,
    "anesthésie", 9L, 450L, 2020L,
    "chirurgie", 9L, 550L, 2019L,
    "chirurgie", 10L, 450L, 2019L,
    "anesthésie", 10L, 650L, 2020L
  )
It's convenient to fit a model to the data using the formula interface. R keeps track of lots of information for us, including the variable names, so there is no need for awkward specifications such as reg_spé_amb[, 1].
lm_amb <- lm(
  formula = `activité en valeur` ~ `nombre de praticiens` + spécialité + année,
  data = reg_spé_amb
)
After we fit the model, we validate it. This step is crucial: what's the point of extracting coefficients from a model that is a poor fit to the data? See Section 8.1.4, Model diagnostics, in "Modern Statistics with R".
[Lots of work to validate the model...]
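For example, a minimal first pass (just a sketch, assuming the base R diagnostic plots are enough to start with) could be:

# show the four standard diagnostic plots in one panel:
# residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
par(mfrow = c(2, 2))
plot(lm_amb)
par(mfrow = c(1, 1))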
Once we are satisfied with the model, we look at the coefficients and various other statistics.
Here is how we can get the coefficients for the linear effects of year (année) and number of doctors (nombre de praticiens) on the response (activité en valeur).
summary(lm_amb)
#>
#> Call:
#> lm(formula = `activité en valeur` ~ `nombre de praticiens` +
#> spécialité + année, data = reg_spé_amb)
#>
#> Residuals:
#> 1 2 3 4 5 6
#> -3.197e-14 6.667e+00 -7.000e+01 7.667e+01 -8.333e+01 7.000e+01
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -161620.00 272131.90 -0.594 0.613
#> `nombre de praticiens` 60.00 67.33 0.891 0.467
#> spécialitéchirurgie 33.33 122.93 0.271 0.812
#> année 80.00 134.66 0.594 0.613
#>
#> Residual standard error: 106.5 on 2 degrees of freedom
#> Multiple R-squared: 0.32, Adjusted R-squared: -0.7
#> F-statistic: 0.3137 on 3 and 2 DF, p-value: 0.819
Or more succinctly and with better table formatting:
broom::tidy(lm_amb)
#> # A tibble: 4 × 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) -161620. 272132. -0.594 0.613
#> 2 `nombre de praticiens` 60 67.3 0.891 0.467
#> 3 spécialitéchirurgie 33.3 123. 0.271 0.812
#> 4 année 80.0 135. 0.594 0.613
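If you only need the numeric value of a single coefficient rather than the whole table, you can also pull it straight out of the fitted model (a small sketch using base R's coef()):

# point estimate of the linear effect of year
coef(lm_amb)[["année"]]
# for non-syntactic names, the backticks are part of the stored term name
coef(lm_amb)[["`nombre de praticiens`"]]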
It's just as easy to get one estimate for each level of the categorical variable specialty (spécialité): the estimated marginal means.
emmeans::emmeans(lm_amb, ~ spécialité)
#> spécialité emmean SE df lower.CL upper.CL
#> anesthésie 530 65.4 2 248 812
#> chirurgie 563 89.8 2 177 950
#>
#> Results are averaged over the levels of: année
#> Confidence level used: 0.95
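emmeans objects also come with a plot method, so a quick visual check of the marginal means and their confidence intervals (a sketch) is:

plot(emmeans::emmeans(lm_amb, ~ spécialité))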
As a bonus, here is how to compare the marginal means of the two specialties. Note that the comparison anesthésie - chirurgie is the same quantity we get from the summary table above as the coefficient spécialitéchirurgie, only with the sign flipped, because spécialitéchirurgie encodes the comparison chirurgie - anesthésie.
emmeans::emmeans(lm_amb, pairwise ~ spécialité)
#> $contrasts
#> contrast estimate SE df t.ratio p.value
#> anesthésie - chirurgie -33.3 123 2 -0.271 0.8117
#>
#> Results are averaged over the levels of: année
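If you would rather have the model report the anesthésie - chirurgie comparison directly as a coefficient, one option (a sketch, not something your data requires) is to make chirurgie the reference level and refit:

# work on a copy so the original data stays untouched
reg_spé_amb2 <- reg_spé_amb
reg_spé_amb2$spécialité <- relevel(factor(reg_spé_amb2$spécialité), ref = "chirurgie")
lm_amb2 <- lm(
  `activité en valeur` ~ `nombre de praticiens` + spécialité + année,
  data = reg_spé_amb2
)
# the spécialitéanesthésie coefficient now encodes anesthésie - chirurgie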
Working through all of the above should help you understand how to fit and interpret a regression model.
Created on 2022-06-25 by the reprex package (v2.0.1)