0

Similar to: How to calculate and plot odds-ratios and their standard errors from a logistic regression in R?

But I would like to plot the Phenotypes separately in the plot.

Data (subset of 20,000 similar lines):

ID  PHENO  SCORE
1     1    -0.001
2     1     0.132
3     1     0.023
4     0    -0.20032
5     1    -0.002
6     0     0.012
7     1    -0.23
8     0     0.321
9     0    -0.21
10    0    -0.497

I have then run a glm logistic model on this data

I would like to put the scores into deciles or some meaningful division and then work out the Odds ratio of having the phenotype (1 is having the disease, 0 is controls) per division of score , ideally between cases and control, using R.

To decile I do:
library(dplyr) 

#place each value into a decile
data$decile <- ntile(data, 10)

I then follow the question above but wanted the plot to show the cases and controls separately.

I would like to end up with an image like below (with case(1) vs control(0) from the PHENO column:

enter image description here

Any help would be appreciated.

tacrolimus
  • 500
  • 2
  • 12
  • Could you please share some more data? – Quinten Oct 28 '22 at 17:04
  • It's also not exactly clear what the points and bars represent in the figure. Are they aggregates of the predictions for the two different groups in the data. For example, does the green dot over 2 use only those Caucasian respondents who had a score in the second decile? – DaveArmstrong Oct 28 '22 at 19:24

1 Answers1

1

First of all, I generated some random data to make it more reproducible. First, you could make your target and deciles a factor. To extract the odds ratios and confidence intervals, you could use coef and confint with exp. After you can take the mean of each ID and PHENO of your results. To create the graph you can use geom_pointrange like this:

# Generate random data
set.seed(7)
data <- data.frame(ID = rep(c(1:10), 2000),
                   PHENO = sample(c(0,1), replace=TRUE, size=20000),
                   SCORE = rnorm(20000, 0, 1))

library(dplyr)
library(ggplot2)
#place each value into a decile
data <- data %>% mutate(decile = ntile(SCORE, 10))
# convert PHENO and decile to factor
data$PHENO <- as.factor(data$PHENO)
data$decile <- as.factor(data$decile)

# model
fit <- glm(PHENO ~ decile, data=data, family='binomial') 

# Extract odds ratio with intervals
results <- as.data.frame(exp(cbind(coef(fit), confint(fit))))
#> Waiting for profiling to be done...
# Change columnames results dataframe
colnames(results) <- c('odds_ratio', '2.5', '97.5')
# Add id column
results$ID <- c(1:10)

# Join data and results dataframe
data <- left_join(data, results, by = 'ID')

# Take mean
data_sum <- data %>%
  group_by(decile, PHENO) %>%
  summarise(odds_ratio = mean(odds_ratio),
            `2.5` = mean(`2.5`),
            `97.5` = mean(`97.5`))
#> `summarise()` has grouped output by 'decile'. You can override using the
#> `.groups` argument.

# plot
ggplot(data_sum, aes(x = decile, y = odds_ratio, ymin = `2.5`, ymax = `97.5`, color = PHENO, shape = PHENO)) +
  geom_pointrange(position = position_dodge(width = 0.4)) +
  scale_color_manual(values = c('blue', 'green')) + 
  scale_shape_manual(values = c(18, 16)) +
  guides(shape = 'none') +
  theme_classic() +
  labs(x = 'Decile', y = 'Odds ratio', color = '') 

Created on 2022-10-29 with reprex v2.0.2

Quinten
  • 35,235
  • 5
  • 20
  • 53
  • Thanks for taking the time to do this. The only issue is that the odds ratios across both cohorts are exactly the same? – tacrolimus Oct 31 '22 at 12:00