I have a data frame with multiple observations for each unique row id (subjectID), which looks like this (a short example):
df <- data.frame(subjectID = rep(c(1:4), each=3),
y = sample(0:2, 4, replace = TRUE),
x = rep(c(0:2), times = 4))
df <- as.tibble(df)
> df
# A tibble: 12 x 3
subjectID y x
<int> <int> <int>
1 1 0 0
2 1 1 1
3 1 1 2
4 2 2 0
5 2 0 1
6 2 1 2
7 3 1 0
8 3 2 1
9 3 0 2
10 4 1 0
11 4 1 1
12 4 2 2
For each unique subjectID "unique(df$subjectID)" (i.e. 4 in the example above) I would like to:
#1. regress y on x (3 x- and y-observations for each subjectID ) to obtain coefficients
reg <- lm(y~x)
#2. store intercept
intercept <- reg$coefficients[1]
#3. store slope
slope <- reg$coefficients[2]
#4. calculate Spearman's rho with library(ggpubr)
rho <- cor(x, y, method = c("spearman"))
#5. calculate the Spearman p-value with library(ggpubr)
pvalue <- cor.test(x, y, method=c("spearman"))
The results I would like to store in a new data frame like this:
# creating an data frame to store regression output for each unique subjectID
output <- tibble(
subjectID = rep(c(1:4), each=1),
intercept = rep(c(0), each=4),
slope = rep(c(0), each=4),
rho = rep(c(0), each=4),
pvalue = rep(c(0), each=4))
> output
# A tibble: 4 x 5
subjectID intercept slope rho pvalue
<int> <dbl> <dbl> <dbl> <dbl>
1 1 0 0 0 0
2 2 0 0 0 0
3 3 0 0 0 0
4 4 0 0 0 0
I have 21 x- and y-obs for each unique subjectID, and more than 100 unique subjectIDs in the 'real' data frame, so I would like to avoid to do it manually. So I was wondering if a loop-function can be created which achieves this?
In dplyr or base R. With dplyr I tried to use pipes and group_by(subjectID), but don't manage to set up the function.