1

I am currently working on a project where I need to estimate changes in physical fitness over time using segmented regression with quantile regression in R. The data I'm working with consists of centile values from various tests, along with additional variables such as year, age group, weight status, and socioeconomic status (currently not in the model). I do have an ID column but there is only one observation per participant as this is an analysis of repeated cross-sectional data.

Here is a minimal, reproducible example of my data:

        id     ri_t600 leto_meritev weight_status2
1  6127454 0.179709154         2016         normal
2  3980838 0.978963377         2001     overweight
3  2774166 0.390405133         2007     overweight
4  1073113 0.025642580         2004         normal
5  6773578 0.238324573         1997         normal
6  5519253 0.994369221         1993     overweight
7  6838723 0.538462935         1997         normal
8  6016016 0.128138805         1994         normal
9  3215266 0.036528587         1991         normal
10 1963399 0.340610927         2013         normal
11  182440 0.963138367         2017     overweight
12 4669807 0.797643121         1994     overweight
13 4145886 0.180022262         2001         normal
14 7097019 0.803258459         2019         normal
15 4444052 0.119754080         1995         normal
16 4997667 0.001910298         2016         normal
17  935319 0.987400068         1993         normal
18 4996953 0.448889710         1989     overweight
19 1277313 0.239764074         2016         normal
20 4071109 0.160539785         2007         normal
21 5277830 0.991878782         1996         normal
22 1128991 0.321603180         2018         normal
23 2985130 0.429698600         2009         normal
24 5289288 0.108017596         1997         normal
25 3601256 0.781181377         2009         normal
26 3588296 0.861045956         2008         normal
27  749642 0.397665975         1993         normal
28 3364739 0.100899654         1990         normal
29 6415968 0.088997989         2018         normal
30 6418673 0.096091582         1990         normal
31 1051895 0.271999709         1990     overweight
32  884990 0.195473708         2015     overweight
33 4820890 0.645156168         2005     overweight
34 4005036 0.797750286         2004         normal
35 1275617 0.100194241         2018         normal
36 5458071 0.511416750         2011         normal
37  229093 0.138935340         2018         normal
38 6250384 0.618567227         2018         normal
39 5501375 0.053032507         1995         normal
40 2355478 0.317074434         2007         normal
41 5346473 0.194654348         2016         normal
42 6913525 0.237211849         2015         normal
43 2940600 0.606357737         1993         normal
44 2258047 0.521430525         2015         normal
45 2384073 0.550012191         2007         normal
46  881742 0.268146643         1990         normal
47 5758330 0.056041775         1996         normal
48 4198975 0.705752766         2013         normal
49  650372 0.322134155         1999         normal
50 1923040 0.877469326         1990         normal

To perform the analysis, I have written the following code:

seg_fit <- segmented.default(rq(ri_t600 ~ leto_meritev + weight_status2, tau = 0.50, data = boys_one), seg.Z = ~ leto_meritev, npsi = 1, control = seg.control(display = FALSE))

The boys_one data table contains 746,912 observations of boys belonging to the age group == 1 (one of the age groups). It includes three variables: ri_t600 (centile values for a 600-meter run), leto_meritev (year), and weight_status2 (two groups: normal and overweight). This was done to reduce the dataset.

Also, in segmented.default vcov is used to calculate the covariance matrix but does not work for the quantile regression object. Therefore, to get the covariance matrix for quantile regression I used this function:

vcov.rq <- function(object,...){
  sm <- summary(object, se = "ker", covariance = TRUE)$cov
  rownames(sm) <- colnames(sm) <- names(coef(object))
  return(sm)
}

EDIT: As I realized I further subsetted my data frame to only participants with normal weight/overweight and then segmented worked! However, now I have a different problem. The results I get are not of class segmented as it says. As I commented previously, one of my goals is to identify the magnitude of the increase or decrease observed in the interval by estimating the annual percent change (APC), defined as (exp(bj) − 1) × 100, where bj is the slope in segment j. This is based on Clegg et al. (2009) who proposed the average annual percent change (AAPC) computed as the sum of the slopes (βj) weighted by corresponding covariate sub-interval width (wj), namely:

formula

Since the weights are the breakpoint differences, the standard error of the AAPC should account for uncertainty in the breakpoint estimate. Again, it could be that I am misspecifying the function or the arguments relating to the quantile or segmented regression but I cannot compute the Schwarz information criterion (BIC) to select the optimal number of joinpoints and the AAPC from the segmented analysis I have done. The error I am getting with aapc() is that ogg[[2]] subscript is out of bounds. I am guessing that the problem with the aapc() function from the segmented package is in this part of the code: COV <- if(is.null(.vcov)) vcov(ogg,...) else .vcov

I would appreciate any suggestions or improvements to optimize the code and make it run efficiently. Additionally, if there are any alternative approaches or best practices for performing segmented regression with quantile regression in R, I would be grateful to know about them.

Thank you in advance for your help!

  • https://stats.stackexchange.com/questions/230219/variance-covariance-matrix-of-coefficients-in-quantile-regression … So there should not be a vcov result. Perhaps you could do a rank transformation on your outcome and then linear regression on that model. – IRTFM May 18 '23 at 18:40
  • @IRTFM I am not interested that much in the vcov result, just the result of the segmented regression, i.e., the joinpoints. Through applying segmented regression analysis, I want to identify the moment when the change has occurred in the trend, as well as the magnitude of the increase or decrease observed in the interval by estimating the annual percent change (APC), defined as (exp(bj) − 1) × 100, where bj is the slope in segment j. – Antonio Martinko May 19 '23 at 06:35
  • I’m rather puzzled by the choice of quantile regression for this purpose. You wouldn’t really expect the ranks to change that much, even if the values “topped out” or had a change in slope. Also puzzling is the lack of a subject ID, since it sounds like you are expecting multiple observations per subject. I think more complete problem description and code is needed. – IRTFM May 19 '23 at 16:50
  • @IRTFM thank you for your feedback, as I have now edited the question to provide more nuance but also that I have managed to do the analysis but also a new set of questions arose. Namely, this now relates to calculating the BIC and AAPC from the models. My project closely relates to the study that I will link here if you want an even more nuanced look into the analysis: https://www.nature.com/articles/s41598-022-14813-7 – Antonio Martinko May 21 '23 at 07:08
  • Using "centiles" is basically the same as doing a rank transformation. That paper than uses regression techniques with a p-spline fit. It makes no sense at all to do rank regression if you (or someone else?) have already done a rank transformation on your outcome. I say again: you need statistical consultation. – IRTFM May 21 '23 at 12:00

0 Answers0