I am working with R. I need to identify the predictors of a higher Active trial start percentage over time (`StartDateMonthsYrs`). I will run a linear regression with `Percent.Active` as the dependent variable. My original dataframe is attached, and the Active trial start percentage over time that I obtained (named `Percent.Active`) is presented here.

So, I need to assess whether federal-sponsored, industry-sponsored or other-sponsored trials were associated with a higher Active trial start percentage over time. I have many other variables that I need to assess, but this is a sample of my data.

I am thinking of doing many crosstabs for each variable (e.g. Federal & Active, then Industry & Active, etc.) in each month (maybe with the help of `lapply`), then accumulating the obtained percentages in the second sheet and running the analysis on that.
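Instead of building one crosstab per variable per month by hand, the percentage of active trials can be computed directly as the mean of a 0/1 indicator within each group. The sketch below assumes trial-level data with hypothetical columns `Sponsor`, `Phase` and `Active` (1 = active, 0 = any other status) alongside `StartDateMonthsYrs`; the data are made up for illustration. It loops over the predictors with `lapply` and `aggregate`, producing one Sponsor-by-month (or Phase-by-month) percentage table per predictor.

```r
# Hypothetical trial-level data: one row per trial. Column names other than
# StartDateMonthsYrs are assumptions standing in for the real dataset.
trials <- data.frame(
  StartDateMonthsYrs = c("2019-01", "2019-01", "2019-01", "2019-02", "2019-02", "2019-02"),
  Sponsor = c("Federal", "Industry", "Industry", "Federal", "Other", "Industry"),
  Phase   = c("II", "III", "II", "III", "II", "III"),
  Active  = c(1, 0, 1, 1, 0, 0)
)

predictors <- c("Sponsor", "Phase")  # hypothetical list of variables to assess

# For each predictor, 100 * mean(Active) gives the percentage of active trials
# in every predictor-level x month cell that contains at least one trial.
pct_tables <- lapply(predictors, function(v) {
  f <- reformulate(c(v, "StartDateMonthsYrs"), response = "Active")
  out <- aggregate(f, data = trials, FUN = function(x) 100 * mean(x))
  names(out)[names(out) == "Active"] <- "Percent.Active"
  out
})
names(pct_tables) <- predictors

pct_tables$Sponsor
```

Each element of `pct_tables` is already in the long "one row per group and month" shape that a regression on percentages would need, so no separate spreadsheet step is required.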

My code for the linear regression is as follows:

```r
q.lm0 <- lm(Percent.Active ~ Time.point + xyz, data = data.percentage)
summary(q.lm0)
```
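One way to ask "is sponsor type associated with a higher percentage over time?" with `lm()` is to let each sponsor have its own intercept and its own time trend. The sketch below is only an illustration: it builds a synthetic `data.percentage` with one row per Sponsor and month, uses a hypothetical `Sponsor` column in place of `xyz`, and names the model `q.lm1` for illustration. The `Time.point * Sponsor` terms compare each sponsor with the reference level (Federal), both in level and in slope.

```r
# Synthetic aggregated data (an assumption, not the real second sheet):
# 12 monthly time points for each of three sponsor types.
data.percentage <- expand.grid(
  Time.point = 1:12,
  Sponsor = c("Federal", "Industry", "Other")
)
n <- nrow(data.percentage)

# Made-up percentages: Industry is given a steeper upward trend,
# plus a small deterministic wiggle in place of random noise.
data.percentage$Percent.Active <- with(data.percentage,
  40 + 0.5 * Time.point +
    ifelse(Sponsor == "Industry", 2 * Time.point, 0) +
    2 * sin(seq_len(n)))

# Time.point * Sponsor expands to Time.point + Sponsor + Time.point:Sponsor.
# SponsorIndustry / SponsorOther: level differences vs. Federal;
# Time.point:SponsorIndustry / Time.point:SponsorOther: trend differences vs. Federal.
q.lm1 <- lm(Percent.Active ~ Time.point * Sponsor, data = data.percentage)
summary(q.lm1)
```

Note that this model treats every Sponsor-by-month row as independent, which is exactly the assumption the panel-data approach in the answer questions.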
Mohamed Rahouma

1 Answer

I'm a little bit confused. You write 'associated'. If you really want to look for association, then yes, a crosstab might be possible and sufficient, since association is not the same as causation (causation goes a step beyond correlation and needs a theory behind it). If you are looking for correlation and insights over time, doing a plain regression with `lm()` is not useful.
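If association is all that is needed, a single crosstab of sponsor type against active status plus a chi-squared test of independence is enough. The sketch below uses made-up trial-level data with hypothetical `Sponsor` and `Active` columns; `table()`, `prop.table()` and `chisq.test()` are base R. A significant result would indicate association only, not causation.

```r
# Made-up trial-level data: 40 federal, 60 industry, 50 other-sponsored trials.
trials <- data.frame(
  Sponsor = rep(c("Federal", "Industry", "Other"), times = c(40, 60, 50)),
  Active  = c(rep(c(1, 0), times = c(30, 10)),
              rep(c(1, 0), times = c(30, 30)),
              rep(c(1, 0), times = c(20, 30)))
)

# Crosstab: rows are sponsor types, columns are Active = 0 / 1.
tab <- table(Sponsor = trials$Sponsor, Active = trials$Active)

# Row percentages: share of each sponsor's trials that are active.
pct <- 100 * prop.table(tab, margin = 1)
print(round(pct, 1))

# Chi-squared test of independence between sponsor type and active status.
res <- chisq.test(tab)
print(res)
```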

If you want a regression-type analysis, there are packages in R like plm, which can deal with panel data, and you clearly have panel data (time points, the trial labels you are interested in, and repeated time points for each of these labels). Look at this post for information about the package: https://stackoverflow.com/questions/2804001/panel-data-with-binary-dependent-variable-in-r

I'm writing this because your `Percent.Active` variable appears to be only a binary 0/1 outcome, and I'm not sure if this is on purpose. However, even if your outcome is not binary, the plm package might help, and that post mentions other packages as well.
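If the outcome really is a 0/1 status per trial, a common base-R alternative to regressing a percentage is logistic regression with `glm(family = binomial)`. This is a swapped-in technique, not the plm approach above, and unlike plm it ignores the panel structure. The sketch assumes hypothetical trial-level columns `Active`, `Sponsor` and `Time.point` with made-up values; exponentiated coefficients are odds ratios relative to the reference sponsor (Federal) and per one-unit step in `Time.point`.

```r
# Made-up trial-level data: 10 trials per sponsor type, Active is 0/1.
trials <- data.frame(
  Time.point = rep(1:10, times = 3),
  Sponsor = rep(c("Federal", "Industry", "Other"), each = 10),
  Active = c(0, 0, 1, 0, 1, 1, 0, 1, 1, 1,
             1, 0, 1, 1, 1, 0, 1, 1, 1, 1,
             0, 0, 0, 1, 0, 0, 1, 0, 1, 0)
)
trials$Sponsor <- factor(trials$Sponsor)  # Federal becomes the reference level

# Logistic regression of active status on sponsor type and time.
fit <- glm(Active ~ Sponsor + Time.point, family = binomial, data = trials)
summary(fit)

# Odds ratios: SponsorIndustry / SponsorOther vs. Federal; Time.point per step.
exp(coef(fit))
```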

Patrick Bormann
  • Thanks for your valuable input. `Percent.Active` is a continuous variable, as you can see in column F of the second Excel sheet. If you can write the appropriate code, that would be great. I am still trying to understand the `plm` package, though. – Mohamed Rahouma Apr 02 '21 at 19:36
  • I don't see any continuous variable; I thought you meant "OverallStatus.Active1.x.Comple.Susp.Term.Unk.Withd0" was your percentage variable. Is there by any chance a second tab? If so, I can't see it. Do you still have problems understanding plm? – Patrick Bormann Apr 03 '21 at 10:28
  • Thanks for your efforts. I uploaded the second tab of the Excel file as a separate file, as you can see. Appreciate your help. Upvoted. – Mohamed Rahouma Apr 04 '21 at 13:00
  • In your new data tab, the time points are no longer repeated, which would indicate a different method. Please post a dataset that shows the independent variables and the dependent variable and nothing else. – Patrick Bormann Apr 04 '21 at 21:57