0

I am trying to create a Weibull probability plot for censored data in R. To do so I need a log-log scale for the y axis and a log scale for the x axis.

The y axis is the probability (range 0 to 1) and the x axis is "time in days".

I know that I can create logarithmic axis with log="xy" in the plot() function. But I need a log-log scale for the y axis.

Is there a way to do that?

Thank you in advance!

Data example: data$

              X cens survCount   medianRank
       136.5424    1        10   0.09090909
       181.9756    1         9   0.18181818
       192.4309    1         8   0.27272727
       216.6145    1         7   0.36363636
       224.3097    0         6   NA
       254.4997    0         5   NA
       285.1438    1         4   0.49090909
       289.3991    1         3   0.61818182
       295.9161    0         2   NA
       309.9522    0         1   NA

X: times till failure
cens: binary, 0 if censored
survCount: number of things alive before failure/censoring
medianRanks: cumulated probability of failure

Explanation:
So X is what I want on the log x axis and the medianRanks are what I want on the log-log y axis.

The problem is that you can't calculate twice the logarithm from a number <1 becaue the first logarithm will give a negative number and you can't calculate a logarithm from a negative number. That is why I want to transform the axis and not the values.

What I did so far:
My workaround so far is to multiply my y values with a number (e.g. 1000) so that I don't have any values that are less than 1 and then calculate the log-log of these values. I then plot them, hide the axes and add new axes with the axis() function.

data$medianRank <- data$medianRank*1000  
loglogY <- log(log(data$medianRank))  
logX <- log(data$X)  

plot(logX, loglogY, yaxt="n", xaxt="n")  

ylabels <- c(0.1, 0.2, 0.5, 0.7, 0.99)
yAt <- log(log(ylabels*1000))
axis(2, at=yAt, labels=ylabels)
xlabels <- c(100, 200, 300, 400)
xAt <- log(xlabels)
axis(1, at=xAt, labels=xlabels)

@mike1886 suggested to use the ggplot2 package. I had a look at it and what I found is quite promising. When one creates a ggplot one can add coord_trans() to transform the axes. There are a few transformations available but I couldn't find a log-log one. Fortunately one can also write a custom transformation with the trans_new() function from the scales package.
My code so far for the new transformation:

require(ggplot2)
require(scales)
loglog_trans <- function(){
  trans <- function(x){ log(log(x)) }
  inv <- function(x){ exp(exp(x)) }
  trans_new("loglog", trans, inv)
}

wpp <- ggplot(data2, aes(ftime, medianRank)) + geom_point()
wpp
wpp + coord_trans("log10", "loglog")

But it is not working.

Error in if (zero_range(range)) { : missing value where TRUE/FALSE needed In addition: Warning message: In log(log(x)) : NaNs produced

elevendollar
  • 1,115
  • 10
  • 19
  • You should present some data and preliminary work. SO is not a do-my-project-for-me website. – IRTFM Jun 12 '14 at 14:36
  • Added example data, explanation and what I did so far. Can't post the resulting plot because my reputation is not high enough. – elevendollar Jun 12 '14 at 15:32

2 Answers2

1

You can try using ggplot2 (this is a very nice and complete plotting package) in R. For example, consider looking at the page: http://www.cookbook-r.com/Graphs/Axes_(ggplot2)/#axis-transformations-log-sqrt-etc

This will allow you to do whatever you would like to the axes. For example,

m <- qplot(rating, log10(votes), data=subset(movies, votes > 1000), na.rm = TRUE)
m + scale_y_log10() + scale_x_log10()
mike1886
  • 133
  • 4
1

I suspect that you are being expected to plot "complementary log -log" which probably means you are being asked to plot the log of the negative log. I admit that this is not exactly how such plots usually appear. What I usually see in texts regarding survival analysis is a rising trend and one should see roughly parallel lines (with positive slope) for log(-log(survival)) plotted against time when the proptional hazards assumption is met.

 dat <- read.table(text="             X cens survCoun
        136.5424    1        10   0.09090909
        181.9756    1         9   0.18181818
        192.4309    1         8   0.27272727
        216.6145    1         7   0.36363636
        224.3097    0         6   NA
        254.4997    0         5   NA
        285.1438    1         4   0.49090909
        289.3991    1         3   0.61818182
        295.9161    0         2   NA
        309.9522    0         1   NA", header=TRUE)

 with( dat, plot( log(X), log( - log(medianRank) ) ) )

enter image description here

So consider this where I am taking survCount/10 to be the proportion surviving at time= X:

png(); with( dat, plot( log(X), 
                        log( - log(survCount/max(survCount) ) ) 
                       ) )
dev.off()

enter image description here

IRTFM
  • 258,963
  • 21
  • 364
  • 487
  • I want to plot the medianRanks on the y axis and the time till failure on the x axis. To check if the failure time is Weibull distributed I have to log-log scale the y axis and log scale the x axis. If the resulting scatterplot results in a rougly straight line (positive slope) it indicates that the data indeed follows a Weibull distribution with a specific shape and scale parameter. – elevendollar Jun 13 '14 at 07:10
  • I do not want log(-log). I also can't just take the proportion surviving at time= X because it's censored data, meaning the failure times for censored cases are not failure times. They only tell us that the observed unit has not failed yet and is still alive at that time. – elevendollar Jun 13 '14 at 07:15
  • It should look like this for the example data: [link](http://www11.pic-upload.de/12.06.14/4k24uzzw11l.png) – elevendollar Jun 13 '14 at 07:20