3
>dput(data)
structure(list(ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 
3, 3), Dx = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1), Month = c(0, 
6, 12, 18, 24, 0, 6, 12, 18, 24, 0, 6, 12, 18, 24), score = c(0, 
0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0)), .Names = c("ID", 
"Dx", "Month", "score"), row.names = c(NA, -15L), class = "data.frame")

>data
    ID Dx Month score
1   1  1     0     0
2   1  1     6     0
3   1  1    12     0
4   1  1    18     1
5   1  1    24     1
6   2  1     0     1
7   2  1     6     1
8   2  2    12     1
9   2  2    18     0
10  2  2    24     1
11  3  1     0     0
12  3  1     6     0
13  3  1    12     0
14  3  1    18     0
15  3  1    24     0

Suppose I have the above data.frame. I have 3 patients (ID = 1, 2 or 3). Dx is the diagnosis (Dx = 1 is normal, Dx = 2 is diseased). Month is the time of the visit. Last but not least, there is a test score variable. The participants' test score is binary, and it can change from 0 to 1 or revert from 1 back to 0. I am having trouble coming up with a way to visualize this data. I would like an informative graph that looks at:

  1. The trend of the participants' test scores over time.
  2. How that trend compares to the participants' diagnosis over time

In my real dataset I have over 800 participants, so I do not want to construct 800 separate graphs ... I think the test score variable being binary really has me stumped. Any help would be appreciated.

Jaap
Adrian
  • Having 800 trends in one graph would be messy, can't you aggregate them or something? – Soheil May 04 '15 at 08:41
  • Patient score over time can be tracked in a Shewhart chart, see package qcc. You can choose from EWMA, CUSUM or a Shewhart that is particular to your situation, e.g. a C chart [month count] or a U chart [monthly rates]. – Henk May 04 '15 at 08:52

2 Answers

4

With ggplot2 you can make faceted plots with subplots for each patient (see my solution for dealing with the large number of plots below). An example visualization:

library(ggplot2)
ggplot(data, aes(x=Month, y=score, color=factor(Dx))) +
  geom_point(size=5) +
  scale_x_continuous(breaks=c(0,6,12,18,24)) +
  scale_color_discrete("Diagnosis",labels=c("normal","diseased")) +
  facet_grid(.~ID) +
  theme_bw()

which gives:

[Figure: faceted plot with one panel per patient ID, Month on the x-axis, score on the y-axis, points colored by diagnosis]


Including 800 patients in one plot might be a bit too much as already mentioned in the comments of the question. There are several solutions to this problem:

  1. Aggregate the data.
  2. Create patient subgroups and make a plot for each subgroup.
  3. Filter out all the patients who have never been ill.

With regard to the last suggestion, you can do that with the following code (which I adapted from an answer to one of my own questions):

deleteable <- with(data, ave(Dx, ID, FUN=function(x) all(x==1)))
data2 <- data[deleteable==0,]

You can use the same approach to create a new variable that flags patients who have never been ill (1 = never ill, 0 = ill at least once):

data$neverill <- with(data, ave(Dx, ID, FUN=function(x) all(x==1)))

Then you can, for example, aggregate the data by several grouping variables (e.g. Month and neverill).
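As a minimal sketch of that aggregation, assuming the example data from the question and base R's aggregate() (the name agg is just an illustrative choice), the mean of the 0/1 score within each group gives the proportion of positive scores:

```r
# Rebuild the example data from the question so the sketch is self-contained
data <- data.frame(
  ID    = rep(1:3, each = 5),
  Dx    = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1),
  Month = rep(c(0, 6, 12, 18, 24), times = 3),
  score = c(0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0)
)

# 1 = patient was never diagnosed as diseased, 0 = diseased at least once
data$neverill <- with(data, ave(Dx, ID, FUN = function(x) all(x == 1)))

# The mean of a 0/1 score is the proportion of positive scores
# within each Month x neverill combination
agg <- aggregate(score ~ Month + neverill, data = data, FUN = mean)
agg
```

The result has one row per Month and neverill combination, so it could be passed to ggplot with geom_line() and color = factor(neverill) to draw one trend line per group instead of one panel per patient.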

Jaap
1

Note: Most of the data manipulation below is needed for part 2 of the question. Part 1 is less complex, and you can see where it fits in below.

Uses

library(data.table)
library(ggplot2)
library(reshape2)

To Compare

First, recode Dx from 1/2 to 0/1 so that it is on the same scale as score (assuming that a score of 0 corresponds to Dx = 1, i.e. normal):

data$Dx <- data$Dx - 1

Now, create a lookup matrix (rows = Dx, columns = score) that returns 1 for a diagnosis of 1 with a score of 0, -1 for a score of 1 with a diagnosis of 0, and 0 when the two agree.

compare <- matrix(c(0,1,-1,0),ncol = 2,dimnames = list(c(0,1),c(0,1)))
> compare
  0  1
0 0 -1
1 1  0

Now, let's score every event. This simply looks up the matrix above for every row in your data:

data$calc <- diag(compare[as.character(data$Dx),as.character(data$score)])

Note: This builds an n x n matrix before taking its diagonal, so it can be sped up for large data sets using matching, but it is a quick fix for smaller sets like yours.
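One possible way to do that speed-up, sketched here under the assumption that Dx has already been recoded to 0/1, is to index the lookup matrix with a two-column character matrix of (row name, column name) pairs. This avoids building the large intermediate matrix. The names idx, calc_fast and calc_diag are illustrative only:

```r
# Lookup matrix from above: rows are Dx (0/1), columns are score (0/1)
compare <- matrix(c(0, 1, -1, 0), ncol = 2,
                  dimnames = list(c(0, 1), c(0, 1)))

# Example data from the question, with Dx already recoded to 0/1
Dx    <- c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1) - 1
score <- c(0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0)

# Each row of idx names one (row, column) cell of compare
idx <- cbind(as.character(Dx), as.character(score))
calc_fast <- compare[idx]

# Same result as the diag() approach, without the 15 x 15 intermediate matrix
calc_diag <- diag(compare[as.character(Dx), as.character(score)])
stopifnot(identical(unname(calc_fast), unname(calc_diag)))
```

With 800 patients and 5 visits each, the diag() version would allocate a 4000 x 4000 matrix, while the indexed version only ever creates a vector of length 4000.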

To allow us to use data.table aggregation:

data <- data.table(data)

Now we need to create our variables:

tograph <- melt(data[, list(ScoreTrend = sum(score)/.N, 
                            Type = sum(calc)/length(calc[calc != 0]), 
                            Measure = sum(abs(calc))), 
                     by = Month],
                id.vars = c("Month"))
  • ScoreTrend: The proportion of positive scores in each month. Shows the trend of scores over time.
  • Type: The balance of -1 vs. 1 events in each month. If this returns -1, all mismatches were score = 1 with Dx = 0. If it returns 1, all mismatches were Dx = 1 with score = 0. A zero means the two kinds of mismatch balance out.
  • Measure: The raw number of mismatches (events where score and Dx disagree).
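To make the three measures concrete, here is a small base R sketch that recomputes them for the example data using tapply() instead of data.table. The calc vector is written out by hand from the lookup step above, which is an assumption of this sketch rather than part of the original code:

```r
# Example data from the question; calc holds the lookup results per row
Month <- rep(c(0, 6, 12, 18, 24), times = 3)
score <- c(0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0)
calc  <- c(0, 0, 0, -1, -1, -1, -1, 0, 1, 0, 0, 0, 0, 0, 0)

# Proportion of positive scores per month
ScoreTrend <- tapply(score, Month, mean)

# Balance of mismatches per month: sum of calc over the number of non-zero entries
# (0/0 gives NaN for a month without any mismatch)
Type <- tapply(calc, Month, function(x) sum(x) / sum(x != 0))

# Raw number of mismatches per month
Measure <- tapply(calc, Month, function(x) sum(abs(x)))

rbind(ScoreTrend, Type, Measure)
```

In month 18, for example, patient 1 has score = 1 with Dx = 0 and patient 2 has Dx = 1 with score = 0, so Measure is 2 and Type is 0. Month 12 has no mismatches at all, so Type is NaN there.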

We melt this data frame along month so that we can create a facet graph.

If there are no incorrect events, we will get a NaN for Type. To set this to 0:

# value == NaN always evaluates to NA, so test with is.nan() instead
tograph[is.nan(value), value := 0]

Finally, we can plot

ggplot(tograph, aes(x = Month, y = value)) + geom_line() + facet_wrap(~variable, ncol = 1)

We can now see, in one plot:

  • The proportion of positive scores by month
  • The balance of under- vs. over-diagnosis
  • The number of incorrect diagnoses
Chris