Data Visualization Needed for complex overlapping sets

Question

Good Day All,

I'd like to visualize my dataset, but I am struggling even to name the type of visualization I need!

I want to look at the overlapping sets between a Reference Standard and three new tests.

The Reference Standard has a binary outcome (R and S).

Each of the three new tests can have more than two outcomes (R, S, Fail, Indeterminate)

So a portion of my data look like this (as an R data frame):

Subject <- c("11-0001","11-0002","11-0003","11-0004","11-0005","11-0007","11-0008","11-0010","11-0011","11-0012","11-0013","11-0014","11-0015","11-0016","11-0017","11-0018","11-0019","11-0020","11-0021","11-0022","11-0023","11-0025","11-0027","11-0029","11-0030","11-0035","11-0036","11-0037","11-0038","11-0039","11-0040","11-0041","11-0043","11-0044","11-0045","11-0046","11-0047","11-0048","11-0050","11-0052","11-0053","11-0054","11-0055","11-0056","11-0058","11-0059","11-0061","11-0062","11-0063","11-0064","11-0065","11-0066","11-0068","11-0069","11-0070","11-0071","11-0072","11-0074","11-0075")
ReferenceStandard <- c("R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","S","R","R","R","R","R","R","R","R","R","R","R","R","R","S","R","R","S","R","R","R","R","S","R","R","R","R","S","R","S","R","S")
TestA<- c("R","R","R","R","R","R","S","I","R","R","R","I","R","R","R","R","I","R","R","R","R","R","R","R","R","R","S","S","R","R","R","R","R","R","R","R","R","R","R","R","R","S","I","R","I","R","R","I","R","S","R","R","R","R","S","I","S","R","S")
TestB <- c("R","R","R","R","R","R","S","I","R","R","R","I","R","R","R","R","R","R","R","R","R","R","R","R","R","R","R","I","R","R","R","R","R","R","R","R","R","R","R","R","R","S","R","R","S","R","R","R","R","I","R","R","R","R","S","I","S","R","S")
TestC <-c("R","R","R","R","R","R","R","R","R","R","R","ND","R","R","R","R","R","R","R","R","R","R","R","R","R","R","S","S","R","R","R","R","R","R","R","R","R","R","R","R","R","S","R","R","S","R","R","R","R","S","R","R","R","R","S","ND","S","R","S")

mydata <- data.frame(Subject=Subject, ReferenceStandard=ReferenceStandard, TestA=TestA, TestB=TestB, TestC=TestC)

and so on (I have 1000 subjects) ...

So while the sensitivity/specificity of all the individual tests against the Reference Standard are very similar, there are significant differences between the tests using Cochran's and McNemar's tests.

Right now, my hypothesis is that each test is failing differently. So TestA might fail on one set of subjects while TestB fails on a different set of subjects. In aggregate the numbers are close enough that sensitivity/specificity are pretty similar, but the paired-sample statistical tests highlight that this is not the case. So I want to inspect this visually.
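One way to check this hypothesis before plotting is to tabulate, per subject, which tests agree with the Reference Standard. The sketch below is only an illustration: the eight-row data frame is made-up toy data in the same shape as `mydata`, the `A+`/`B-` pattern labels are my own convention, and Fail/Indeterminate/ND results are assumed to count as disagreement.

```r
# Toy data in the same shape as the question's data frame (values are made up).
mydata <- data.frame(
  Subject = sprintf("S%02d", 1:8),
  ReferenceStandard = c("R", "R", "R", "R", "S", "S", "R", "S"),
  TestA = c("R", "S", "I", "R", "S", "R", "S", "S"),
  TestB = c("R", "R", "R", "S", "S", "S", "S", "S"),
  TestC = c("R", "R", "R", "R", "S", "ND", "S", "S"),
  stringsAsFactors = FALSE
)

tests <- c("TestA", "TestB", "TestC")

# TRUE where a test matches the reference; any other result
# (wrong call, I, ND) is assumed to count as a disagreement.
agree <- sapply(tests, function(t) mydata[[t]] == mydata$ReferenceStandard)

# Collapse each subject's row of TRUE/FALSE into a label such as "A- B+ C+".
pattern <- apply(agree, 1, function(row) {
  paste0(c("A", "B", "C"), ifelse(row, "+", "-"), collapse = " ")
})

# Number of subjects showing each agreement pattern, most common first.
pattern_counts <- sort(table(pattern), decreasing = TRUE)
print(pattern_counts)
```

If the tests really fail on different subjects, patterns such as `A- B+ C+` and `A+ B- C+` should carry most of the errors, rather than `A- B- C-`.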

However, I am really stuck on what to even call this (because the new tests have four categories).

I have looked into Euler Diagrams but I do not believe that can support what I need.

I have thought that what I can do is make two sets of Euler Diagrams.

The first set is from the perspective of Reference=R: the overlap of Ref and TestA contains only the subjects that both call R, and the non-overlap contains the subjects with Reference=R and TestA != R.
The second set repeats the above from the perspective of Reference=S.
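The region sizes for such a diagram can be counted directly. This is a rough sketch on made-up toy data (not my real data), restricted to Reference=R; the `"TestA&TestB"` style of region label is just a naming choice of mine, although I believe it is close to the named-vector input that packages such as eulerr accept.

```r
# Toy data in the same shape as the question's data frame (values are made up).
mydata <- data.frame(
  Subject = sprintf("S%02d", 1:8),
  ReferenceStandard = c("R", "R", "R", "R", "S", "S", "R", "S"),
  TestA = c("R", "S", "I", "R", "S", "R", "S", "S"),
  TestB = c("R", "R", "R", "S", "S", "S", "S", "S"),
  TestC = c("R", "R", "R", "R", "S", "ND", "S", "S"),
  stringsAsFactors = FALSE
)

tests <- c("TestA", "TestB", "TestC")

# Keep only the subjects the reference calls R.
ref_r <- mydata[mydata$ReferenceStandard == "R", ]

# Logical matrix: TRUE where a test also calls R for that subject.
calls_r <- as.matrix(ref_r[tests]) == "R"

# Label each subject with the set of tests that called R ("none" if no test did).
region <- apply(calls_r, 1, function(row) {
  hit <- names(row)[row]
  if (length(hit) == 0) "none" else paste(hit, collapse = "&")
})

# Size of each disjoint region of the three-set diagram.
region_counts <- table(region)
print(region_counts)
```

Swapping `"R"` for `"S"` in the two comparisons gives the second diagram, from the perspective of Reference=S.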

I have also thought about an odd heatmap where the Y-axis is all 1000 subjects and the X-axis is ordered just like my data above but the four columns each color coded. Depending on how I sort the Y-axis, I can show off different aspects of the data. The problem is that it is really hard to pick out patterns with that kind of graphic.
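A minimal sketch of that heatmap idea, assuming ggplot2 is installed and using made-up toy data rather than my 1000 subjects. Sorting subjects by their concatenated result pattern is just one possible ordering; it places subjects with identical results next to each other.

```r
library(ggplot2)

# Toy data in the same shape as the question's data frame (values are made up).
mydata <- data.frame(
  Subject = sprintf("S%02d", 1:8),
  ReferenceStandard = c("R", "R", "R", "R", "S", "S", "R", "S"),
  TestA = c("R", "S", "I", "R", "S", "R", "S", "S"),
  TestB = c("R", "R", "R", "S", "S", "S", "S", "S"),
  TestC = c("R", "R", "R", "R", "S", "ND", "S", "S"),
  stringsAsFactors = FALSE
)

cols <- c("ReferenceStandard", "TestA", "TestB", "TestC")

# Order subjects by their full result pattern so identical rows sit together.
pattern <- do.call(paste, c(mydata[cols], sep = "|"))
subject_order <- mydata$Subject[order(pattern)]

# Long format: one row per subject/column pair.
long <- data.frame(
  Subject = factor(rep(mydata$Subject, times = length(cols)), levels = subject_order),
  Test = factor(rep(cols, each = nrow(mydata)), levels = cols),
  Result = unlist(mydata[cols], use.names = FALSE)
)

# One colored tile per subject and column.
p <- ggplot(long, aes(x = Test, y = Subject, fill = Result)) +
  geom_tile(colour = "white") +
  theme_minimal()
print(p)
```

With 1000 subjects the y-axis labels would probably need to be dropped, and the pattern-based ordering is what makes blocks of identical rows visible.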

Any other ideas? Links to other visualizations would be really appreciated!

This really isn't the appropriate place for this question. It's about stats, data, and visualization not coding. It likely will be closed as it is off topic. — Tyler Rinker, May 22 '14 at 18:47
Well, it is not so different from this question: http://stackoverflow.com/questions/11513149/good-ways-to-visualize-longitudinal-categorical-data-in-r?rq=1 and it has lots of activity and favorites as well. — user918967, May 22 '14 at 18:49
This argument is akin to me stating to the State Trooper, those guys a mile back were speeding too. Let me point you to the question guides: http://stackoverflow.com/help/on-topic and http://stackoverflow.com/help/dont-ask — Tyler Rinker, May 22 '14 at 18:57
I think the question could become more focused if you included some code for an attempted visualization then sought feedback on your code. — Gary Weissman, May 22 '14 at 19:01
I would consider liberally using `ggplot2::facet_grid` to plot histograms somewhat like in @GaryWeissman's answer. See my 2nd comment on his answer. — Gregor Thomas, May 22 '14 at 19:58
I edited my question to include code, please un-hold this question. — user918967, May 23 '14 at 03:22

Gary Weissman · Answer 1 · 2014-05-22T21:07:31.997

Here is an attempt to visualize your data set. Hard to know what to emphasize without the actual data, but here goes, with a sample to play with for other posters. Based on your post I'm trying to highlight differences in the distributions of test results by Ref.

library(reshape2)
library(ggplot2)

# make a data set

df <- data.frame(
  Subject = 1:100,
  Ref   = sample(c('R','S'), 100, TRUE),
  TestA = sample(c('R','F','S','I'), 100, TRUE),
  TestB = sample(c('R','F','S','I'), 100, TRUE),
  TestC = sample(c('R','F','S','I'), 100, TRUE)
)

# melt into long

dfm <- melt(df, id.vars=c('Subject','Ref'))

# and plot

ggplot(dfm, aes(x=variable, fill=value)) + geom_bar() + facet_wrap(~Ref)

# which gives

[Figure: stacked bar chart of result counts for TestA, TestB and TestC, one panel per Ref value]

# or bars dodged rather than stacked

ggplot(dfm, aes(x=variable, fill=value)) + geom_bar(position='dodge') + facet_wrap(~Ref)

[Figure: the same bar chart with dodged bars, one panel per Ref value]

If what @shujaa says below is true, here is a similarly themed image that highlights, for each test and reference value, how many results agree with the reference:

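# TP flags results that match the reference; as.character() avoids factor-level mismatches
dfm <- transform(dfm, TP = as.character(value) == as.character(Ref))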

ggplot(dfm, aes(x=variable,fill=TP)) + geom_bar() + facet_wrap(~Ref)

[Figure: stacked bar chart per test, filled by TP, one panel per Ref value]

Or following @shujaa 's last comment, here is one final attempt:

ggplot(dfm, aes(x=variable,fill=TP)) + geom_bar() + facet_wrap(value~Ref)

[Figure: bar chart per test, filled by TP, one panel per combination of test result and Ref value]

edited May 22 '14 at 21:07

answered May 22 '14 at 18:51


I think the OP is hoping for a visualization that would show, for each subject, whether or not each test was accurate. He says "In aggregate the numbers are similar enough so sensitivity/specificity are pretty similar..." – Gregor Thomas May 22 '14 at 19:46
Sorry, I think I'm not being clear. My interpretation of what the OP wants is a diagram that would let him see, in the cases where TestA is wrong, how TestB and TestC perform. And in the case where TestB fails and TestC is indeterminate, how does TestA perform, etc. I think the answer is `facet_grid` with a long formula and a lot of little histograms. – Gregor Thomas May 22 '14 at 19:56
@shujaa I think your assumption is reasonable, but without further explanation I'm not going to keep chasing this one. Feel free to take the data and play. Also, histograms are typically used for continuous variables, so unless I misinterpreted the data set, I would go with bar plots here. Thanks for your input on this. – Gary Weissman May 22 '14 at 21:08
Agreed, I voted to close the question. Gave you a +1 though for sticking with it as far as you did! (And I have a terrible habit of mixing up the terms for histogram and barplot). I might actually open a new question, because I can specifically describe the plot *I* would want, but I can't think of how to make it elegantly. – Gregor Thomas May 22 '14 at 21:16
Hi All, it is possible that the first visualization may be able to be co-opted into something close. However, the key is that the data are tests from the same subject. Therefore I would like to visualize how often a subject that was actually Resistant was called Susceptible by all the tests (that is bad) vs. how often the reverse happens (the three new tests agree with the reference standard). – user918967 May 23 '14 at 02:38
I forgot to mention that I added some data to the question (as an R data frame) so I am hoping that this can be reopened. – user918967 May 23 '14 at 02:47
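The counts described in the comment above (how often all three tests call the opposite of the reference versus how often all three agree with it) could be tabulated along these lines. This is a hedged sketch on made-up toy data; the labels "all agree", "all opposite" and "mixed" are illustrative names, and I/ND results are assumed to fall into "mixed".

```r
# Toy data in the same shape as the question's data frame (values are made up).
mydata <- data.frame(
  Subject = sprintf("S%02d", 1:8),
  ReferenceStandard = c("R", "R", "R", "R", "S", "S", "R", "S"),
  TestA = c("R", "S", "I", "R", "S", "R", "S", "S"),
  TestB = c("R", "R", "R", "S", "S", "S", "S", "S"),
  TestC = c("R", "R", "R", "R", "S", "ND", "S", "S"),
  stringsAsFactors = FALSE
)

tests <- c("TestA", "TestB", "TestC")
calls <- as.matrix(mydata[tests])
ref <- mydata$ReferenceStandard

# The call that would be exactly wrong for each subject (S for R, R for S).
opposite <- ifelse(ref == "R", "S", "R")

# Each comparison recycles the per-subject vector down every test column.
all_agree <- rowSums(calls == ref) == length(tests)
all_opposite <- rowSums(calls == opposite) == length(tests)

outcome <- ifelse(all_agree, "all agree",
                  ifelse(all_opposite, "all opposite", "mixed"))

# Rows: reference call; columns: joint behaviour of the three tests.
summary_tab <- table(Reference = ref, Outcome = outcome)
print(summary_tab)
```

The row for Reference=R and column "all opposite" is the worrying case: subjects that were actually Resistant but were called Susceptible by every new test.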
