I have several measurements that need to be presented in the same boxplot chart, despite having completely different scales. Each group (= measurement type) has its own specific high and low acceptance limits.

The data should be normalized in R so that the low limit is always -1, and high limit always +1 across all groups. I'll then set the Y-axis so that all measurements are properly displayed.

So far I've managed to draw the boxplot with min(NUM_VALUE) being -1 and max(NUM_VALUE) being +1, but this is not the final result I want.

Fake data (just part of the table):

ITEMID      NAME    SERIALID    NUM_VALUE   LOWER_LIMIT UPPER_LIMIT
Itemcode1   group1  SN1000      62.1        50          80
Itemcode1   group1  SN1001      62.6        50          80
Itemcode1   group1  SN1002      63.9        50          80
Itemcode1   group2  SN1006      1526.79     1526        1528
Itemcode1   group2  SN1007      1526.799    1526        1528
Itemcode1   group3  SN1015      1815.09     1814        1816
Itemcode1   group3  SN1016      1815.094    1814        1816
Itemcode1   group3  SN1017      1815.098    1814        1816
Itemcode1   group4  SN1025      1526.751    1526        1527
Itemcode1   group4  SN1026      1526.62     1526        1527
Itemcode1   group5  SN1028      1816.155    1816        1817
Itemcode1   group5  SN1029      1816.245    1816        1817

R code:

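library(ggplot2)
library(data.table)
# The input file is semicolon-separated
df <- read.table("data3.csv", header=TRUE, sep=";", stringsAsFactors=FALSE)
# Rescale x to [-1, 1] using its observed min and max (not the acceptance limits)
skl <- function(x){(x-min(x))/(max(x)-min(x))*2-1}
# Apply skl() separately within each NAME group
df <- transform(df, scaled=ave(NUM_VALUE, NAME, FUN=skl))
# Refer to columns by name inside aes(); df$ is not needed there
ggplot(df, aes(x=NAME, y=scaled)) + geom_boxplot()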

Graph so far: boxplot

I'm very new to R.

Question: How to scale boxplot against UPPER_LIMIT and LOWER_LIMIT by group and present it all in same graph?

Any help highly appreciated, thank you!

GaryHill
  • 1
    Apply your `skl` function for each group and you should get what you want. There are many ways to do this, try searching SO on how to do that. – Roman Luštrik Nov 02 '17 at 08:12
  • 1
    A general remark/question about your approach. When setting upper/lower limits to 1/-1, information about the different variance in the groups is lost. Showing variance is one of the main strengths of a boxplot. Maybe I just do not understand your case correctly; could you explain what's the purpose of your approach? From my perspective it would rather make sense to subtract the median value from all values of each group, so that all median values are displayed on the zero line of the x-axis in order to analyze the different variances in the groups. – Manuel Bickel Nov 02 '17 at 08:41
  • Manuel - I'm not an expert in statistics, but I'll try to explain. Setting the limits to 1/-1 is commonly used e.g. in Q-stat statistical software. In my case this particular view is used to have quick overview on all characteristics of one selected item, and how the results are cumulated within allowed limits. Of course other boxplot views as well as some other analysis may be used to further pinpoint any specific issues. – GaryHill Nov 02 '17 at 08:58

1 Answer

Instead of using min() and max(), you can change your function skl() to also take lower and upper bounds that are used instead.

The adapted function looks like this:

skl <- function(x, lower, upper){
  (x - lower)/(upper - lower) * 2 - 1
}

You can then go through the rows of your data.frame using apply():

df$scaled <- apply(df[, 4:6], 1, function(row) {
  skl(x = row[1], lower = row[2], upper = row[3])
})

The result looks like this:

df$scaled
 [1] -0.19333333 -0.16000000 -0.07333333 -0.21000000 -0.20100000  0.09000000  0.09400000  0.09800000
 [9]  0.50200000  0.24000000 -0.69000000 -0.51000000
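
As a possible alternative to the row-wise apply() call: skl() only uses vectorized arithmetic, so it should also work when called directly on whole columns. The snippet below is a minimal sketch of that idea; it rebuilds a few rows of the question's table inline (an assumption made only so the example runs on its own) instead of reading data3.csv.

```r
# A few rows copied from the question's table, so the example is self-contained
df <- data.frame(
  NAME        = c("group1", "group1", "group2", "group4"),
  NUM_VALUE   = c(62.1, 62.6, 1526.79, 1526.751),
  LOWER_LIMIT = c(50, 50, 1526, 1526),
  UPPER_LIMIT = c(80, 80, 1528, 1527)
)

# Map lower -> -1 and upper -> +1; values outside the limits fall outside [-1, 1]
skl <- function(x, lower, upper) {
  (x - lower) / (upper - lower) * 2 - 1
}

# Element-wise over the three columns, no explicit loop over rows needed
df$scaled <- skl(df$NUM_VALUE, df$LOWER_LIMIT, df$UPPER_LIMIT)
```

Because every row carries its own limits, no grouping step is needed here; each value is scaled against the limits stored on its own row.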

Using your code, the boxplot will look like this:

library(ggplot2)
ggplot(df, aes(x=NAME, y=scaled)) + geom_boxplot()

boxplot
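
Since the limits now sit at -1 and +1 for every group, it may help to draw them on the plot and fix the Y-axis range. The following is only a sketch: the dashed red line style and the -1.2 to 1.2 range are arbitrary choices, and the small data frame just reuses a few of the scaled values printed above so the example runs on its own.

```r
library(ggplot2)

# A few of the scaled values from the result above, for illustration only
df <- data.frame(
  NAME   = c("group1", "group1", "group1", "group3", "group3", "group3"),
  scaled = c(-0.19333333, -0.16, -0.07333333, 0.09, 0.094, 0.098)
)

p <- ggplot(df, aes(x = NAME, y = scaled)) +
  geom_boxplot() +
  # Dashed reference lines at the normalized lower and upper limits
  geom_hline(yintercept = c(-1, 1), linetype = "dashed", colour = "red") +
  # Fixed axis range (arbitrary margin) so both limit lines stay visible
  coord_cartesian(ylim = c(-1.2, 1.2))

print(p)
```

coord_cartesian() is used here rather than ylim() because it zooms the view without dropping points that fall outside the range, so out-of-limit measurements still contribute to the box statistics.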

Felipe Augusto
clemens