
I have a dataset with a `NaiveBayesvariant` grouping column (levels I to VI) and an accuracy column, and I am trying to perform the Kruskal-Wallis test on it.

The R code is as follows:

my_data <- read.csv('NBvariants_KWtest.csv',header = TRUE)
head(my_data)
levels(my_data$NaiveBayesvariant)
my_data$NaiveBayesvariant <- ordered(my_data$NaiveBayesvariant,
                         levels = c("I", "II", "III","IV","V","VI"))
library(dplyr)
group_by(my_data, NaiveBayesvariant) %>%
  summarise(
    count = n(),
    mean = mean(accuracy, na.rm = TRUE),
    sd = sd(accuracy, na.rm = TRUE),
    median = median(accuracy, na.rm = TRUE),
    IQR = IQR(accuracy, na.rm = TRUE)
  )

library("ggpubr")
ggboxplot(my_data, x = "NaiveBayesvariant", y = "Accuracy", 
          color = "NaiveBayesvariant",
          palette = c("#00AFBB", "#E7B800", "#FC4E07", "#00AFBB", "#E7B800", "#FC4E07"),
          order = c("I", "II", "III","IV","V","VI"),
          ylab = "Accuracy", xlab = "Naive Bayes variant")

ggline(my_data, x = "NaiveBayesvariant", y = "Accuracy", 
       add = c("mean_se", "jitter"), 
       order = c("I", "II", "III","IV", "V", "VI"),
       ylab = "Accuracy", xlab = "Naive Bayes variant")


kruskal.test(accuracy ~ NaiveBayesvariant, data = my_data)

However, I am getting this error:

> kruskal.test(accuracy ~ NaiveBayesvariant, data = my_data)
Error in model.frame.default(formula = accuracy ~ NaiveBayesvariant, data = my_data) : 
  variable lengths differ (found for 'NaiveBayesvariant')
IronMaiden
  • 2
    Please include reproducible and copy&paste-able sample data (using e.g. `dput`). Screenshots of data/code are never a good idea as we can't extract information from an image. Also please don't post code that is not relevant to your issue. For example, the plotting seems to have nothing to do with your issue. Ditto for the summarisation. – Maurits Evers Mar 26 '19 at 01:35
  • 2
    I would add that you should post only a minimum, complete, and verifiable example ([MCVE](https://stackoverflow.com/help/mcve)). No need to share your ggboxplot code or other irrelevant calculations. – DanY Mar 26 '19 at 02:01
  • 1
    A KW test is usually a two sample test and you don't seem to have that setup. Can you clearly state the hypothesis you are hoping to test? (I suspect not.) So it may not be helpful to post data in text form, since it's not clear you have a firm grasp on statistical issues. Another issue might be `"Accuracy" != "accuracy"` – IRTFM Mar 26 '19 at 02:24
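
To illustrate the `"Accuracy" != "accuracy"` point from the last comment, here is a minimal sketch with made-up data. It assumes (this is not confirmed by the question) that the CSV column is actually named `Accuracy`, and that an unrelated object called `accuracy` with a different length exists in the workspace. Under those assumptions the formula picks up the workspace object instead of the column, which produces the same "variable lengths differ" message. The data frame and the stray vector below are hypothetical.

```r
# Made-up data: the column is capitalised as "Accuracy" (an assumption about the CSV)
set.seed(1)
my_data <- data.frame(
  NaiveBayesvariant = factor(rep(c("I", "II", "III"), each = 5)),
  Accuracy = runif(15)
)

# Hypothetical stray object in the workspace, with a different length
accuracy <- runif(10)

# `accuracy` is not a column of my_data, so the formula falls back to the
# workspace object, whose length (10) does not match the 15 rows
res <- tryCatch(
  kruskal.test(accuracy ~ NaiveBayesvariant, data = my_data),
  error = function(e) conditionMessage(e)
)
print(res)

# Check the exact column names, then use the name as it appears there
print(names(my_data))
fit <- kruskal.test(Accuracy ~ NaiveBayesvariant, data = my_data)
print(fit)
```

If `names(my_data)` shows a lowercase `accuracy` column instead, the cause is something else, and posting `dput(head(my_data))` as suggested above would help narrow it down.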
