I want to run two t-tests on some data I have on crab and lobster landed weight in North and South Wales, one separate test for each species at the moment. I have log-transformed both weight columns as both had lots of very low values. I run the following code on both species:

t.test(data = crabs, logweight~Region)
t.test(data = lobsters, logweight~Region)

For crabs the t-test works fine and I get an output in the console, however for the lobster data I get the following error message:

Error in if (stderr < 10 * .Machine$double.eps * max(abs(mx), abs(my))) stop("data are essentially constant") : 
  missing value where TRUE/FALSE needed

This seems to be an error message that happens when you try to use non-numeric data. The data for weight is definitely numeric and I have even tried converting Region into numeric values of 1 and 2 instead of North and South but I am still getting this error message. If I run the t-test on the untransformed data it works fine, so the issue appears to be with the log-transformed lobster weight data. What is the problem here and how can I fix it?

This is what the raw data with the logweight column added looks like: (screenshot not included)

Some example data:

structure(list(Weight = c(130, 10, 25, 45, 21, 75, 100, 9.6, 12.9, 17.1, 11, 11, 28, 8, 50, 30, 9.5, 28.5, 91, 16), Region = c("NORTH", "NORTH", "NORTH", "NORTH", "NORTH", "NORTH", "NORTH", "NORTH", "NORTH", "NORTH", "SOUTH", "SOUTH", "SOUTH", "SOUTH", "SOUTH", "SOUTH", "SOUTH", "SOUTH", "SOUTH", "SOUTH")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L))
deschen
Kiran B
  • Share your data. – geometricfreedom Feb 24 '22 at 13:02
  • There's really nothing we can do to help unless we see a sample of your data that reproduces the problem. – Allan Cameron Feb 24 '22 at 13:07
  • Without seeing the data, and based on the fact that you are using log scales, it might be that your values are all very, very similar (e.g. around 0 or 1). In that case, there's almost no variance/sd/stderr in your data, which means calculating a t value is not possible/doesn't make sense. – deschen Feb 24 '22 at 13:17
  • I will also add that for some reason the mean of loglobsters is reading as -Inf so something is definitely not right there! – Kiran B Feb 24 '22 at 14:49
  • Your data screenshot unfortunately doesn't show the values for your logcrabs/loglobsters variables. – deschen Feb 24 '22 at 14:50
  • Can you do the following please: `library(tidyverse); crabs %>% select(Region, logcrabs) %>% group_by(Region) %>% slice_sample(n=20) %>% ungroup() %>% dput()` and share this output with us? – deschen Feb 24 '22 at 14:55
  • It says: Error: Must subset columns with a valid subscript vector. x Can't convert from to due to loss of precision. – Kiran B Feb 24 '22 at 15:34
  • `crabs` is the name of your data frame and you do have a column called `logcrabs` in this data frame as well as a variable `Region`? ( the latter one I can see in your screenshot, but not the logcrabs column) – deschen Feb 24 '22 at 15:51
  • The data logcrabs and loglobsters are just vectors, how do I add them as a column for the two separate datasets? – Kiran B Feb 24 '22 at 15:59
  • Oh my, this is quite an important piece of information. So I suggest you tidy up your question a bit, i.e. the histograms for the weight variable are irrelevant. Assuming the order of your logcrabs vector is the same as in your data set, please share: `library(tidyverse); crabs %>% select(Region) %>% add_column(logcrabs) %>% group_by(Region) %>% slice_head(n = 10) %>% ungroup() %>% dput()`. – deschen Feb 24 '22 at 16:07
  • In addition, what confuses me is that in your question you are talking about your Weight variable multiple times, yet in your t.test code you don't use Weight. So what is important about your Weight column? – deschen Feb 24 '22 at 16:09
  • It's saying: Error in group_by(Region) : object 'Region' not found. I should probably explain that in the t-tests the variables logcrab and loglobster are the log-transformed weights of the crab and lobster; it's just how I have named them. I can perform t-tests on the untransformed data fine; it's only when they are log-transformed that the lobster t-test does not work. – Kiran B Feb 24 '22 at 16:35
  • I'm at my wits' end, to be honest. The data/screenshots you've shared obviously are not what your actual data looks like, e.g. you've shared a screenshot of your raw data (I assume this is called "crabs") showing that it has a column "Region", but the code above and the error indicate that there is no such column. So unless you are able to provide a concise sample/example of your data, I won't be able to help you. The t-test you are doing shows that you are only interested in understanding the mean differences of `logcrabs` by `Region`. So you should have either a data set containing both... – deschen Feb 24 '22 at 16:39
  • ...of these columns or at least two vectors, one for Region, one for logcrabs. – deschen Feb 24 '22 at 16:39
  • Can you share `library(tidyverse); crabs %>% select(Weight, Region) %>% group_by(Region) %>% slice_head(n = 10) %>% ungroup() %>% dput()`. Please replace `crabs` with whatever your data name actually is. Note, for this to work you have to have the package `tidyverse` installed. If that's not the case, please install with `install.packages("tidyverse")`. – deschen Feb 24 '22 at 16:41
  • In addition, what do `min(crabs$logweight)` and `max(crabs$logweight)` return? – deschen Feb 24 '22 at 16:47
  • Apologies if this is not very clear, I'm still getting back into R after over a year away from it. I've reworded my original question and now added logweight as a separate column. The output from that code is: structure(list(Weight = c(130, 10, 25, 45, 21, 75, 100, 9.6, 12.9, 17.1, 11, 11, 28, 8, 50, 30, 9.5, 28.5, 91, 16), Region = c("NORTH", "NORTH", "NORTH", "NORTH", "NORTH", "NORTH", "NORTH", "NORTH", "NORTH", "NORTH", "SOUTH", "SOUTH", "SOUTH", "SOUTH", "SOUTH", "SOUTH", "SOUTH", "SOUTH", "SOUTH", "SOUTH")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L)) – Kiran B Feb 24 '22 at 16:49
  • Sounds like you have some 0s in your unlogged data. These become `-Inf` after you log them. So the mean of your logged data is negative infinity, the variance is infinite, and t-tests are not valid. Clean up your data first, omitting the 0s. – Gregor Thomas Feb 24 '22 at 16:53
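
The mechanism described in the last comment can be reproduced with a small sketch. The weights below are made up for illustration (they are not the asker's data), and the only assumption is that at least one raw weight is exactly 0:

```r
# Hypothetical weights: a single zero in one group is enough to trigger the problem.
weight <- c(130, 10, 25, 0, 21, 11, 28, 8, 50, 30)
region <- rep(c("NORTH", "SOUTH"), each = 5)

logweight <- log(weight)             # log(0) evaluates to -Inf

mean(logweight[region == "NORTH"])   # the group mean is dragged to -Inf
var(logweight[region == "NORTH"])    # the variance is no longer a finite number

# t.test() keeps -Inf (it only drops NA), so the standard error is not a
# number and the internal if() check has nothing to compare.
result <- tryCatch(
  t.test(logweight ~ region),
  error = function(e) conditionMessage(e)
)
result                               # the captured error message
```

If the zero is removed from `weight`, the same call should run without error, which is consistent with the untransformed data working fine.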

1 Answer

Eureka, I think I've solved it! For some reason some of the values were negative when log-transformed. I changed the transformation to log(x + 1) and now all values are positive. I ran the t-test again and this time it worked. Both t-tests show a highly significant difference in weight between north and south, which is what I expected from looking at the boxplots of the two datasets. Thank you to everyone who helped.
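
A minimal sketch of the log(x + 1) transformation, assuming a data frame shaped like the one in the question. The `lobsters` values here are hypothetical and include zeros on purpose; `log1p()` is base R's function for log(1 + x):

```r
# Hypothetical landed weights including zeros (not the real data).
lobsters <- data.frame(
  Weight = c(0, 0.4, 2.5, 12, 7.3, 0, 31, 18, 0.9, 45),
  Region = rep(c("NORTH", "SOUTH"), each = 5)
)

# log1p(0) is 0, so zeros no longer become -Inf.
lobsters$logweight <- log1p(lobsters$Weight)

all(is.finite(lobsters$logweight))   # check before testing

res <- t.test(logweight ~ Region, data = lobsters)
res$p.value
```

Note that adding 1 changes the scale of the comparison, so the result is a test on log(Weight + 1) rather than on log(Weight).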

Kiran B
  • Glad you've solved it. It shows that you should always inspect your input data, especially if you do some data transformations, before doing a statistical test! – deschen Feb 24 '22 at 16:57
  • `log(x)` will be negative for any `x < 1`. That shouldn't be an issue as long as `x > 0`. Though even getting too close to 0 can be destabilizing for mean and variance calculations. – Gregor Thomas Feb 24 '22 at 16:57
  • That's what the problem was, there were about 250 values that were less than 1 so it was generating negative values, and t-tests won't work with negative values I believe. – Kiran B Feb 24 '22 at 17:07
  • No, a t-test works fine with negative values. You can still calculate a mean from negative values. The problem is probably as Gregor Thomas describes. If you have values where Weight == 0, you will run into the error. Can you check if you had any 0 values in Weight? – deschen Feb 24 '22 at 17:15
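
The check asked for in the last comment could look something like the sketch below. The `lobsters` data frame is again hypothetical, and dropping the zero rows is only one option (the one suggested in the comments); whether that is appropriate depends on what a zero landing means in the real data:

```r
# Hypothetical data with two zero weights (not the real data).
lobsters <- data.frame(
  Weight = c(0, 0.4, 2.5, 12, 7.3, 0, 31, 18, 0.9, 45),
  Region = rep(c("NORTH", "SOUTH"), each = 5)
)

# How many raw weights are zero, and how many logged values are non-finite?
n_zero <- sum(lobsters$Weight == 0)
n_bad  <- sum(!is.finite(log(lobsters$Weight)))

# Keep only strictly positive weights, then log-transform and test.
positive <- subset(lobsters, Weight > 0)
positive$logweight <- log(positive$Weight)

res <- t.test(logweight ~ Region, data = positive)
res$p.value
```

Values below 1 (such as 0.4 and 0.9 here) give negative logs but stay in the test; only the zeros are excluded.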