How to divide a data frame into new data frames (data1, data2, data3, and so on) so I can analyse each of them (e.g. with a t-test)

Question

I have just started learning R for data analysis. Here is my problem.

I want to analyse the difference in body weight (BW) between males and females in different species. (For example, that male and female body weight differ significantly in Sorex gracilliums. That is just an example; I don't know the answer. :)) At first I thought I could divide the data by species into several groups. (This can be done in Excel, but I have too many files, so I think R is better.) Then I could use some simple code to test the sex difference. But I don't know how to divide the data into new data frames. I tried group_split. It does split the data, but it just gives me many tibbles, as the image shows.

What should I do? Or maybe there is a better way for testing the difference?

I am not a native English speaker, so there may be many grammar mistakes, but I would really appreciate your help!

You _should_ have offered a [MCVE] but the short answer is in ?split. — IRTFM, Jan 20 '20 at 09:14
@42- I am sorry, I should have offered more information about my question. This was my first time using Stack Overflow; I will learn more about the rules for asking questions. Thank you for the reminder! — LiLily, Jan 21 '20 at 02:54
You should be looking at `?t.test` if you want to test for differences between two groups and at `?anova` or `?lm` if you want to test for difference among multiple groups. — IRTFM, Jan 21 '20 at 05:57

StupidWolf · Accepted Answer · 2020-01-21T07:37:02.660

Assuming your data is in a data.frame called df, with columns NO, SPECIES, SEX, BW:

set.seed(100)
df = data.frame(NO=1:100,
SPECIES=sample(LETTERS[1:4],100,replace=TRUE),
SEX=sample(c("M","F"),100,replace=TRUE),
BW = rnorm(100,80,2)
)

And we give species D a sex effect by adding 5 to the body weight of its males:

df$BW[df$SPECIES=="D" & df$SEX=="M"] = df$BW[df$SPECIES=="D" & df$SEX=="M"] + 5

If we want to do it on one data frame, say Species A, we do

dat = subset(df,SPECIES=="A")
t.test(BW ~ SEX,data=dat)
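As a minimal sketch of what such a test object holds, the block below runs the same kind of call on simulated data and pulls out the individual pieces. The data frame and its values are made up for illustration; only the column names SEX and BW follow the answer. Note that `t.test` runs the Welch test (unequal variances) unless `var.equal = TRUE` is set, which is why the degrees of freedom are usually not whole numbers.

```r
# Illustrative data only: 10 females and 10 males with simulated body weights
set.seed(1)
dat <- data.frame(
  SEX = rep(c("F", "M"), each = 10),
  BW  = c(rnorm(10, mean = 80, sd = 2), rnorm(10, mean = 85, sd = 2))
)

# Welch two-sample t-test of BW between the two levels of SEX
res <- t.test(BW ~ SEX, data = dat)

res$estimate   # the two group means (F first, then M)
res$statistic  # t statistic
res$parameter  # degrees of freedom (Welch approximation)
res$p.value    # p-value
res$conf.int   # confidence interval for the difference in means
```

Each of these elements can be stored or compared across species, which is what the summaries further down do.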

And you get the relevant statistics and so forth. To do this systematically for all SPECIES, we can use broom, dplyr:

library(dplyr)
library(broom)

df %>% group_by(SPECIES) %>% do(tidy(t.test(BW ~ SEX,data=.)))

# A tibble: 4 x 11
# Groups:   SPECIES [4]
  SPECIES estimate estimate1 estimate2 statistic p.value parameter conf.low
  <fct>      <dbl>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>    <dbl>
1 A          0.883      80.4      79.6     0.936 3.65e-1      14.2   -1.14 
2 B          0.259      80.2      79.9     0.377 7.12e-1      14.1   -1.21 
3 C          0.170      80.1      79.9     0.359 7.23e-1      25.3   -0.807
4 D         -5.55       79.7      85.2    -7.71  1.29e-7      21.4   -7.05

If you don't want to install any packages, this will give you all the test results:

by(df, df$SPECIES, function(x)t.test(BW ~ SEX,data=x))

And combining them into one data.frame:

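func = function(x) {
  # run the two-sample t-test for one species
  Nu = t.test(BW ~ SEX, data = x)
  # keep the two group means and the p-value as a one-row data.frame
  data.frame(estimate_1 = Nu$estimate[1], estimate_2 = Nu$estimate[2], p = Nu$p.value)
}
do.call(rbind, by(df, df$SPECIES, func))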
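With real data, some species may have only one sex or very few animals, and `t.test` stops with an error in that case. The following is a sketch of one possible guard, not part of the original answer: `safe_test` and `df2` are hypothetical names, the data are simulated, and the sex codes "F" and "M" are an assumption about how the column is coded.

```r
set.seed(2)
# Hypothetical data: species C has females only, so no two-sample test is possible
df2 <- data.frame(
  SPECIES = rep(c("A", "B", "C"), each = 8),
  SEX     = c(rep(c("M", "F"), 8), rep("F", 8)),
  BW      = rnorm(24, mean = 80, sd = 2)
)

# safe_test is a hypothetical helper, not a function from any package
safe_test <- function(x) {
  # count animals per sex, keeping a zero count for a missing sex
  n <- table(factor(x$SEX, levels = c("F", "M")))
  if (any(n < 2)) {
    # too few observations in one group: return NA instead of failing
    return(data.frame(estimate_1 = NA_real_, estimate_2 = NA_real_, p = NA_real_))
  }
  Nu <- t.test(BW ~ SEX, data = x)
  data.frame(estimate_1 = Nu$estimate[[1]],
             estimate_2 = Nu$estimate[[2]],
             p = Nu$p.value)
}

# one row per species; species that cannot be tested get NA
out <- do.call(rbind, lapply(split(df2, df2$SPECIES), safe_test))
out
```

Species that cannot be tested then show up as rows of NA instead of aborting the whole run.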

That's a cool solution. Have been thinking hard about a base R solution involving `by`. Would that be possible to use too? If so, how? What I've come up with `by(df, df$SPECIES, function(x) tapply(df$BW, df$SEX, t.test))` only computes one-sample t-tests. — Chris Ruehlemann, Jan 20 '20 at 10:53
@ChrisRuehlemann, you don't need the tapply, just do by(df, df$SPECIES, function(x)t.test(BW ~ SEX,data=x)). You might need something to combine them into a data.frame — StupidWolf, Jan 20 '20 at 13:18
so for example, func = function(x){ Nu=t.test(BW ~ SEX,data=x);data.frame(estimate=Nu$estimate,p=Nu$p.value)} ; do.call(rbind,by(df, df$SPECIES,func)) — StupidWolf, Jan 20 '20 at 13:20
Thank you. Why not include this solution in an edit to your answer? — Chris Ruehlemann, Jan 20 '20 at 14:01
To be honest, as a novice I still need time to understand your code... But I really appreciate that you provided so many ways! I learnt a lot! — LiLily, Jan 21 '20 at 03:44

score 0 · Answer 2 · answered Jan 20 '20 at 09:37

Here is an example of creating multiple data frames from one. The example data set iris is a table of traits for 3 species.

First, build a vector nspe holding all the species in your data frame. I then create a list of the same length, named after the species.

The for loop goes through each element of this list and fills it with a data frame containing just that species.

At the end of this script, I compute the mean petal width of the setosa species. If this species had a discrete trait with two levels (such as sex), I could run a two-sample t.test on it as well. I ran a one-sample test here, but it is not really useful.

data("iris")
summary(iris)

nspe <- as.vector(unique(iris$Species))

# empty list with one named slot per species
spe <- vector("list", length(nspe))
names(spe) <- nspe

# fill each slot with the rows of that species
for (i in nspe) {
  spe[[i]] <- iris[iris$Species == i, ]
}

mean(spe$setosa$Petal.Width)
# [1] 0.246
t.test(spe$setosa$Petal.Width)
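For comparison, here is a short sketch of the same result using base R `split()`, which one of the comments on the question points to. It should build the same kind of named list as the loop above in a single call.

```r
data("iris")

# split() returns a named list with one data frame per level of Species
spe <- split(iris, iris$Species)

names(spe)                    # the species names
sapply(spe, nrow)             # number of rows in each piece
mean(spe$setosa$Petal.Width)  # same mean as computed above
```

Each element can then be used like any other data frame, for example `spe$setosa` or `spe[["setosa"]]`.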

Below is an example of how you can run a t.test on one species. Note that species names containing spaces will likely cause trouble, so I think it is easier to use short IDs for species than to keep their full names.

In future questions, consider providing a small example dataset rather than pictures; that makes it easier to help you.

# NOT RUN
t.test(
  spe$Sorex_gracilliums$BW[which(spe$Sorex_gracilliums$SEX == 'm')],
  spe$Sorex_gracilliums$BW[which(spe$Sorex_gracilliums$SEX == 'f')]
)
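If you would rather keep the full names, here is a sketch of how a name with a space can still be used. The data are simulated, and `shrews`, `by_species` and the second species name are hypothetical. The `$` operator needs backticks around such a name, while `[[ ]]` takes it as an ordinary string.

```r
# Hypothetical data: species names that contain a space
set.seed(3)
shrews <- data.frame(
  SPECIES = rep(c("Sorex gracilliums", "Species two"), each = 10),
  SEX     = rep(c("m", "f"), times = 10),
  BW      = rnorm(20, mean = 8, sd = 1)
)
by_species <- split(shrews, shrews$SPECIES)

# both forms select the same data frame
one  <- by_species[["Sorex gracilliums"]]
same <- by_species$`Sorex gracilliums`
identical(one, same)

# the formula interface avoids repeating the long name
t.test(BW ~ SEX, data = one)
```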
