I have a potentially very stupid question, but I can't seem to find a solution easily. I'm also pretty new to R, so please forgive my ignorance.

I'm looking for a way to loop through all variables in my data frame, for instance to make two-way tables of every variable against one specific variable (say, Sex or Educational level). I used to work with Stata, but since R is free, I am now supposed to work with R (I heard there is a plethora of other benefits to working with R as well, so I am very willing to learn :)).

Say, I have 20 variables, of which 15 are answers from a survey and 5 are demographic variables. I would like to see how different answers compare to differences in demographics.

Normally I would tackle the problem above in Stata with something as simple as:

forvalues i = 1/5 {
    forvalues j = 1/3 {
        tab Sex Var`i'_`j', chi2
    }
}

making 15 tables, for the variables Var1_1 to Var5_3 vs Sex, and giving a Pearson chi2 statistic.

So, I tried what I thought was the same for R:

for (i in 1:5) {
  for (j in 1:3){
  print(table(chisq.test(paste(df$Sex, "df$Var",i,"_",j,sep=""))))    
  }
}

but this doesn't work.

Can anyone please point me in the right direction on how to solve this? Any help is highly appreciated!

Eelco
  • You can use `summary(df)` or `lapply(df, table)`, where the first will give you a summary of the data.frame where numerical variables are summarized with min, max, mean, median and categorical (factor) variables with a table. The second gives you a list of tables of your variables. – kath Oct 02 '19 at 08:59
  • You really need to study `help("$")`. It explains when you can use `$` and when to use `[]` and `[[]]` instead. In general, approaches that work well in one language do not necessarily transfer well to another language. This is such a case. – Roland Oct 02 '19 at 09:22
  • Thanks, I'll read up on that and try again. I also edited my question a bit since my example seems poorly chosen (considering how the first comment answers how to achieve similar results via another way) – Eelco Oct 02 '19 at 09:40

1 Answer

Let's assume that df is your data frame and that its first 15 columns are the survey answers. In that case you can run a chi-squared test of each answer column against Sex like this:

lapply(df[,1:15], function(x) {chisq.test(x, df$Sex)}) 
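
If the answer columns are not the first 15, or you also want to see the two-way tables, a variation is to build the column names from the question's Var<i>_<j> pattern and look each one up with `[[ ]]` instead of `$`. The sketch below is only an illustration: the data are simulated, and the names `vars`, `results`, and `p_values` as well as the three answer levels are assumptions, not something taken from the question.

```r
# Hypothetical example data; column names follow the question's Var<i>_<j> pattern.
set.seed(1)
n <- 200
df <- data.frame(Sex = factor(sample(c("F", "M"), n, replace = TRUE)))
for (i in 1:5) {
  for (j in 1:3) {
    df[[paste0("Var", i, "_", j)]] <-
      factor(sample(c("no", "maybe", "yes"), n, replace = TRUE))
  }
}

# Build the 15 column names as strings: Var1_1, Var1_2, ..., Var5_3.
vars <- paste0("Var", rep(1:5, each = 3), "_", rep(1:3, times = 5))

# For each name, tabulate Sex against that column and run a chi-squared test.
# df[[v]] looks a column up by a name stored in a variable; df$v would not.
results <- lapply(vars, function(v) {
  tab <- table(df$Sex, df[[v]], dnn = c("Sex", v))
  list(table = tab, test = chisq.test(tab))
})
names(results) <- vars

# Inspect one variable, then collect the p-values of all 15 tests.
print(results[["Var1_1"]]$table)
print(results[["Var1_1"]]$test)
p_values <- sapply(results, function(r) r$test$p.value)
```

The original loop fails because `paste()` only produces a character string such as "df$Var1_1"; R does not evaluate that string as a column reference, so `chisq.test()` never sees the data.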
Yuriy Barvinchenko