1

I have this table as an output which I would like to save as a csv file

value<-cbind(c(rnorm(100,500,90),rnorm(100,800,120)))
genotype<-cbind(c(rep("A",100),rep("B",100)))
gender<-rep(c("M","F","F","F"),50)
df<-cbind(value,genotype,gender)
df<-as.data.frame(df)
colnames(df)<-c("value","genotype","gender")
df$value<-as.numeric(as.character(df$value))
library(Publish)
summary(univariateTable(gender ~ Q(value) + genotype, data=df))

Output of the above command is this:

  Variable        Level   gender = F (n=150)    gender = M (n=50)
1    value median [iqr] 651.4 [499.2, 781.0] 636.8 [539.8, 824.2]
2 genotype            A            75 (50.0)            25 (50.0)
3                     B            75 (50.0)            25 (50.0)
         Total (n=200) p-value
1 644.6 [509.8, 787.5]  0.4859
2           100 (50.0)        
3           100 (50.0)  1.0000

For exporting the above table as a .csv file, I use this code

write.csv(summary(univariateTable(gender ~ Q(value) + genotype, data=df)),file="file.csv")

The problem is that comma-separated upper values of the interquartile ranges: , 781.0, , 824.2, and , 787.5 end up in separate columns, so the row #1 ends up hawing three extra columns and the table is inaccurate.

Is there any way to avoid that?

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
Oposum
  • 1,155
  • 3
  • 22
  • 38
  • 1
    Can you save it as a tab separated file using something like `write.table` and set `sep = "\t"` ? – steveb Jan 17 '16 at 18:21
  • I want the `.csv` to be easily accessible by my colleagues operating MS Excel, so I'm not sure whether this would work on excel. – Oposum Jan 17 '16 at 19:05
  • Excel can open tab separated files but you would not want to use the `.csv` extension, you would want to do something like use `.txt`. Excel will likely ask them what they want to use as a separator (not optimal). Did you also try using the `quote=TRUE` option, I was able to create a comma separated file with commas as part of one of the columns ? – steveb Jan 17 '16 at 19:17
  • @steveb I tested and it works on LibreOffice, but MS Excel does not open it. – Oposum Jan 17 '16 at 19:17
  • @steveb well, `quote=TRUE` did the trick. It works now. Thanks! – Oposum Jan 17 '16 at 19:21
  • FYI, if you do use tabs at some point for any reason, you may want to use the option `row.names = FALSE` as that outputs the row names (i.e. the number of columns and headers differ by 1); that didn't seem to happen when I used `write.csv`. – steveb Jan 17 '16 at 19:28
  • @steveb sounds good. I really like `quote=TRUE` and thinking using it all the time, is there any disadvantage in using it? Why would it be `quote=F` by default? – Oposum Jan 17 '16 at 19:30
  • I use `quote = FALSE` in tab separated files all the time as the file can be easier to work with on the Linux command line. No strong opinion, it just depends on the use case. If we need to talk more on this SO is suggesting we move this to a "chat". – steveb Jan 17 '16 at 19:33
  • @steveb no further questions here but maybe you can take a look at my other question that is also about writing tables http://stackoverflow.com/questions/34842837/saving-output-of-confusionmatric-as-a-csv-table – Oposum Jan 17 '16 at 19:46

1 Answers1

1

Try using the quote=TRUE option. Excel should be able to distinguish between , used to delimit columns vs. part of the data.

Here is the command with the one additional option.

write.csv(
    summary(univariateTable(gender ~ Q(value) + genotype, data=df)),
    file="file.csv", quote = TRUE)
steveb
  • 5,382
  • 2
  • 27
  • 36