12

I'm wondering whether it's possible to create an xtable from the output of str(x) to get an overview of the variables in a dataset. This would be a nice way to introduce someone to the dataset, but it's tedious to build such a table by hand. So what I tried is to make an xtable like this:

str(cars)
require(xtable)
xtable(str(cars))

The cars dataset ships with R. Unfortunately, xtable doesn't produce LaTeX code for str(). Is it possible to outsmart R here? The following lists the object types that xtable understands:

methods(xtable)

Any ideas?

user734124

4 Answers

17

Another package to look at is reporttools. Here is a short piece of code that illustrates its usage on the tips dataset from the reshape package. Both summary statements produce LaTeX code that can be copied and pasted into a document, or used for weaving.

library(reporttools)
data(tips, package = 'reshape')

# summarize numeric variables
tableContinuous(tips[,sapply(tips, is.numeric)])

# summarize non-numeric variables
tableNominal(tips[,!sapply(tips, is.numeric)])

EDIT. If you really MUST use str, then here is one way to go about it:

str_cars = capture.output(str(cars))
xtable(data.frame(str_cars))
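
A str-like overview can also be assembled by hand as a data.frame, which xtable then converts like any other table. The following is only a minimal sketch: `str_table` and its `n_values` argument are hypothetical names introduced here for illustration and are not part of xtable or reporttools.

```r
# Hypothetical helper (an assumption, not a library function):
# returns one row per variable with its class and either the factor
# levels or the first few values.
str_table <- function(df, n_values = 5) {
  data.frame(
    variable = names(df),
    class    = vapply(df, function(x) class(x)[1], character(1)),
    values   = vapply(df, function(x) {
      vals <- if (is.factor(x)) levels(x) else head(x, n_values)
      paste(as.character(vals), collapse = ", ")
    }, character(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

overview <- str_table(iris)
print(overview)

# If xtable is installed, the overview converts like any data.frame
if (requireNamespace("xtable", quietly = TRUE)) {
  print(xtable::xtable(overview), include.rownames = FALSE)
}
```

Because the result is an ordinary data.frame, columns can be dropped or renamed before the table is passed to xtable.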

OUTPUT FROM REPORTTOOLS: [two images, not reproduced here]

Ramnath
  • Hello Ramnath! This is pretty close to what I need. Thanks so far! What I would really like is to merge the table for continuous values with the one for categorical values. But I don't need the descriptive statistics... What I need is close to the str() command in base R, but nicely formatted for LaTeX. Maybe this is not possible and I have to create the table manually... I wonder why there is no simple command, because it's often necessary to introduce variables to make the reader familiar with what you're dealing with :) – user734124 May 06 '11 at 07:34
14

If you're willing to spend some time investigating how the Hmisc package works, you will soon discover that it has many utilities that facilitate such tasks. In particular, the contents() method describes a data.frame by reporting

names, labels (if any), units (if any), number of factor levels (if any), factor levels, class, storage mode, and number of NAs

Labels and units can be bound (internally, as attributes) to each variable. There are associated print, html and latex methods for viewing and exporting.
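
As a rough sketch of how labels and units are attached before calling contents(), assuming Hmisc is installed (the label and unit strings below are made up for illustration, and `d` is just a working copy of cars):

```r
library(Hmisc)

d <- cars  # work on a copy so the built-in dataset stays untouched

# Attach labels and units; Hmisc stores them as attributes
label(d$speed) <- "Speed"
units(d$speed) <- "mph"
label(d$dist)  <- "Stopping distance"
units(d$dist)  <- "ft"

# One row per variable, including the labels and units set above
contents(d)
```

The result of contents() has its own print, html and latex methods, so the same object can be viewed in the console or exported.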

Another nice functionality is the describe() function, as seen below:

> describe(cars)
cars 

 2  Variables      50  Observations
--------------------------------------------------------------------------------
speed 
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90 
     50       0      19    15.4     7.0     8.9    12.0    15.0    19.0    23.1 
    .95 
   24.0 

          4 7 8 9 10 11 12 13 14 15 16 17 18 19 20 22 23 24 25
Frequency 2 2 1 1  3  2  4  4  4  3  2  3  4  3  5  1  1  4  1
%         4 4 2 2  6  4  8  8  8  6  4  6  8  6 10  2  2  8  2
--------------------------------------------------------------------------------
dist 
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90 
     50       0      35   42.98   10.00   15.80   26.00   36.00   56.00   80.40 
    .95 
  88.85 

lowest :   2   4  10  14  16, highest:  84  85  92  93 120 
--------------------------------------------------------------------------------
chl
  • Hey chl! Very interesting package. I tried the commands you presented and it seems worth spending some time learning to use it! Just real quick: I tried the 'Latex(str(x))' command (I know this was just a noobish attempt), but it seems to compute the LaTeX code in a different way than the xtable command. I just need a short overview of the variables, as provided by str(). contents() and describe() are much better than str() for getting information about the data, but what I need is a short summary! – user734124 May 05 '11 at 17:33
10

Since xtable gives the best results with data.frame and matrix objects, I'd recommend something like this:

library(xtable)
library(plyr)
dtf <- sapply(mtcars, each(min, max, mean, sd, var, median, IQR))
xtable(dtf)
% latex table generated in R 2.12.2 by xtable 1.5-6 package
% Thu May  5 19:40:08 2011
\begin{table}[ht]
\begin{center}
\begin{tabular}{rrrrrrrrrrrr}
  \hline
 & mpg & cyl & disp & hp & drat & wt & qsec & vs & am & gear & carb \\
  \hline
min & 10.40 & 4.00 & 71.10 & 52.00 & 2.76 & 1.51 & 14.50 & 0.00 & 0.00 & 3.00 & 1.00 \\
  max & 33.90 & 8.00 & 472.00 & 335.00 & 4.93 & 5.42 & 22.90 & 1.00 & 1.00 & 5.00 & 8.00 \\
  mean & 20.09 & 6.19 & 230.72 & 146.69 & 3.60 & 3.22 & 17.85 & 0.44 & 0.41 & 3.69 & 2.81 \\
  sd & 6.03 & 1.79 & 123.94 & 68.56 & 0.53 & 0.98 & 1.79 & 0.50 & 0.50 & 0.74 & 1.62 \\
  var & 36.32 & 3.19 & 15360.80 & 4700.87 & 0.29 & 0.96 & 3.19 & 0.25 & 0.25 & 0.54 & 2.61 \\
  median & 19.20 & 6.00 & 196.30 & 123.00 & 3.70 & 3.33 & 17.71 & 0.00 & 0.00 & 4.00 & 2.00 \\
  IQR & 7.38 & 4.00 & 205.18 & 83.50 & 0.84 & 1.03 & 2.01 & 1.00 & 1.00 & 1.00 & 2.00 \\
   \hline
\end{tabular}
\end{center}
\end{table}

Sorry for the lengthy output. You can grab the PDF here. each is a very versatile function, since you can define custom summaries quite easily. Besides, str prints its output to stdout and returns NULL, so you can't retrieve the summary for specific variables. In this case, sapply will simplify the result, yielding a matrix instead of a data.frame. But that's not so problematic, right?
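
The point about str writing to stdout can be checked directly in base R. This is a small illustration only; `res` and `str_lines` are names chosen here:

```r
# str() prints its summary and returns NULL invisibly,
# so there is nothing for xtable() to convert
res <- str(cars)
is.null(res)

# capture.output() collects the printed lines as a character vector,
# one element per line of the str() display
str_lines <- capture.output(str(cars))
class(str_lines)
```

This is why xtable(str(cars)) has no table to work with, while wrapping the call in capture.output() at least yields text that fits into a one-column data.frame.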

aL3xa
  • Thank you for your efforts! This is really helpful for structuring my xtable commands in the future! But I just need the variables in a column along with their levels, as shown in the figure above! No need for any descriptive statistics :) – user734124 May 06 '11 at 06:36
  • Well... that's something completely different. You should've said so. Could you rephrase your question, so we could tackle it in a different manner? Be sure to provide some dummy data. – aL3xa May 06 '11 at 08:54
0

You may also want to take a look at qwraps2:

library(magrittr)
library(qwraps2)
library(xtable)  # needed for the xtable() call at the end

# Add a factor and a character version of the cylinder count
mtcars2 <-
  dplyr::mutate(mtcars,
                cyl_factor = factor(cyl,
                                    levels = c(6, 4, 8),
                                    labels = paste(c(6, 4, 8), "cylinders")),
                cyl_character = paste(cyl, "cylinders"))

# Named list of row groups; each entry is a list of one-sided formulas
our_summary1 <-
  list("Miles Per Gallon" =
         list("min" = ~ min(.data$mpg),
              "max" = ~ max(.data$mpg),
              "mean (sd)" = ~ qwraps2::mean_sd(.data$mpg)),
       "Displacement" =
         list("min" = ~ min(.data$disp),
              "median" = ~ median(.data$disp),
              "max" = ~ max(.data$disp),
              "mean (sd)" = ~ qwraps2::mean_sd(.data$disp)),
       "Weight (1000 lbs)" =
         list("min" = ~ min(.data$wt),
              "max" = ~ max(.data$wt),
              "mean (sd)" = ~ qwraps2::mean_sd(.data$wt)),
       "Forward Gears" =
         list("Three" = ~ qwraps2::n_perc0(.data$gear == 3),
              "Four"  = ~ qwraps2::n_perc0(.data$gear == 4),
              "Five"  = ~ qwraps2::n_perc0(.data$gear == 5))
  )

# Summarize by cylinder group and convert the result to LaTeX
by_cyl <- summary_table(dplyr::group_by(mtcars2, cyl_factor), our_summary1)
xtable(by_cyl)
Seyma Kalay