I am getting strange output from my `datasummary` code. The idea is to create a table that shows the mean and SD of the numeric variables and the number of observations, all for the full sample. I also want to display the shares for the two levels of a binary factor variable. Currently, I get the mean and SD only for the single numeric variable (which makes sense), but N is also shown only for the numeric variable. Moreover, the N shown is not the number of observations but the first value in the numeric variable's vector. This is my current code:
`age` is the numeric variable, and the second through fourth variables (education, religion, and sex) are factors. `obama` is a two-level factor, and I want the table to show the share in each of its levels.
datasummary(formula = age + (`educated parent` = education) + religion + sex ~
              Heading("Entire sample") * 1 * (Mean + SD + N) + obama * Percent(),
            fmt = 3,
            data = data,
            title = 'Table 1: Votes for Obama in 2012 - Summary statistics',
            notes = c('1 = voted for Obama',
                      'educated parent: 1 = at least one parent has a degree',
                      'Source: General social survey'))
I am getting these warnings:

Warning messages:
1: Summary statistic is length 1693
2: Summary statistic is length 1261
3: Summary statistic is length 432
4: Summary statistic is length 335
5: Summary statistic is length 379
6: Summary statistic is length 123
7: Summary statistic is length 856
8: Summary statistic is length 728
9: Summary statistic is length 965
These lengths are exactly the values I want displayed in the "N" column.
The table I get as output looks like this:

Table 1:

| | | 0 | 1 |
|---|---|---|---|
| age | | 37.507 | 62.493 |
| educated parent | 0 | 27.998 | 46.486 |
| | 1 | 9.510 | 16.007 |
| religion | None | 3.662 | 16.125 |
| | Catholic | 8.919 | 13.467 |
| | Other | 1.713 | 5.552 |
| | Protestant | 23.213 | 27.348 |
| sex | Male | 18.252 | 24.749 |
| | Female | 19.256 | 37.744 |

1 = voted for Obama
educated parent: 1 = at least one parent has a degree
Source: General social survey
The data is taken from gss_sm from the socviz package. I have created a new religion and a new education variable. Religion is a 4 level factor, and education is a 2 level factor.
I have tried writing my own n function,
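`n <- function(x) {
  # x must be an argument; datasummary passes each variable to the function
  if (!is.numeric(x)) {
    # Factors and other non-numeric vectors: count every element
    count <- length(x)
  } else {
    # Numeric vectors: count only the non-missing values
    count <- sum(!is.na(x))
  }
  # Return the count as a string with no decimals
  formatC(count, format = "d")
}
` and plugging that in in place of "N".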
It seems as if the N function is the part that isn't working.