I want to get summary statistics, using describe(), on a big data frame (11,024 observations and 25,046 variables - most of the variables are transformations of a small number of variables, and very few will be used for modeling). I am writing the output of describe() to a text file using sink(). This is the error I get:
# describe the ADS
sink(paste0(dir_path, '/Data/Describe_ADS.txt'))
describe(ads)
Error in rep(")", length(fp) - 1) : invalid 'times' argument
In addition: Warning messages:
1: In xrange[freq != 0] <- xrnz :
  number of items to replace is not a multiple of replacement length
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf
4: In min(x) : no non-missing arguments to min; returning Inf
5: In max(x) : no non-missing arguments to max; returning -Inf
sink()
I suspect that one or more columns in the data frame are causing the problem, but it is not easy to figure out which ones. describe() works on a subset of the data frame. I have used describe() successfully on big data frames before (though never one quite this big, and I am not sure that size is the problem here).
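One way I could narrow this down is to run the function on one column at a time and record which columns fail. The following is only a minimal sketch: `find_failing_columns` is a hypothetical helper name I made up, not part of any package, and it assumes the error is triggered by individual columns rather than by the size of the data frame as a whole.

```r
# Hypothetical helper (not part of any package): apply `fun` to each
# column separately and record the columns on which it raises an error.
find_failing_columns <- function(df, fun) {
  failures <- list()
  for (nm in names(df)) {
    msg <- tryCatch(
      {
        # print() forces the print method to run as well, in case the
        # error occurs while formatting the result; capture.output()
        # keeps that printed output off the console.
        capture.output(print(fun(df[[nm]])))
        NULL
      },
      error = function(e) conditionMessage(e)
    )
    # Keep the error message, keyed by column name, if the call failed
    if (!is.null(msg)) failures[[nm]] <- msg
  }
  failures
}

# Possible usage with the data frame from the question (requires Hmisc):
# bad <- find_failing_columns(ads, Hmisc::describe)
# names(bad)   # columns on which describe() errors
```

With 25,046 columns this loop may take a while, but it only needs to run once. Warnings are not caught here and will still be shown as usual.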
Both summary() and str() work on the same data frame, though summary() gives me only the Min, 1st Qu., and Median, and not the Mean, Max, etc. I prefer describe() because its output is better formatted and it gives information that I cannot get from summary(), such as the number of missing and distinct values, more quantiles, and the lowest and highest values; it also works on character values. Any insight into what the problem is would be appreciated.
I expected the output of describe() to be redirected to the file specified in sink(), and not to get the error specified above.