For the second time in two weeks, I'm working with data that includes a ton of empty columns. It's public records data, and I'm only interested in one category. I suspect that other categories of the larger data set use these columns, but the subset I care about doesn't. So I filter out the records I don't want, and then I'd like to systematically cull the empty columns.
This question has a great method:
R: Remove multiple empty columns of character variables
empty_columns <- sapply(df, function (k) all(is.na(k) | k == ""))
df <- df[!empty_columns]
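To make the mechanics concrete, here's a minimal sketch on a toy frame (the frame and its column names are made up for illustration):

```r
# Toy frame: one real column, one all-NA column, one all-empty-string column
toy <- data.frame(x = 1:3, y = NA, z = "", stringsAsFactors = FALSE)

# sapply returns a named logical vector, TRUE for each empty column
empty_columns <- sapply(toy, function(k) all(is.na(k) | k == ""))
empty_columns      # x FALSE, y TRUE, z TRUE

# Logical column indexing keeps only the non-empty columns
toy <- toy[!empty_columns]
```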
But I'd like to make that a function, so I can run it using the name of the data frame exactly once. Something like:
drop_empty_cols <- function(df) {
  empty_columns <- sapply(df, function(k) all(is.na(k) | k == ""))
  df <- df[!empty_columns]
}
drop_empty_cols(my_frame)
But ... wrapped in a function, the method fails, and fails silently. Here's some sample data:
demo <- read.table(text="Real.Val All.NA Nothin.here
1 3.5 NA tmp
2 3.0 NA tmp
3 3.2 NA tmp
4 3.1 NA tmp
5 3.6 NA tmp
6 3.9 NA tmp" , header = TRUE)
demo$Nothin.here <- ""
(I'm sure there's a way to write a reproducible example with an empty column, but mine was choking. So this one empties the column after you create the frame.)
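For what it's worth, a self-contained version of the same frame can be sketched with data.frame() instead of read.table(); the single NA and "" are recycled down each column (the explicit stringsAsFactors = FALSE is my addition, to keep the empty column as plain character strings):

```r
# Sketch of the same frame built directly; NA and "" recycle to fill the columns
demo <- data.frame(
  Real.Val    = c(3.5, 3.0, 3.2, 3.1, 3.6, 3.9),
  All.NA      = NA,
  Nothin.here = "",
  stringsAsFactors = FALSE
)
```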
If I do drop_empty_cols(demo), I still have 6 obs. of 3 variables. If I do
empty_columns <- sapply(demo, function (k) all(is.na(k) | k == ""))
demo <- demo[!empty_columns]
I get the desired result: 6 obs. of 1 variable. But to reuse that, I have to type demo three times. Is it even possible to use a function to transform a data frame directly?
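For context, the closest pattern I'm aware of is to have the function return the pruned frame and reassign it at the call site; a minimal sketch (the reassignment line is the key assumption here, since R functions receive copies of their arguments and can't modify the caller's variable in place):

```r
drop_empty_cols <- function(df) {
  empty_columns <- sapply(df, function(k) all(is.na(k) | k == ""))
  df[!empty_columns]   # last expression is the (visible) return value
}

demo <- data.frame(Real.Val = c(3.5, 3.0), All.NA = NA, Nothin.here = "",
                   stringsAsFactors = FALSE)
demo <- drop_empty_cols(demo)   # without the `demo <-`, demo is unchanged
```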