I'm using Rscript to run an R script from bash, and I want to pass arguments through to functions inside the script itself. Specifically, I want to pass arguments that specify:
- the path to a data file (.csv), and
- certain column names in that data file.
I run into a problem when the column names include a tilde (~). I've tried wrapping the column names in backticks, but that didn't work either.
Example
I want to write a script that takes a data file in .csv format and plots a histogram for one variable of the user's choice.
Here's my function:
plot_histogram <- function(path_to_input, x_var) {
  data_raw <- read.csv(file = path_to_input)
  # write the plot next to the input file
  path_to_output_folder <- dirname(path_to_input)
  png(filename = paste0(path_to_output_folder, "/", "output_plot.png"))
  hist(as.numeric(na.omit(data_raw[[x_var]])), main = "histogram", xlab = "my_var")
  # close the png device once; repeated dev.off() calls fail on the null device
  invisible(dev.off())
}
Let's run it on some fake data:
set.seed(123)
df <- data.frame(age = sample(20:80, size = 100, replace = TRUE))
write.csv(df, "some_age_data.csv")
plot_histogram(path_to_input = "some_age_data.csv",
               x_var = "age")
As intended, I get a .png file with the plot, saved to the same directory as the .csv.
Now adapt it into an R script that can be run from the command line
plot_histogram.R
args <- commandArgs(trailingOnly = TRUE)
## same function as above
plot_histogram <- function(path_to_input, x_var) {
  data_raw <- read.csv(file = path_to_input)
  # write the plot next to the input file
  path_to_output_folder <- dirname(path_to_input)
  png(filename = paste0(path_to_output_folder, "/", "output_plot.png"))
  hist(as.numeric(na.omit(data_raw[[x_var]])), main = "histogram", xlab = "my_var")
  # close the png device once; repeated dev.off() calls fail on the null device
  invisible(dev.off())
}
plot_histogram(path_to_input = args[1], x_var = args[2])
Then run via command line using Rscript
$ Rscript --vanilla plot_histogram.R /../../../some_age_data.csv "age"
Works too!
However, things break if the column name contains a tilde
Step 1: create fake data
library(tibble)
set.seed(123)
df <- tibble(`age-blah~value` = sample(20:80, size = 100, replace = TRUE))
write.csv(df, "some_age_data.csv")
Step 2: run the script using Rscript:
$ Rscript --vanilla plot_histogram.R /../../../some_age_data.csv "age-blah~value"
Error in hist.default(as.numeric(na.omit(data_raw[[x_var]])), main = "histogram", :
  invalid number of 'breaks'
Calls: plot_histogram -> hist -> hist.default
Execution halted
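One thing that may be worth checking is whether the column name survives the import at all, rather than being lost on the command line. The sketch below is only a diagnostic under that assumption: it writes a tiny throwaway file (the `tmp` file and the `default_names` / `raw_names` variables are made up for illustration) and compares the names that read.csv returns with and without its check.names argument. By default, read.csv passes headers through make.names, which replaces characters that are not valid in syntactic R names with dots.

```r
# Diagnostic sketch: what column names does read.csv() actually produce?
tmp <- tempfile(fileext = ".csv")      # hypothetical throwaway file

df <- data.frame(x = 1:3)
names(df) <- "age-blah~value"          # non-syntactic name, as in the example
write.csv(df, tmp, row.names = FALSE)

# Default: check.names = TRUE, so headers go through make.names()
default_names <- names(read.csv(tmp))
print(default_names)                   # "age.blah.value"

# check.names = FALSE keeps the header exactly as written in the file
raw_names <- names(read.csv(tmp, check.names = FALSE))
print(raw_names)                       # "age-blah~value"

# The same translation, applied directly to the string
print(make.names("age-blah~value"))    # "age.blah.value"
```

If that is what happens here, `data_raw[[x_var]]` would be NULL for "age-blah~value", so hist would receive an empty vector, which would be consistent with the "invalid number of 'breaks'" message.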
Bottom Line
When using Rscript, how can I pass an argument that specifies a column name containing a tilde? Alternatively, how can I work around .csv files that have tildes in their column names, while still using Rscript?
Thanks!