I'm trying to read a Python list object and convert it into an R vector using reticulate in RStudio. According to the 'Converting between R and Python' section of the reticulate docs, this should be a fairly trivial task using the py_to_r() function.
Here is my code.
library(reticulate)
my_list <- py_to_r(['prtHrt_snRNAseq_Cer-CycPro_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-Endo_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-ExN-Pro_SCZ_summary.tsv',
'prtHrt_snRNAseq_Cer-Granule-1_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-Granule-2_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-Granule-3_SCZ_summary.tsv',
'prtHrt_snRNAseq_Cer-Granule-4_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-Granule-Pro_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-InN-Pro-1_SCZ_summary.tsv',
'prtHrt_snRNAseq_Cer-InN-Pro-2_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-MG_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-N-undef-1_SCZ_summary.tsv',
'prtHrt_snRNAseq_Cer-N-undef-2_SCZ_summary.tsv'], TRUE)
However, when I run this, R fails to parse the data structure:
> py_to_r(['prtHrt_snRNAseq_Cer-CycPro_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-Endo_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-ExN-Pro_SCZ_summary.tsv',
Error: unexpected '[' in "py_to_r(["
> 'prtHrt_snRNAseq_Cer-Granule-1_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-Granule-2_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-Granule-3_SCZ_summary.tsv',
Error: unexpected ',' in " 'prtHrt_snRNAseq_Cer-Granule-1_SCZ_summary.tsv',"
> 'prtHrt_snRNAseq_Cer-Granule-4_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-Granule-Pro_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-InN-Pro-1_SCZ_summary.tsv',
Error: unexpected ',' in " 'prtHrt_snRNAseq_Cer-Granule-4_SCZ_summary.tsv',"
> 'prtHrt_snRNAseq_Cer-InN-Pro-2_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-MG_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-N-undef-1_SCZ_summary.tsv',
Error: unexpected ',' in " 'prtHrt_snRNAseq_Cer-InN-Pro-2_SCZ_summary.tsv',"
> 'prtHrt_snRNAseq_Cer-N-undef-2_SCZ_summary.tsv'])
Error: unexpected ']' in " 'prtHrt_snRNAseq_Cer-N-undef-2_SCZ_summary.tsv']"
The Python list comes from the expand() function of a snakemake rule, and I want to feed it into an R script.
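For context, expand() fills a pattern's wildcards with every combination of the supplied values and returns a plain Python list of strings. reticulate's conversion functions work on Python objects like that list, not on Python source text typed at the R prompt. The sketch below is a hypothetical, simplified stand-in (expand_sketch is not snakemake's function, and the real expand() handles more cases) that reproduces the first few names from the question, assuming keyword wildcards only and no regex constraints in the pattern.

```python
import itertools


def _as_list(value):
    # A bare string counts as one value, not a sequence of characters.
    return [value] if isinstance(value, str) else list(value)


def expand_sketch(pattern, **wildcards):
    """Hypothetical simplified stand-in for snakemake's expand().

    Fills every {name} in `pattern` with each combination of the supplied
    values (itertools.product) and returns a plain list of strings.
    """
    names = list(wildcards)
    combos = itertools.product(*(_as_list(wildcards[n]) for n in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


if __name__ == "__main__":
    files = expand_sketch(
        "prtHrt_snRNAseq_{CELL_TYPE}_SCZ_summary.tsv",
        CELL_TYPE=["Cer-CycPro", "Cer-Endo", "Cer-ExN-Pro"],
    )
    print(files)
```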
Is there any way to convert a Python list structure such as this into an R object, such as a vector, within R?
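The list as printed above is Python's own representation (square brackets, single quotes), which neither R's parser nor JSON accepts as-is. One language-neutral bridge is to serialize the list to JSON on the Python side; a JSON array of strings should then come back as a character vector in R via, for example, jsonlite::fromJSON(). That assumes jsonlite is installed, which the question doesn't state. The helper name list_to_json is hypothetical.

```python
import json


def list_to_json(paths):
    """Serialize a list of file names as a JSON array string.

    Unlike the Python repr, JSON uses double quotes, so the text can be
    parsed by JSON readers in other languages (e.g. jsonlite in R).
    """
    return json.dumps(list(paths))


if __name__ == "__main__":
    files = [
        "prtHrt_snRNAseq_Cer-CycPro_SCZ_summary.tsv",
        "prtHrt_snRNAseq_Cer-Endo_SCZ_summary.tsv",
    ]
    # prints ["prtHrt_snRNAseq_Cer-CycPro_SCZ_summary.tsv", "prtHrt_snRNAseq_Cer-Endo_SCZ_summary.tsv"]
    print(list_to_json(files))
```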
Any help/ideas/advice would be greatly appreciated.
Update as requested by @Dariober - 29/07/21
So I have a workaround for this: I use a system call in the R script to create a .csv file listing the same files that the expand function generates in snakemake. I then read this into R and process it as I would if I could get R to parse the output of the expand function directly.
Here is the snakemake rule:
rule create_ldsc_group_plots:
    # R produces 5 plots but only tracking the final plot here
    input: expand(PART_HERIT_DIR + "prtHrt_snRNAseq_{CELL_TYPE}_SCZ.rds", CELL_TYPE = config["RNA_CELL_TYPES"])
    output: PART_HERIT_DIR + "Thal_ldsc_RNA_group_plot_lst.rds"
    params: out_dir = PART_HERIT_DIR
    message: "Creating ldsc group plots for all regions and SCZ GWAS"
    log: "logs/LDSR/snRNAseq.AllRegions.SCZ_partHerit.group.plots.log"
    shell:
        """
        export R_LIBS_USER=/R/library
        module load libgit2/1.1.0
        /apps/languages/R/4.0.3/el7/AVX512/gnu-8.1/bin/Rscript --vanilla \
        scripts/R/scRNAseq_LDSC_create_group_plots.R {params.out_dir} 2> {log}
        """
And here is the snippet of the R code:
library(argparser)  # arg_parser(), add_argument(), parse_args()
library(readr)      # read_csv()
library(dplyr)      # pull()

## Parse region / set region variable ---------------------------------------------------
cat('\nParsing args ... \n')
p <- arg_parser("Read out dir for LDSC RNA group plotting ... \n")
p <- add_argument(p, "out_dir", help = "No out dir region provided")
args <- parse_args(p)
print(args)

## Set variables ----------------------------------------------------------------------
REGIONS <- c("Cer", "FC", "GE", "Hipp", "Thal")
OUT_DIR <- args$out_dir

for (REGION in REGIONS) {
  plot_list <- list()

  ## Create region specific .csv file -------------------------------------------------
  cat(paste0("\nCreating ", REGION, " .csv file ...\n"))
  system(paste0("ls ", OUT_DIR, "*", REGION, "*_SCZ.rds > ", OUT_DIR, REGION, ".csv"))

  ## Load and prep .rds file info -----------------------------------------------------
  cat(paste0("Loading ", REGION, " .rds file info ...\n"))
  rds_file_df <- read_csv(paste0(OUT_DIR, REGION, ".csv"), col_names = FALSE)
  rds_file_vector <- pull(rds_file_df, X1) # pull() extracts column X1 as a plain vector

  # ... remainder of the loop body is not shown in this snippet
}
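If the aim is only to avoid the ls system call, the same per-region split could be done on the expand list itself, assuming that list is available on the Python side (e.g. in the Snakefile). The sketch below is hypothetical: group_by_region, write_region_lists and the REGION.txt file names are illustrations, not part of the pipeline. It mirrors the `*REGION*_SCZ.rds` glob with fnmatch.fnmatchcase on each base name, but unlike ls it keeps the input order instead of sorting.

```python
import fnmatch
import os


def group_by_region(paths, regions):
    """Group file paths by region, mirroring `ls OUT_DIR*REGION*_SCZ.rds`.

    Matching is case-sensitive and applied to the base name only. A path can
    land in more than one group if one region string occurs inside another name.
    """
    groups = {}
    for region in regions:
        pattern = f"*{region}*_SCZ.rds"
        groups[region] = [
            p for p in paths if fnmatch.fnmatchcase(os.path.basename(p), pattern)
        ]
    return groups


def write_region_lists(groups, out_dir):
    """Write one newline-delimited file per region (hypothetical REGION.txt names).

    Returns a dict mapping each region to the file that was written.
    An empty group produces an empty file.
    """
    written = {}
    for region, files in groups.items():
        target = os.path.join(out_dir, f"{region}.txt")
        with open(target, "w") as handle:
            handle.writelines(f"{name}\n" for name in files)
        written[region] = target
    return written
```

On the R side, readLines(paste0(OUT_DIR, REGION, ".txt")) returns such a file as a character vector directly, without the read_csv() and pull() step.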
Notice that I'm reading arguments into the R script via argparser. I tried passing {input} as an additional argument to the R script, but argparser wants a single input per argument, and there are 91 filenames in the expand list. Also notice that I need to load an additional module before running the R script. This means I need to use the shell directive instead of the script directive in the rule, and, as far as I'm aware, I can't use the snakemake@input object within the R script as a result. I have tested this.
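Inside a shell directive, snakemake substitutes a list-valued {input} as the file names joined by spaces, so all 91 names arrive as separate command-line arguments. On the R side, base R's commandArgs(trailingOnly = TRUE) returns every trailing argument as a character vector, so it does not need one argparser argument per file (whether argparser itself can take a variable number of values is not checked here). The Python below only illustrates that argument splitting; build_rscript_command and the file names are hypothetical, and the shell's word splitting is approximated with shlex.split.

```python
import shlex


def build_rscript_command(script, out_dir, inputs):
    """Hypothetical helper: assemble the Rscript call a shell directive would
    run once {params.out_dir} and {input} are filled in.

    snakemake joins a list-valued {input} with spaces; shlex.quote is added
    here so that a name containing spaces would still be a single argument.
    """
    parts = ["Rscript", "--vanilla", script, out_dir, *inputs]
    return " ".join(shlex.quote(part) for part in parts)


if __name__ == "__main__":
    # 91 hypothetical file names standing in for the real expand() output
    rds_files = [f"prtHrt_snRNAseq_Cell-{i}_SCZ.rds" for i in range(91)]
    cmd = build_rscript_command(
        "scripts/R/scRNAseq_LDSC_create_group_plots.R", "part_herit/", rds_files
    )
    argv = shlex.split(cmd)
    # commandArgs(trailingOnly = TRUE) in R would see argv[3:]:
    # the out_dir first, then every file name as its own element.
    print(len(argv[3:]))  # 92 (1 out_dir + 91 files)
```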
Tbh, the workaround I now have, whilst dirty, does work. The reason I never posted the snakemake rule etc. before is that I was hoping the answer to the initial question would be trivial, hence the parsimonious structure of the question. The other option, I guess, would be to use a virtual environment with R and the prerequisite packages installed in it, so that I could use the script directive instead, but that is perhaps a bit much for a rule at the end of a pipeline that is only designed to make a few plots.
Is there a simple way to get round this that I'm not seeing?