
I'm trying to read a Python list object and convert it into an R vector using reticulate in RStudio. According to the 'Converting between R and Python' section of the reticulate docs, this should be a fairly trivial task using the py_to_r() function.

Here is my code.

library(reticulate)

my_list <- py_to_r(['prtHrt_snRNAseq_Cer-CycPro_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-Endo_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-ExN-Pro_SCZ_summary.tsv', 
                 'prtHrt_snRNAseq_Cer-Granule-1_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-Granule-2_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-Granule-3_SCZ_summary.tsv', 
                 'prtHrt_snRNAseq_Cer-Granule-4_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-Granule-Pro_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-InN-Pro-1_SCZ_summary.tsv', 
                 'prtHrt_snRNAseq_Cer-InN-Pro-2_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-MG_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-N-undef-1_SCZ_summary.tsv', 
                 'prtHrt_snRNAseq_Cer-N-undef-2_SCZ_summary.tsv'], TRUE)

However, when I run this, R can't parse the list syntax:

py_to_r(['prtHrt_snRNAseq_Cer-CycPro_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-Endo_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-ExN-Pro_SCZ_summary.tsv', 
Error: unexpected '[' in "py_to_r(["
>                  'prtHrt_snRNAseq_Cer-Granule-1_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-Granule-2_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-Granule-3_SCZ_summary.tsv', 
Error: unexpected ',' in "                 'prtHrt_snRNAseq_Cer-Granule-1_SCZ_summary.tsv',"
>                  'prtHrt_snRNAseq_Cer-Granule-4_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-Granule-Pro_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-InN-Pro-1_SCZ_summary.tsv', 
Error: unexpected ',' in "                 'prtHrt_snRNAseq_Cer-Granule-4_SCZ_summary.tsv',"
>                  'prtHrt_snRNAseq_Cer-InN-Pro-2_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-MG_SCZ_summary.tsv', 'prtHrt_snRNAseq_Cer-N-undef-1_SCZ_summary.tsv', 
Error: unexpected ',' in "                 'prtHrt_snRNAseq_Cer-InN-Pro-2_SCZ_summary.tsv',"
>                  'prtHrt_snRNAseq_Cer-N-undef-2_SCZ_summary.tsv'])
Error: unexpected ']' in "                 'prtHrt_snRNAseq_Cer-N-undef-2_SCZ_summary.tsv']"

The Python list comes from the expand function of a Snakemake rule, and I want to feed it into an R script.
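For context, expand() returns an ordinary Python list of strings, which is why its printed form looks like the list above. The sketch below uses a hypothetical stand-in, `expand_like`, which is not Snakemake's implementation. It only mimics the default behaviour of filling a pattern with every combination of wildcard values, and the three cell types are an illustrative subset of config["RNA_CELL_TYPES"].

```python
from itertools import product


def expand_like(pattern, **wildcards):
    """Hypothetical stand-in for Snakemake's expand() (default product behaviour).

    Each keyword maps a wildcard name to a list of values; every combination
    of values is substituted into `pattern`, giving a plain list of strings.
    """
    names = list(wildcards)
    # Treat a single (non-list) value as a one-element list.
    values = [
        wildcards[n] if isinstance(wildcards[n], (list, tuple)) else [wildcards[n]]
        for n in names
    ]
    return [pattern.format(**dict(zip(names, combo))) for combo in product(*values)]


# Illustrative subset of the cell types (assumption, not the real config).
cell_types = ["Cer-CycPro", "Cer-Endo", "Cer-ExN-Pro"]
files = expand_like("prtHrt_snRNAseq_{CELL_TYPE}_SCZ_summary.tsv", CELL_TYPE=cell_types)
print(files)
```

The list only exists inside the Snakemake (Python) process, so R sees it only if it is passed across, for example as command-line arguments or through a file. Pasting its printed form into R fails with the parse errors shown above.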

Is there any way to convert a Python list like this into an R object, such as a character vector, from within R?

Any help/ideas/advice would be greatly appreciated.


Update, as requested by @Dariober (29/07/21)


So I have a workaround: a system call in the R script runs ls on the output directory and writes a .csv file listing the files that the expand function generates in Snakemake. I then read this file into R and process it as I would if R could parse the output of the expand function directly.
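A variation on this workaround is to write the expand() list itself to a plain text file, one path per line, instead of re-discovering the files with ls. R can then read it back with readLines(). The sketch below assumes you can run a little Python in the workflow (for example in the Snakefile); the helper name `write_file_list` and the temporary path are hypothetical.

```python
import os
import tempfile


def write_file_list(paths, list_path):
    """Hypothetical helper: write one path per line, the layout R's readLines() reads."""
    with open(list_path, "w") as fh:
        for p in paths:
            fh.write(p + "\n")


paths = [
    "prtHrt_snRNAseq_Cer-CycPro_SCZ.rds",
    "prtHrt_snRNAseq_Cer-Endo_SCZ.rds",
    "prtHrt_snRNAseq_Cer-ExN-Pro_SCZ.rds",
]
# Temporary location used only for this sketch.
list_path = os.path.join(tempfile.mkdtemp(), "Cer.txt")
write_file_list(paths, list_path)

# Read it back the way R would see it: one element per line.
with open(list_path) as fh:
    round_trip = fh.read().splitlines()
print(round_trip == paths)  # True
```

This keeps the file list identical to what Snakemake tracks, whereas an ls glob can also pick up stray files that happen to match the pattern.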

Here is the snakemake rule:

rule create_ldsc_group_plots:
    # The R script produces 5 plots; only the final plot is tracked here
    input:   expand(PART_HERIT_DIR + "prtHrt_snRNAseq_{CELL_TYPE}_SCZ.rds", CELL_TYPE = config["RNA_CELL_TYPES"])
    output:  PART_HERIT_DIR + "Thal_ldsc_RNA_group_plot_lst.rds"
    params:  out_dir = PART_HERIT_DIR 
    message: "Creating ldsc group plots for all regions and SCZ GWAS"
    log:     "logs/LDSR/snRNAseq.AllRegions.SCZ_partHerit.group.plots.log"
    shell:
             """
             export R_LIBS_USER=/R/library
             module load libgit2/1.1.0
             /apps/languages/R/4.0.3/el7/AVX512/gnu-8.1/bin/Rscript --vanilla \
             scripts/R/scRNAseq_LDSC_create_group_plots.R {params.out_dir}  2> {log}
             
             """

And here is the snippet of the R code:

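library(argparser)  # arg_parser(), add_argument(), parse_args()
library(readr)      # read_csv()
library(dplyr)      # pull()

## Parse region / set region variable ---------------------------------------------------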
cat('\nParsing args ... \n')
p <- arg_parser("Read out dir for LDSC RNA group plotting ... \n")
p <- add_argument(p, "out_dir", help = "No out dir region provided")
args <- parse_args(p)
print(args)

##  Set variables  ----------------------------------------------------------------------
REGIONS <- c("Cer", "FC", "GE", "Hipp", "Thal")
OUT_DIR <- args$out_dir


for (REGION in REGIONS) {
  
  plot_list <- list()
  
  ##  Create region specific .csv file  -------------------------------------------------
  cat(paste0("\nCreating ", REGION, " .csv file ...\n"))   
  system(paste0("ls ", OUT_DIR, "*", REGION, "*_SCZ.rds > ", OUT_DIR,  REGION, ".csv"))

  ##  Load and prep .rds file info  -----------------------------------------------------
  cat(paste0("Loading ", REGION, " .rds file info ...\n"))
  rds_file_df <- read_csv(paste0(OUT_DIR, REGION, ".csv"), col_names = FALSE)
  rds_file_vector <- pull(rds_file_df, X1)  # pull() extracts column X1 as a plain vector, like as.vector()

Notice that I'm reading arguments into the R script via argparser. I tried passing {input} as an additional argument to the R script, but argparser expects a single value per argument, and there are 91 filenames in the expand list. Also notice that I need to load an additional module before running the R script. This means I have to use the shell directive instead of script in the rule and, as far as I'm aware, I then can't use snakemake@input within the R script. I have tested this.
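In a shell directive, Snakemake substitutes {input} with the input files separated by spaces, so all 91 names do reach the script, just as separate command-line arguments rather than one value. The Python sketch below mimics that command line with a few hypothetical paths (quoting each name with shlex.quote as a precaution) and checks that splitting it the way a shell would recovers the original list.

```python
import shlex

# Hypothetical subset of the expand() output.
files = [
    "out/prtHrt_snRNAseq_Cer-CycPro_SCZ.rds",
    "out/prtHrt_snRNAseq_Cer-Endo_SCZ.rds",
    "out/prtHrt_snRNAseq_Cer-ExN-Pro_SCZ.rds",
]

# Approximates a shell line such as:
#   Rscript script.R {params.out_dir} {input}
cmd = "Rscript script.R out/ " + " ".join(shlex.quote(f) for f in files)

# Split as a POSIX shell would: program, script, out_dir, then one entry per file.
argv = shlex.split(cmd)
print(argv[3:] == files)  # True
```

On the R side, argparser's add_argument() has an nargs argument that may let a single argument collect all of these values; check the argparser documentation for how it treats positional arguments in your version.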

To be honest, the workaround I now have, while dirty, does work. I didn't post the Snakemake rule etc. earlier because I was hoping the answer to the initial question would be trivial, hence the parsimonious structure of the question. The other option, I guess, would be to use a virtual environment with R and the prerequisite packages installed so that I could use the script directive, but that seems like a bit much for a rule at the end of a pipeline that only makes a few plots.

Is there a simple way to get around this that I'm not seeing?

Darren

2 Answers


I guess the problem is that what you pass to py_to_r() is not a Python object: you are working from within R, so ['...', ...] is parsed as (invalid) R code. You can try something like this:

library(reticulate)
library(magrittr)  # provides the %>% pipe

tuple(
  'prtHrt_snRNAseq_Cer-CycPro_SCZ_summary.tsv', 
  'prtHrt_snRNAseq_Cer-Endo_SCZ_summary.tsv', 
  'prtHrt_snRNAseq_Cer-ExN-Pro_SCZ_summary.tsv', 
  'prtHrt_snRNAseq_Cer-Granule-1_SCZ_summary.tsv', 
  'prtHrt_snRNAseq_Cer-Granule-2_SCZ_summary.tsv', 
  'prtHrt_snRNAseq_Cer-Granule-3_SCZ_summary.tsv', 
  'prtHrt_snRNAseq_Cer-Granule-4_SCZ_summary.tsv', 
  'prtHrt_snRNAseq_Cer-Granule-Pro_SCZ_summary.tsv', 
  'prtHrt_snRNAseq_Cer-InN-Pro-1_SCZ_summary.tsv', 
  'prtHrt_snRNAseq_Cer-InN-Pro-2_SCZ_summary.tsv', 
  'prtHrt_snRNAseq_Cer-MG_SCZ_summary.tsv', 
  'prtHrt_snRNAseq_Cer-N-undef-1_SCZ_summary.tsv', 
  'prtHrt_snRNAseq_Cer-N-undef-2_SCZ_summary.tsv'
) %>% 
  py_to_r()

Obviously this is not of much use: you can create the R object directly without first creating a Python one. I think py_to_r() is meant to be used in conjunction with Python functions that return Python objects.

det
  • Thanks for the suggestion. I have a similar workaround that formats the list using the `system()` command in R, but I was hoping for something cleaner and wondering whether R could parse the Python list directly. – Darren Jul 29 '21 at 09:31

The python list is coming from the expand function of a snakemake rule which I want to feed into an R script.

Maybe you are making things more complicated than necessary. Say you have:

rule one:
    input:
        fin= expand('...'),
    output:
        out= ...
    script:
        'my-script.R'

Then, inside my-script.R, you can access the list of files in input.fin with:

fin <- snakemake@input[['fin']]
dariober
  • I've been trying to do something similar with `argparser`, as I'm using `shell` instead of `script` (I need to load modules on Slurm before running the R script). I'm not sure I can use the `snakemake@input` option with a `shell` command, can I? Update: I just tried it, and it says snakemake is not found! – Darren Jul 29 '21 at 09:56
  • @Darren Edit your question to show what you are trying to do. E.g. show the snakemake rule causing the issue. – dariober Jul 29 '21 at 12:15