
I have a data frame with a grouping variable, and I want to create and save a plot for each group using ggplot. I tried using lapply to split the data by group, then using pdf to save each plot to a file named after its group. However, when I run the code, the output files can't be opened. What am I doing wrong, and how can I fix it? Here is my code and sample data:

library(ggplot2)

# generate a random data frame with a grouping variable
set.seed(123)
df <- data.frame(
  x = rnorm(100),
  y = rnorm(100),
  group = sample(LETTERS[1:4], 100, replace = TRUE)
)

# use lapply to loop over each group
lapply(split(df, df$group), function(d) {
  # use ggplot to plot x vs y for each group
  p <- ggplot(d, aes(x, y)) + geom_point() + ggtitle(paste("Group", d$group[1]))
  # use pdf to save the plot as a file named after the group
  pdf(paste0("output_data/", d$group[1], ".pdf"))
  p
  dev.off()
})

The resulting PDF files can't be opened.

I have tried:

  1. Using print(p) instead of p
  2. ggsave, which does not work well with my data
  3. Running the code outside of sapply or lapply, which works fine

I noticed that the same script may produce valid output on someone else's machine.

=====update=====

My `sessionInfo()` output:

R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.utf8  LC_CTYPE=Chinese (Simplified)_China.utf8    LC_MONETARY=Chinese (Simplified)_China.utf8
[4] LC_NUMERIC=C                                LC_TIME=Chinese (Simplified)_China.utf8    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] cowplot_1.1.1          patchwork_1.1.2        ggrepel_0.9.1          plyranges_1.16.0       org.Hs.eg.db_3.15.0    annotatr_1.22.0       
 [7] ggforce_0.4.1          GenomicFeatures_1.48.4 AnnotationDbi_1.58.0   Biobase_2.56.0         ggbio_1.44.1           ggplot2_3.4.0         
[13] dplyr_1.1.0            rtracklayer_1.56.1     GenomicRanges_1.48.0   GenomeInfoDb_1.32.4    IRanges_2.30.1         S4Vectors_0.34.0      
[19] AnnotationHub_3.4.0    BiocFileCache_2.4.0    dbplyr_2.2.1           BiocGenerics_0.42.0   

loaded via a namespace (and not attached):
  [1] backports_1.4.1               Hmisc_4.7-1                   systemfonts_1.0.4             plyr_1.8.7                   
  [5] lazyeval_0.2.2                splines_4.2.1                 BiocParallel_1.30.4           digest_0.6.29                
  [9] ensembldb_2.20.2              htmltools_0.5.3               fansi_1.0.3                   magrittr_2.0.3               
 [13] checkmate_2.1.0               memoise_2.0.1                 BSgenome_1.64.0               cluster_2.1.3                
 [17] tzdb_0.3.0                    readr_2.1.3                   Biostrings_2.64.1             matrixStats_0.62.0           
 [21] timechange_0.1.1              prettyunits_1.1.1             jpeg_0.1-9                    colorspace_2.0-3             
 [25] blob_1.2.3                    rappdirs_0.3.3                textshaping_0.3.6             xfun_0.33                    
 [29] crayon_1.5.2                  RCurl_1.98-1.9                graph_1.74.0                  survival_3.3-1               
 [33] VariantAnnotation_1.42.1      glue_1.6.2                    polyclip_1.10-0               gtable_0.3.1                 
 [37] zlibbioc_1.42.0               XVector_0.36.0                DelayedArray_0.22.0           scales_1.2.1                 
 [41] DBI_1.1.3                     GGally_2.1.2                  Rcpp_1.0.9                    xtable_1.8-4                 
 [45] progress_1.2.2                htmlTable_2.4.1               foreign_0.8-82                bit_4.0.4                    
 [49] OrganismDbi_1.38.1            Formula_1.2-4                 htmlwidgets_1.5.4             httr_1.4.4                   
 [53] RColorBrewer_1.1-3            ellipsis_0.3.2                pkgconfig_2.0.3               reshape_0.8.9                
 [57] XML_3.99-0.11                 farver_2.1.1                  nnet_7.3-17                   deldir_1.0-6                 
 [61] utf8_1.2.2                    RMariaDB_1.2.2                tidyselect_1.2.0              labeling_0.4.2               
 [65] rlang_1.0.6                   reshape2_1.4.4                later_1.3.0                   munsell_0.5.0                
 [69] BiocVersion_3.15.2            tools_4.2.1                   cachem_1.0.6                  cli_3.4.1                    
 [73] generics_0.1.3                RSQLite_2.2.18                stringr_1.4.1                 fastmap_1.1.0                
 [77] ragg_1.2.4                    yaml_2.3.5                    knitr_1.40                    bit64_4.0.5                  
 [81] purrr_0.3.5                   KEGGREST_1.36.3               AnnotationFilter_1.20.0       RBGL_1.72.0                  
 [85] nlme_3.1-157                  mime_0.12                     xml2_1.3.3                    biomaRt_2.52.0               
 [89] compiler_4.2.1                rstudioapi_0.14               filelock_1.0.2                curl_4.3.3                   
 [93] png_0.1-7                     interactiveDisplayBase_1.34.0 tweenr_2.0.2                  tibble_3.1.8                 
 [97] stringi_1.7.8                 lattice_0.20-45               ProtGenerics_1.28.0           Matrix_1.4-1                 
[101] vctrs_0.5.2                   pillar_1.8.1                  lifecycle_1.0.3               BiocManager_1.30.18          
[105] data.table_1.14.2             bitops_1.0-7                  httpuv_1.6.6                  R6_2.5.1                     
[109] BiocIO_1.6.0                  latticeExtra_0.6-30           promises_1.2.0.1              gridExtra_2.3                
[113] codetools_0.2-18              dichromat_2.0-0.1             MASS_7.3-57                   assertthat_0.2.1             
[117] SummarizedExperiment_1.26.1   rjson_0.2.21                  withr_2.5.0                   regioneR_1.28.0              
[121] GenomicAlignments_1.32.1      Rsamtools_2.12.0              GenomeInfoDbData_1.2.8        mgcv_1.8-40                  
[125] parallel_4.2.1                hms_1.1.2                     grid_4.2.1                    rpart_4.1.16                 
[129] MatrixGenerics_1.8.1          biovizBase_1.44.0             lubridate_1.9.0               shiny_1.7.3                  
[133] base64enc_0.1-3               interp_1.1-3                  restfulr_0.0.15              
zhang
  • Can you give us more information on your system? Maybe the output of `sessionInfo()` if you say that it works on other devices? – rps1227 Mar 17 '23 at 10:31
  • Also, possibly not directly relevant, but it is safer to use `file.path("output_data", paste0(d$group[1], ".pdf"))` rather than `paste`ing the entire path. Or `reader::cat.path()`... – Limey Mar 17 '23 at 10:38
  • Hi, @rps1227, I have updated it. – zhang Mar 17 '23 at 11:22
  • Hi, @Limey, thanks for your suggestion, but I don't know the difference between them. Can you explain why `paste0` is not safe? – zhang Mar 17 '23 at 11:23
  • `paste0` doesn't take account of OS specific features of paths, such as path separators. `file.path` does. – Limey Mar 17 '23 at 11:25
  • OK, I got it. It's like using `os.path.join` instead of `str.join()` for paths in Python. Thanks! – zhang Mar 17 '23 at 11:27
  • You can do `pdf(onefile = FALSE); lapply(......); dev.off()`. But then you'll have to rename the files. – Stéphane Laurent Mar 17 '23 at 12:43
  • Yeah, it seems to work, but I'm not sure why `pdf` doesn't work in `lapply` or `sapply`. – zhang Mar 18 '23 at 14:34
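The `onefile = FALSE` suggestion above, combined with `file.path()` for building paths, might look like the sketch below. It assumes ggplot2 is installed and recreates the sample `df` from the question; the `output_data` folder comes from the question, while `out_dir`, `groups`, the `page_%02d.pdf` file pattern and the renaming step are hypothetical choices for illustration.

```r
library(ggplot2)

# Same sample data as in the question
set.seed(123)
df <- data.frame(
  x = rnorm(100),
  y = rnorm(100),
  group = sample(LETTERS[1:4], 100, replace = TRUE)
)

# Hypothetical output folder variable; pdf() will not create it for us
out_dir <- "output_data"
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
groups <- split(df, df$group)

# One pdf device for all groups. With onefile = FALSE each new page is
# written to its own file, numbered through the %02d placeholder.
pdf(file.path(out_dir, "page_%02d.pdf"), onefile = FALSE)
for (d in groups) {
  # explicit print() draws the ggplot onto the open device
  print(ggplot(d, aes(x, y)) + geom_point() + ggtitle(paste("Group", d$group[1])))
}
invisible(dev.off())

# Pages follow the order of `groups`, so map page numbers to group names
numbered <- file.path(out_dir, sprintf("page_%02d.pdf", seq_along(groups)))
named <- file.path(out_dir, paste0(names(groups), ".pdf"))
renamed <- file.rename(numbered, named)
```

The renaming relies on pages being written in the same order as the elements of `groups`, which holds here because the loop iterates over `groups` directly.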

1 Answer

1

This works:

library(ggplot2)

lapply(split(df, df$group), function(d) {
  pdf(file = paste0(d$group[1], ".pdf"))
  # print() is needed: inside a function, a ggplot object is not auto-printed
  print(ggplot(d, aes(x, y)) + geom_point() + ggtitle(paste("Group", d$group[1])))
  dev.off()
})
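A slightly more defensive variant of the same loop is sketched below. It recreates the sample `df` from the question, builds paths with `file.path()`, creates the output folder up front because `pdf()` does not create missing directories, and registers `dev.off()` with `on.exit()` so the device is closed even if drawing fails. The helper `save_group_plot()` and the variable `out_dir` are hypothetical names used only for illustration.

```r
library(ggplot2)

# Same sample data as in the question
set.seed(123)
df <- data.frame(
  x = rnorm(100),
  y = rnorm(100),
  group = sample(LETTERS[1:4], 100, replace = TRUE)
)

# Hypothetical output folder variable; create it before opening any device
out_dir <- "output_data"
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Hypothetical helper: draws one group's plot into its own PDF
save_group_plot <- function(d) {
  grp <- as.character(d$group[1])
  path <- file.path(out_dir, paste0(grp, ".pdf"))
  p <- ggplot(d, aes(x, y)) + geom_point() + ggtitle(paste("Group", grp))

  pdf(path)
  # Close this device when the function exits, even if print() fails,
  # so the device is never left open
  on.exit(dev.off(), add = TRUE)
  print(p)  # explicit print: nothing is auto-printed inside a function
  path
}

paths <- lapply(split(df, df$group), save_group_plot)
```

Returning the path from each call makes it easy to check afterwards which files were actually written.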
Stéphane Laurent
  • Yeah, it seems to work, but I'm not sure why `pdf` doesn't work in `lapply` or `sapply` with `p` or `print(p)`, while `print(ggplot(...))` works well.
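A likely explanation for the bare `p` case is R's auto-printing rule: a visible value is printed automatically only at the top level, which fits the observation that the script works outside `lapply()`. Inside the anonymous function, `p` is evaluated and its value discarded, so `dev.off()` closes a device that never received a page, which plausibly produces the unreadable PDFs. A ggplot object is drawn by its print method, so the same rule applies to it. This does not account for the reported `print(p)` failure. The toy sketch below shows the rule without ggplot2; the `demo` class and the `bare()` and `explicit()` functions are hypothetical.

```r
# Toy S3 class whose print method only counts how often it is called,
# standing in for a ggplot object (whose print method draws the plot)
print_calls <- 0
print.demo <- function(x, ...) {
  print_calls <<- print_calls + 1
  invisible(x)
}
obj <- structure(list(), class = "demo")

bare <- function() {
  obj              # evaluated, but never auto-printed inside a function
  invisible(NULL)
}
explicit <- function() {
  print(obj)       # dispatches to print.demo()
  invisible(NULL)
}

bare()
explicit()
print_calls  # 1: only explicit() triggered print.demo()
```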