Note: I tagged this under R because I am an R user, but the topic of this question is general, so I appreciate any input regardless of programming language.

Hello Everyone,

My company is expanding into data science and software development and will be writing code and scripts in the near future. I want to establish a standard practice for sharing and archiving code internally. The main point is this: if we write code today, what can we do to that code, and around it, so that it is still readily understandable 5 years from now? Essentially, what are the best practices for sharing and archiving code?

I did some research on this, so I understand the preference for DRY (Don't Repeat Yourself) over WET (We Enjoy Typing), having a top-level README, writing readable and easy-to-understand comments, and including screenshots or examples of the finished product. What are your thoughts on these practices, and how could they be improved?

Alokin
  • I work in a university lab, not in a company, but we focus heavily on creating function documentation ([generating .Rd files](https://cran.r-project.org/web/packages/roxygen2/vignettes/rd.html)) and long-form documentation ([vignettes](http://kbroman.org/pkg_primer/pages/vignettes.html)). When the pipeline is developed, we package it up. A lot of what we develop is geared towards depositing in Bioconductor or CRAN, but you could use this for internal code as well. – csgroen Mar 29 '18 at 14:36
  • The vignettes part looks really interesting. I was thinking of making Notepad++ the required software for reading the code because of the versatility it provides for reading and commenting code. Do you think that's a good option? – Alokin Mar 29 '18 at 14:46
  • I think it's a good idea if you're expecting code written in different programming languages. If it's just R, RStudio does just fine and works well with help files (Rd) and vignettes (we tend to use HTML vignettes generated by Markdown). The good thing about using roxygen for Rd files is that the doc is written in the code, above the function. So you can also read it while editing the code. – csgroen Mar 29 '18 at 14:56
  • Thank you, I will give this a shot. – Alokin Mar 29 '18 at 15:48
  • I second @csgroen on making your scripts and pipelines into R packages. In addition to all the other advantages (integrated documentation, easy testing, modularization) they save you from having to deal with git submodules. Hadley Wickham wrote a free book on using packages: http://r-pkgs.had.co.nz/ – divibisan Mar 29 '18 at 15:57
  • In general, the more structure the better. I highly recommend following guidelines for developing packages (see http://r-pkgs.had.co.nz/) as well as a folder template for ad-hoc analyses such as Project Template (http://projecttemplate.net/) – zlipp Mar 29 '18 at 16:06
  • In your opinion, are there any disadvantages in creating packages? – Alokin Mar 29 '18 at 17:33
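
To illustrate the roxygen point raised in the comments (documentation written in the code, directly above the function), here is a minimal sketch. The function `group_means` and its arguments are hypothetical names invented for this example; only the `#'` comment syntax and the `@param`, `@return`, `@examples`, and `@export` tags come from roxygen2.

```r
#' Compute the mean of a numeric column for each group
#'
#' Hypothetical example function; the name and arguments are illustrative only.
#'
#' @param data A data frame.
#' @param value Name of the numeric column to average, as a string.
#' @param group Name of the grouping column, as a string.
#' @return A data frame with one row per group and the mean of `value`.
#' @examples
#' group_means(mtcars, value = "mpg", group = "cyl")
#' @export
group_means <- function(data, value, group) {
  # Fail early with a clear error if the inputs are not what we expect
  stopifnot(
    is.data.frame(data),
    value %in% names(data),
    group %in% names(data)
  )
  # Base R aggregation: one row per distinct value of the grouping column
  out <- aggregate(data[[value]], by = list(data[[group]]), FUN = mean)
  names(out) <- c(group, paste0("mean_", value))
  out
}
```

If a file like this lives in the `R/` directory of a package, running `roxygen2::roxygenise()` (or `devtools::document()`) should generate the matching `.Rd` help file, so the documentation a reader sees in five years is the same text that sits next to the code.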

0 Answers