0

I'm writing a R package for the first time. Its called aactr and is hosted on my GitHub: https://github.com/jasonbaik94/aactr

Right now, the package only has one function, aact_connect:

aact_connect <- function(user, password) {

  drv <- DBI::dbDriver('PostgreSQL')
  con <- DBI::dbConnect(drv,
                   dbname="aact",
                   host="aact-db.ctti-clinicaltrials.org",
                   port=5432,
                   user=user,
                   password=password)

}

My concern is that users of my package wouldn't like to input their username and password in their R script for privacy reasons.

What would be a nice workaround to ensure user privacy?

One thought I had: When the user types aact_connect(), a window pops up where the user can input username and password and press Enter, after which the connection would be made. Also, for those without a username or password, I'd put a parameter, init_connection = TRUE, after which this sign up page would load: https://aact.ctti-clinicaltrials.org/users/sign_up

Any others suggestions are greatly welcome!

  • 2
    This is a solved problem. I mentored the development of the initial (and still widely used) RPostgreSQL package during Google Summer of Code some ten or twelve years ago ... and already then did we use proper environment variables (as opposed to scripts) as described in part 2. of @Parfait's answer. – Dirk Eddelbuettel Dec 25 '18 at 04:59

1 Answers1

1

There are many ways to securely handle database credentials in scripts. See few examples below:

  1. Use configuration files such as yaml saves securely on user's machines for R to read into needed variables. See @Spacedman's answer.

    config.yaml

    db:
     host : localhost
     port : 5432
     name : mypgdb
     user : pg_useR
     pwd  : ***
    

    R

    library(yaml)
    config = yaml.load_file("/path/to/config.yml")
    
    dbConnect(drv, host = config$db$host, port = config$db$port,
              dbname = config$db$name, 
              user = config$db$user, password = config$db$pwd)
    
  2. Use environmental variables that are linked to user or system profiles:

    db_creds <- Sys.getenv(c("DB_HOST", "DB_PORT", "DB_NAME", "DB_USER", "DB_PWD"))
    
    con <- DBI::dbConnect(drv,
                          dbname = db_creds[['DB_NAME']],
                          host = db_creds[['DB_HOST']],
                          port = db_creds[['DB_PORT,']],
                          user = db_creds[['DB_USER']],
                          password = db_creds[['DB_PWD']])
    
  3. Use DSNs that maintains connection parameters but requires an ODBC connection which can be run with the generalized odbc package (part of same DBI family as RPostgreSQL). Postgres maintains up-to-date odbc drivers for most operating systems. See R-bloggers post.

    odbc.ini

    [myPG_DSN]
    Driver = PostgreSQL Unicode
    Database = mypg_db
    Servername = localhost
    UserName = pg_useR
    Password = ***
    

    R

    db <- dbConnect(odbc::odbc(), "myPG_DSN")
    
Parfait
  • 104,375
  • 17
  • 94
  • 125
  • I'm running with the first option. However, how do I get the user of my `aactr` package to input his username and password into the yaml file? –  Dec 25 '18 at 19:44
  • In docs, describe how yaml file should be set up and pass in the yaml path as function argument. – Parfait Dec 25 '18 at 20:12
  • Thanks. I understand the "pass in the yaml path as function" part, but not sure about how I should set up the yaml file in a package? –  Dec 25 '18 at 20:19
  • 1
    The yaml file should be the user-created config file with their trusted credentials (outside package). – Parfait Dec 25 '18 at 20:20
  • Ah that makes so much sense. –  Dec 25 '18 at 20:22
  • Do you have an examples of R packages that are similar to `aactr`, which has wrapper functions to allow users to easily access SQL database? –  Dec 25 '18 at 20:55
  • 1
    Unfortunately, I do not. Most R packages either provide one solution: data store, data handling, special modeling, etc. rarely more. Do be aware your package will add dependencies if it relies on other packages and external data stores which too can change versions (i.e. Postgres 9, 10, 11) outside your control. – Parfait Dec 25 '18 at 21:25