
Currently I'm building a Shiny app that runs several queries against a PostgreSQL database (mainly SELECT and INSERT statements). The application works, but I'm trying to make it faster. When I compare the execution times for the same query using the RPostgreSQL package and a DB client like Postico, it takes about 8 times longer with the RPostgreSQL package.

Any ideas for boosting the performance, or for better ways of connecting to a PostgreSQL database from R?
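
For reference, a minimal sketch of how this comparison might be timed from R (the table name `my_table` and the connection details are placeholders):

```r
# Minimal timing sketch -- `my_table` and the connection details are placeholders
library(DBI)
library(RPostgreSQL)

con <- dbConnect(
  PostgreSQL(),
  host = "localhost", port = 5432,
  dbname = "mydb", user = "myuser", password = "mypassword"
)

# Time a single SELECT the same way the DB client would run it
system.time(
  res <- dbGetQuery(con, "SELECT * FROM my_table LIMIT 10000")
)

dbDisconnect(con)
```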

Thanks

Esteban Angel
  • I'm not sure there's really enough information here to troubleshoot it for you. There could be a lot of differences in your data pipeline. Do you have a reproducible example? Have you tried other R libraries like `odbc`? You can use it with a Postgres driver and it should be much faster (at least it's much faster than `RODBC`). – Hack-R Aug 25 '18 at 20:29
  • Can you provide something reproducible? Say a shell script calling `psql` to create a table, insert some data and then some timed operations comparing RPostgreSQL (which uses the same C library as other Postgres frontends...) to Postico etc? – Dirk Eddelbuettel Aug 26 '18 at 11:51
  • You could give a try to [RPostgres](https://cran.r-project.org/package=RPostgres). – Scarabee Aug 27 '18 at 08:38
  • Thanks @Scarabee, the RPostgres package is at least 50% faster than RPostgreSQL. On the other hand, we also developed an external Python API for making the queries and returning the results to R; I will update this thread when we have results. – Esteban Angel Aug 29 '18 at 16:13
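
As a follow-up to the suggestions in the comments above, here is a minimal sketch of the two driver swaps they mention (RPostgres and odbc); the connection details and ODBC driver name are placeholders and will vary by system:

```r
# Sketch of the alternative DBI backends suggested in the comments.
# Connection details are placeholders; adjust to your own setup.
library(DBI)

# RPostgres (suggested by Scarabee)
con_rpostgres <- dbConnect(
  RPostgres::Postgres(),
  host = "localhost", port = 5432,
  dbname = "mydb", user = "myuser", password = "mypassword"
)

# odbc with a PostgreSQL ODBC driver (suggested by Hack-R);
# the driver name depends on what is installed on the system
con_odbc <- dbConnect(
  odbc::odbc(),
  Driver   = "PostgreSQL Unicode",
  Server   = "localhost", Port = 5432,
  Database = "mydb", UID = "myuser", PWD = "mypassword"
)

# The rest of the DBI code (dbGetQuery(), dbWriteTable(), ...) is unchanged,
# so either connection can be timed against the RPostgreSQL one.
dbDisconnect(con_rpostgres)
dbDisconnect(con_odbc)
```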

1 Answer


Have you ever heard of the dbplyr package (with the b)?

I would recommend it because this package lets you use dplyr (with no b) with SQL databases.

There are many advantages, since the way you interact with your databases will shift

from this:

[Image from the article: the traditional workflow, where the data is first imported into R and then analyzed in memory]

to this:

[Image from the article: the dbplyr workflow, where the computation is pushed to the database and only the results are collected into R]

These images are extracted from a great article entitled "Databases using R" by Edgar Ruiz (2017). You should take a look at it for more details (the link is at the end of this answer).

The main advantages presented by Mr. Ruiz are, and I quote:

" 1) Run data exploration over all of the data - Instead of coming up with a plan to decide what data to import, we can focus on analyzing the data inside the database, which in turn should yield faster insights.

2) Use the SQL Engine to run the data transformations - We are, in effect, pushing the computation to the database because dplyr is sending SQL queries to the database.

3) Collect a targeted dataset - After becoming familiar with the data and choosing the data points that will either be shared or modeled, a final query can then be used to bring back only that data into memory in R.

4) All your code is in R! - Because we are using dplyr to communicate with the database, there is no need to change language, or tools, to perform the data exploration. "

So, you will probably gain the speed you are looking for with dbplyr/dplyr.

You should give it a try.
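
For illustration, here is a minimal dbplyr sketch, assuming a hypothetical table `my_table` with a column `category` (connection details are placeholders). The query is built lazily and executed inside PostgreSQL, and only the small summarised result is pulled into R:

```r
# Minimal dbplyr sketch -- `my_table`, `category` and the connection
# details are placeholders
library(DBI)
library(dplyr)
library(dbplyr)

con <- dbConnect(
  RPostgres::Postgres(),
  host = "localhost", dbname = "mydb",
  user = "myuser", password = "mypassword"
)

tbl_db <- tbl(con, "my_table")   # lazy reference; nothing is pulled into R yet

summary_q <- tbl_db %>%
  filter(!is.na(category)) %>%
  group_by(category) %>%
  summarise(n = n())

show_query(summary_q)            # inspect the SQL that dbplyr generates
result <- collect(summary_q)     # only now is the (small) result brought into R

dbDisconnect(con)
```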

You can find more information about it, and about how to establish the connection to your PostgreSQL server using the DBI package, at:

https://cran.r-project.org/web/packages/dbplyr/vignettes/dbplyr.html

and

https://rviews.rstudio.com/2017/05/17/databases-using-r/

allanvc
  • This answer has very little to do with the question, besides also mentioning R and Postgres. As I read the OP's question, he was not asking for a dbplyr commercial. – Dirk Eddelbuettel Aug 26 '18 at 11:49
  • @Dirk Well, I'm not receiving any $$$ from Edgar Ruiz lol. The OP is asking for "ideas of ways of boosting the performance". Why do you think dbplyr would not be an alternative for this? – allanvc Aug 27 '18 at 02:08