
What I did is described in another question: How to select efficiently in a very large table.

I used the RPostgreSQL package under R because I want to manipulate other files at the same time. As you can see from the question linked above, the table is very large. The selection took more than two hours to complete using RPostgreSQL. But today I ran the same SQL under psql instead of under RPostgreSQL, and it only took several minutes. Why?

The R code is:

sql <- "SELECT * INTO new_table FROM table_1 WHERE EXISTS (SELECT 1 FROM table_2 WHERE column=table_1.column_1) AND EXISTS (SELECT 1 FROM table_2 WHERE column=table_1.column_2)"
rs <- dbSendQuery(con, sql)

Update: I also tried psycopg2 under Python, and it runs fast as well.

Ben
  • You mean, when you run the query directly in psql, it returns quickly? Is there a lot of rows returned? – harmic Aug 28 '14 at 08:02
  • @harmic Exactly. Millions of rows would be returned. I changed my question. Actually, I wrote the returned rows into a new table. – Ben Aug 28 '14 at 08:32
  • When you say "directly under PostgreSQL" do you mean `psql`, the command line client? Or what? – Craig Ringer Aug 28 '14 at 10:52

1 Answer


If no data are being sent to R, then it would be surprising if there were any difference at all. If data are being sent back to R, that can be relatively slow (I have found psycopg2 and pandas to be faster for data transfer in many cases).
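As a sketch of that distinction (assuming a working connection `con` from `dbConnect`; the table names are the asker's, and the `WHERE` clause is abbreviated):

```r
library(RPostgreSQL)  # loads the DBI interface as well

# Hypothetical connection parameters
con <- dbConnect(PostgreSQL(), dbname = "mydb")

# Server-side: SELECT ... INTO creates new_table entirely inside
# PostgreSQL, so no result rows are transferred back to R.
dbSendQuery(con, "SELECT * INTO new_table FROM table_1 WHERE ...")

# Client-side: dbGetQuery (or dbSendQuery followed by fetch) pulls
# every matching row into an R data.frame, which is slow when
# millions of rows are returned.
df <- dbGetQuery(con, "SELECT * FROM new_table")

dbDisconnect(con)
```

If the R session is only issuing the `SELECT ... INTO` statement, the work happens server-side either way, and any large gap between psql and RPostgreSQL would be surprising.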

Depending on what is being done with the data in R, it can be more efficient to use PL/R.

Ian Gow