0

I need help with RODBC sqlQuery. I run a SQL script that produces 11mill rows of data. It takes 78 secs to pull data using RODBC but unfortunately when I include a date/time field, it takes 180 secs to pull data from RODBC and only 78 secs in Management Studio. I would like to find out why please and what I could do about it.

The date/time field is of this format on Sql server: YYYY-MM-DD HH:MM:SS.000

I make sure I do Sys.setenv(TZ="UTC")before I run this query:

lossdata <- as.data.table(sqlQuery(dbhandle, qry)) 

qry is a string and dbhandle is set using odbcDriverConnect.

I get the date/time field in this format when I pull from R: YYYY-MM-DD HH:MM:SS (without .000)

I've tried RJDBC too but it takes same time. It also takes too long to convert the date/time field from character to posixct after using RJDBC so it's not a good option. Unfortunately I need the field to be in posixct as I use it in sorting and it takes too long if it is not of this data type.

Please help. I'm not sure how I can reproduce this example. Please let me know if you need any additional information.

SQL Queries

With EventDate

select pp.EventDate as EVENTDATE, pp.EVENTID as EVENTID 
from 
    (select * from set.dbo.events where setid in (16,32)) pp 
     inner join 
        (select eventid from databasename.dbo.rdm_port where anls = 93) y 
     on pp.EventId = y.EVENTID

Without EventDate

select pp.EVENTID as EVENTID 
from 
   (select * from set.dbo.events where setid in (16,32)) pp 
    inner join 
       (select eventid from databasename.dbo.rdm_port where anls = 93) y 
    on pp.EventId = y.EVENTID
Parfait
  • 104,375
  • 17
  • 94
  • 125
charliealpha
  • 307
  • 2
  • 12
  • Please show query that includes the datetime and one that does not to see how calculations are being run. – Parfait Sep 18 '17 at 18:00
  • i have made the script smaller to make it readable, let me know if you have any questions. – charliealpha Sep 18 '17 at 18:10
  • Perhaps RODBC is converting your DateTime column from a character to a Date using the function `as.Date`, which is very slow. See https://stackoverflow.com/a/12788992/6004997 for details: "So the crux of the issue (I think) is why strptime is so slow, and maybe that can be improved in R. Or just avoid POSIXlt, either directly or indirectly." – Corey Levinson Sep 18 '17 at 18:19
  • hi corey thanks for taking the time to reply. i ran the query with as.is = TRUE and it ran 2x faster... – charliealpha Sep 18 '17 at 18:44
  • unfortunately i need to preserve the data/time timestamp. i use it to sort later and it turns out very slow if it is not in the right format. – charliealpha Sep 18 '17 at 18:44
  • any particular reason why the .000 would be truncated? do you think that's causing it difficulties? – charliealpha Sep 18 '17 at 18:56
  • @charliealpha - please delete your above query comments which you should have edited in your own post as comments are difficult to read. I added it accordingly. Do note: you are allowed to edit your question at *any* time. – Parfait Sep 18 '17 at 19:34
  • thank you for your time, apologies still kinda new to this... – charliealpha Sep 18 '17 at 19:35

1 Answers1

0
DATEDIFF(s,'1970-01-01 00:00:00',pp.EventDate) as EVENTDATE

And using RJDBC instead solves this particular problem.

RODBC takes longer than RJDBC. Everything is stored as numeric using RJDBC.

The date is now in seconds since 1970-01-01 which is fast to sort. It can be converted to date if need be.

charliealpha
  • 307
  • 2
  • 12