
Some time ago I wrote this question because I thought the query was too slow, but after much trial and error I found out that the problem is data ingestion in R. The query now takes about 2 seconds to retrieve roughly 175,000 rows, but when it's run in R via @input_data_1 it takes around 15 seconds. The execution plan shows the table scan accounting for 100% of the cost, but I can't understand why there is such a difference.
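
For reference, the plain query I'm timing at about 2 seconds is essentially the SELECT that goes into @input_data_1 below:

SELECT DEPARTMENT, ACCOUNT, CATEGORY, YEAR, MONTH, CURRENCY,
       DIM1, DIM2, DIM3, DIM4, DIM5, VALUE
FROM   DATA
WHERE  YEAR = '2013bdg';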

Any help is appreciated.

https://dba.stackexchange.com/questions/178242/queries-to-in-memory-table-are-slower-than-to-disk-based?noredirect=1#comment347134_178242

EDIT

Here is the code, just in case. It's the same query with no processing; I just return what's been read:

EXEC sp_execute_external_script
    @input_data_1 = N'SELECT DEPARTMENT, ACCOUNT, CATEGORY, YEAR, MONTH, CURRENCY,
                             DIM1, DIM2, DIM3, DIM4, DIM5, VALUE
                      FROM   DATA
                      WHERE  YEAR = ''2013bdg''',
    @input_data_1_name = N't',
    @language = N'R',
    -- No processing in R: the input data frame is returned as-is
    @script = N'OutputDataSet <- as.data.frame(t);'
WITH RESULT SETS (([DEPARTMENT] VARCHAR(MAX), [ACCOUNT] VARCHAR(MAX), [CATEGORY] VARCHAR(MAX),
                   [YEAR]       VARCHAR(MAX), [MONTH]   VARCHAR(MAX), [CURRENCY] VARCHAR(MAX),
                   [DIM1]       VARCHAR(MAX), [DIM2]    VARCHAR(MAX), [DIM3]     VARCHAR(MAX),
                   [DIM4]       VARCHAR(MAX), [DIM5]    VARCHAR(MAX), [VALUE]    FLOAT));
GO

EDIT #2

Through trial and error I have discovered that the slowness is due to the WITH RESULT SETS clause. The 12-column definition I use is very expensive, whereas if I set it like:

WITH RESULT SETS (([OK] BIT NOT NULL))

it takes only 3 seconds.

Why this happens is the question.
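
For completeness, the fast variant looks roughly like this. It's a sketch: since WITH RESULT SETS has to match the number of columns the script actually returns, I'm assuming here that the R script is reduced to a single flag, and the [OK] column name is just a placeholder.

EXEC sp_execute_external_script
    @input_data_1 = N'SELECT DEPARTMENT, ACCOUNT, CATEGORY, YEAR, MONTH, CURRENCY,
                             DIM1, DIM2, DIM3, DIM4, DIM5, VALUE
                      FROM   DATA
                      WHERE  YEAR = ''2013bdg''',
    @input_data_1_name = N't',
    @language = N'R',
    -- Assumption: the same input is read, but only a single flag is streamed back
    @script = N'OutputDataSet <- data.frame(OK = TRUE)'
WITH RESULT SETS (([OK] BIT NOT NULL));
GO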

Leonardo Lanchas
  • Are you using SQL Server 2017 RC1? – Dan Guzman Jul 19 '17 at 12:28
  • No. My version is "SQL2016 RTM Enterprise Edition (64-bit) 13.0.1601.5" – Leonardo Lanchas Jul 19 '17 at 13:00
  • I don't see R code anywhere.... – Hong Ooi Jul 20 '17 at 01:12
  • @HongOoi Edited the post and added the R code. – Leonardo Lanchas Jul 20 '17 at 07:57
  • The question is why are you loading 175K rows in memory, in the external R server process. If you load more data, loading will take longer. This isn't about "ingestion". It's about *exporting* an entire table and loading into a client's memory. What kind of processing do you want to perform and why are you loading the entire table in memory? The data frame isn't going to be faster than querying an indexed table – Panagiotis Kanavos Jul 20 '17 at 13:42
  • Can you provide a *realistic* example of what you want to do? There was no need for trial and error - if you load more columns, the query will take longer simply because you have to load more data. `This query takes now 2 seconds` where do you run the query? SSMS? It doesn't retrieve *all* the data. Another application? R Studio? What are you comparing? – Panagiotis Kanavos Jul 20 '17 at 13:52
  • Your result set shows a lot of `varchar(max)` columns for date, currency fields. You'll get far better performance if you use the *appropriate* types. `varchar(max)` is meant for BLOBs and requires special processing. There's no way you can have a `Department` name with 2M characters, or years with 6000 digits. Use the correct types and sizes, add indexing and write queries that return the data you actually want – Panagiotis Kanavos Jul 20 '17 at 13:57
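
As a sketch of what the last comment suggests, a tighter result-set definition might look something like this (the types and lengths are guesses, not taken from the real DATA schema):

WITH RESULT SETS (([DEPARTMENT] VARCHAR(50), [ACCOUNT] VARCHAR(50), [CATEGORY] VARCHAR(50),
                   [YEAR]       VARCHAR(10), [MONTH]   VARCHAR(10), [CURRENCY] CHAR(3),
                   [DIM1]       VARCHAR(50), [DIM2]    VARCHAR(50), [DIM3]     VARCHAR(50),
                   [DIM4]       VARCHAR(50), [DIM5]    VARCHAR(50), [VALUE]    FLOAT));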

0 Answers