0

I'm trying to run a SQL script inside an R notebook chunk. However, the final SELECT returns no results if it is preceded by multiple SQL statements (that don't generate result sets) within the same chunk.

  1. I have a functioning connection. The database server runs SQL Server.
  2. My chunks that only contain one statement return output in a dataframe.
  3. However, I also need to do multi-pass SQL in a script. I'd like to create temp tables along the way and use them to drive some aggregations. I want to save the results into a temp table and conclude with a single SELECT statement from the temp table, producing the output from the chunk.
  4. I tried saving the script to a file and using the code attribute for the chunk. That just got me an error about an incomplete final line. I don't require a separate file. I was just trying another way to hack this, but I couldn't make that work either.

It hasn't worked yet, and I have been unable to trace the problem. Maybe SQL chunks don't even support what I want to make them do. Any ideas?

  • In an R notebook SQL chunk, is it possible to run multiple SQL statements?
  • Is it possible to create a temp table and reference it in a subsequent statement within the same chunk?
  • Is it possible to execute a loop construct within the SQL chunk?

Here are some toy examples.

This chunk works, returning a dataframe.

```{sql test, connection=con, output.var = "test_df"}
select  TABLE_NAME
from    INFORMATION_SCHEMA.TABLES
where   TABLE_SCHEMA = 'dbo';
```

This toy chunk returns nothing.

```{sql test2, connection=con, output.var = "test2_df"}
drop table if exists #tables;
select  TABLE_NAME
into    #tables
from    INFORMATION_SCHEMA.TABLES
where   TABLE_SCHEMA = 'dbo';

drop table if exists #columns;
select  TABLE_NAME,
        COLUMN_NAME
into    #columns
from    INFORMATION_SCHEMA.COLUMNS
where   TABLE_SCHEMA = 'dbo'
and     TABLE_NAME in (
        select  top 3
                TABLE_NAME
        from    #tables)
order by 1, 2;

select  *
from    #columns;
```
Jai Jeffryes
  • 511
  • 5
  • 12

1 Answers1

1

One possible solution is to add set nocount on at the top of your chunk, like this:

```{sql test2, connection=con, output.var = "test2_df"}
set nocount on

drop table if exists #tables;
select  TABLE_NAME
into    #tables
from    INFORMATION_SCHEMA.TABLES
where   TABLE_SCHEMA = 'dbo';

drop table if exists #columns;
select  TABLE_NAME,
        COLUMN_NAME
into    #columns
from    INFORMATION_SCHEMA.COLUMNS
where   TABLE_SCHEMA = 'dbo'
and     TABLE_NAME in (
        select  top 3
                TABLE_NAME
        from    #tables)
order by 1, 2;

select  *
from    #columns;

Another option is to use CTE, like this

```{sql test2, connection=con, output.var = "test2.df"}
WITH TABLES AS (
SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'dbo'
)
SELECT TABLE_NAME, COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo' AND TABLE_NAME IN (
  SELECT TOP 3 TABLE_NAME FROM TABLES
)
ORDER BY 1,2;
langtang
  • 22,248
  • 1
  • 12
  • 27
  • Beautiful! Thank you. It was as simple as adding `SET NOCOUNT ON`. Before that, I made sure only the last statement returned rows with the intention of that getting assigned to `output.var`. However, if execution outputs the number of rows affected for some other statement before that, evidently it preempts the result set output I intended for the last statement. Love the CTE idea, too. I make extensive use of those. In this task, what I'm really doing is using a cursor to iterate through aggregations to limit the hit on the target DB and collecting the results in a temp table. – Jai Jeffryes Mar 09 '22 at 15:28