15

I'm currently writing an SSIS package that retrieves data from a stored procedure via an OLE DB Source. The stored procedure contains a rather nasty query that I've been able to improve with the use of temp tables. If I switch these temp tables to table variables, the logical reads jump from about 1.3 million to about 56 million. I'm uncomfortable enough with the 1.3 million, but there is no way that I can be satisfied with the 56 million logical reads. Thus, I can't really convert the temp tables to table variables.

However, SSIS (or rather SQL Server) can't resolve the metadata for this query, so the package won't run. I've found a few different solutions online, but none of them seem to work for both SQL Server 2008 and SQL Server 2014. We are currently in the process of upgrading all of our servers to 2014, and this particular package runs against 2008 in DEV, 2014 in QA, and 2008 in production currently. By the fall, the PROD tier will be 2014, and the DEV tier will be promoted sometime after that. Unfortunately, I can't wait until these upgrades happen to release this SSIS package. The data needs to start moving by next week. Thus, I need to figure out a way to get the metadata resolved for both environments. Here's what I've tried so far:

  1. Add a dummy select in an IF 1=0 block which returns the proper metadata. This works in 2008, but not 2014.

  2. Use SET FMTONLY OFF at the beginning of the stored procedure. This works in 2008, but not 2014. Furthermore, it causes the stored procedure to run once for each column returned (over 30 in this case), which is a deal-breaker even if it did work.

  3. Use EXEC ... WITH RESULT SETS (( ... ));. This works in 2014, but not in 2008.

  4. Deploy a stored procedure which returns the proper metadata, build and deploy the SSIS package, then modify the stored procedure to the proper version. This hasn't seemed to work in either environment, and this would complicate of any other ETL applications developed within our ETL framework.

If I can't figure anything out, I could either deploy different stored procedures and packages to the different tiers, but I would very much prefer against this. For one, this would complicate future releases, and I would also need to ensure that I don't forget about updating the stored procedure and package once we upgrade the servers.

I could also make real tables in the database which would take the place of these temp tables. I don't really like this solution, but it's something that I could tolerate. If I end up doing this, I would probably switch to using the WITH RESULT SETS in the future.

However, I personally don't care much for either of these solutions, so I was wondering if there is any workaround that I missed that might work a bit better.

Nelson O
  • 263
  • 2
  • 8
  • 4
    Excellent first post where you've clearly done your research. Additional research for you on the [temp table versus table variable](http://dba.stackexchange.com/questions/16385/whats-the-difference-between-a-temp-table-and-table-variable-in-sql-server) – billinkc Aug 06 '15 at 14:54
  • Have you tried a combination of 1 & 3? A single stored proc that has BOTH `With Result Sets...` AND `IF 1=0..` ? – Tab Alleman Aug 06 '15 at 14:55
  • The problem is that the WITH RESULT SETS isn't available in 2008, so the query won't run in DEV or PROD. – Nelson O Aug 06 '15 at 14:59
  • 1
    You've hit the nail on the head about about the breaking change between 2008 and 2012 on how SSIS handles temporary tables. Refresh my memory though, why would the SSIS package change? It'd still be calling `dbo.MyProc` whether the invoker was 2008 or 2014. It's just the underlying procedure that would need to be corrected for the environment, yeah? – billinkc Aug 06 '15 at 15:05
  • The WITH RESULT SETS is a clause on the actual call to the stored procedure (unless I'm missing something and you can also put it in the SP definition), so the SSIS package would have this clause for the 2014 servers, but not the 2008 servers. – Nelson O Aug 06 '15 at 15:09
  • 1
    How about using a wrapper stored proc? On both SSIS Servers, the same SSIS code calls the same proc name. On 2014, that proc uses WITH RESULT SETS to call a child proc that does the real work. On 2008, the wrapper proc uses IF 1=0 and calls the same child proc (or the same proc code but on 2008). – Tab Alleman Aug 06 '15 at 15:12
  • Oh that's right and Tab hits on the ghetto way we were doing it (been a few clients since I ran into the actual usage) We had a proc that provided the interface which then used the `WITH RESULT SETS` / no-op SELECT and then called into the actual worker procedure. My personal preference is for materialized tables just because it makes the implementation cleaner but sometimes ya have to ship it. – billinkc Aug 06 '15 at 15:28
  • ghetto...lol. Well if you're confirming that it would actually work, I'll take my bows now. – Tab Alleman Aug 06 '15 at 15:56
  • Thank you for your help on this. I was a bit wary of that implementation with the wrapper because I'd ideally be able to deploy the same code to all of my tiers. What I ended up doing was creating a staging table that holds the results. The results are then loaded into the table by the SP, and then selected out elsewhere. Initially, I was going to have my select stored procedure call another to load the data, but it still throws the metadata invalidation due to the temp tables in the inner SP. I think I'll just separate out the insert and select for this table as I can deploy that everywhere. – Nelson O Aug 06 '15 at 18:36
  • 1
    Have you tried a delayed validation setting to true for the package so it doesn't look for the meta data until it runs that step? – Joseph Gagliardo Aug 13 '15 at 18:51
  • Is this a real temp table "@Table" or "@@Table"? or is this a regular table that you storing data in temporarily? – Jason Geiger Aug 24 '15 at 12:54
  • @JasonGeiger A temp table uses the symbol # and is a disk table, a table variable uses the symbol @ and is an in-memory table. What he later refers in his comment as a staging table is a regular table for storing a dataset temporarily. With temp and variable tables the use of single symbol "@" versus a double one "@@" is just a matter of scope; current connection vs. global. – Ricardo C Aug 31 '15 at 19:17
  • @Nelson Logical reads and not really a bad thing, but physical ones are. In any case, you can reduce them by proper indexing and fine tuning the queries. And you can certainly create indexes in temp tables. Access to the system in question would be necessary for further analysis. – Ricardo C Aug 31 '15 at 19:25

1 Answers1

2

Despite your reluctance, I think you've made the right choice and a dedicated staging area is the right way to go. Most of the production ETLs I've worked with have a dedicated staging database, never mind tables. You then have the benefit of being able to control the storage more explicitly, which makes performance more reliable and the whole thing generally more maintainable. For example, you can create a dedicated contiguous block of fast disk space for these tables with their own file group etc. I'd certainly rather see 2 separate SPs relying on a few physical tables than a really gnarly single one.

That said, without knowing any specifics this is just my experience, so a caveat for future readers: As with all things database, be sure to measure the actual performance of your scenario (before and after) rather than making any assumptions based on the query plan - it might be misleading you.

Matt
  • 425
  • 5
  • 11
  • I concur with Matt here. Take a look at optimizing the physical temp table if performance is really that bad. Try to discover patterns in your table variable vs temp table usage on data sets of various size and cardinality. From my moderate experience, each environment has a "sweet spot" where it makes more sense to go to disk rather than memory. The fun part is when that changes over time. Job security! – C B Sep 24 '15 at 19:49