
I have an OLE DB Source in my SQL Server Integration Services package, and a Script Component connected to that source.

The OLE DB Source reads data from a SQL query that returns 500,000 records. I want to know whether, when the package starts running, it loads all 500,000 records into memory at once, or whether it processes them in chunks, say 5,000 records at a time.

How does it work? Can I configure this?

  • No, it's not going to buffer all the rows in memory but some smaller amount. Yes, you can configure how much, but you shouldn't, at least not without understanding your data flow and the other knobs you can turn. See the [performance tuning whitepaper](https://technet.microsoft.com/library/cc966529). – Jeroen Mostert Jul 04 '17 at 19:13
  • Possible duplicate of [Default Buffer Max Size and Default Buffer Size in SSIS](https://stackoverflow.com/questions/22377037/default-buffer-max-size-and-default-buffer-size-in-ssis) – Tab Alleman Jul 05 '17 at 12:56
  • Well, I didn't know about Default Buffer Max Size and Default Buffer Size before Jeroen Mostert answered me... They are different questions. – Mariano Bejo Jul 05 '17 at 14:18
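
For reference, the two properties mentioned in the comments above are set per Data Flow Task: DefaultBufferMaxRows (default 10,000 rows) and DefaultBufferSize (default 10 MB). A minimal sketch, assuming the SSIS client assemblies are available, of reading them through the runtime API; the package path is a placeholder:

```csharp
// Requires references to Microsoft.SqlServer.ManagedDTS and
// Microsoft.SqlServer.DTSPipelineWrap (installed with the SSIS client tools).
using System;
using Microsoft.SqlServer.Dts.Runtime;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

class BufferSettings
{
    static void Main()
    {
        var app = new Application();
        // Hypothetical package path -- substitute your own .dtsx file.
        Package pkg = app.LoadPackage(@"C:\ETL\MyPackage.dtsx", null);

        foreach (Executable exec in pkg.Executables)
        {
            // Data Flow Tasks wrap a MainPipe; that's where the buffer knobs live.
            if (exec is TaskHost th && th.InnerObject is MainPipe)
            {
                Console.WriteLine("{0}: DefaultBufferMaxRows = {1}, DefaultBufferSize = {2} bytes",
                    th.Name,
                    th.Properties["DefaultBufferMaxRows"].GetValue(th),
                    th.Properties["DefaultBufferSize"].GetValue(th));
            }
        }
    }
}
```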

1 Answer


Your package could end up buffering all 500,000 rows in memory if you have asynchronous/blocking components (Sort, Aggregate), because data can't flow past such a component until every row has been sent from the source and has reached it. Only then can SSIS determine the maximum value for column X, or that all the rows have been sorted by key Y.
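
To see why those components block, here is a small C# analogy (not SSIS code): a synchronous transform can emit each row as soon as it arrives, while a sort must materialize every row before it can emit the first one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class BlockingDemo
{
    // A synchronous transform: each input row can be emitted as soon as it arrives.
    static IEnumerable<int> DoubleEach(IEnumerable<int> rows)
    {
        foreach (var row in rows)
            yield return row * 2;   // streams row-by-row, bounded memory
    }

    static void Main()
    {
        var source = Enumerable.Range(1, 500000);

        // Streaming: rows flow through a chunk at a time.
        foreach (var r in DoubleEach(source).Take(3))
            Console.WriteLine(r);

        // Blocking: OrderBy must materialize *all* 500,000 rows before the
        // first one can be emitted -- the same constraint an SSIS Sort or
        // Aggregate imposes on the data flow.
        foreach (var r in source.OrderByDescending(x => x).Take(3))
            Console.WriteLine(r);
    }
}
```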

If your machine runs out of memory, you will see an event recorded for buffers spooled to disk (approximate name). That means your high-performance, in-memory ETL engine has begun writing data to disk, and performance is going to suffer mightily at that point: all that data gets written to disk so you can get through the blocking component, and then it has to be read back from disk once whatever calculation had to happen has happened. And if you did something like sort the data in the data flow and then aggregate it, you just paid for doubly poor performance.
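
If you want to catch that spill while a package runs, the pipeline exposes a counter named roughly "Buffers spooled" in Performance Monitor. A minimal sketch of polling it from C#; note the counter category name varies by SSIS version, so the string below is an assumption to verify against your install:

```csharp
using System;
using System.Diagnostics;

class SpoolWatcher
{
    static void Main()
    {
        // Category name varies by SSIS version -- check Performance Monitor
        // for the exact string on your machine (the one below is an assumption).
        var spooled = new PerformanceCounter("SQLServer:SSIS Pipeline 14.0", "Buffers spooled");

        // A non-zero value while the package runs means buffers are being
        // written to disk instead of held in memory.
        Console.WriteLine("Buffers spooled: {0}", spooled.NextValue());
    }
}
```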

That said, if you're using only synchronous components, there are mechanisms built into the data flow to detect back pressure. If your destination can't keep up with the source, the engine signals upstream components to send fewer rows until the destination catches up. Pretty clever stuff, but nothing you as a developer can really influence, beyond not adding asynchronous components unless the solution requires them.
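
The back-pressure mechanism behaves much like a bounded buffer between a producer and a consumer. A small C# sketch of the same principle, as an analogy rather than the engine's actual implementation:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class BackPressureDemo
{
    static void Main()
    {
        // The bounded capacity plays the role of the data flow's limited
        // buffer pool: when the consumer (destination) falls behind, Add()
        // blocks the producer (source) until space frees up.
        var buffers = new BlockingCollection<int>(boundedCapacity: 5);

        var producer = Task.Run(() =>
        {
            for (int row = 0; row < 100; row++)
                buffers.Add(row);           // blocks when the queue is full
            buffers.CompleteAdding();
        });

        foreach (var row in buffers.GetConsumingEnumerable())
        {
            Task.Delay(10).Wait();          // simulate a slow destination
        }

        producer.Wait();
        Console.WriteLine("Done -- the source never outran the destination.");
    }
}
```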

billinkc