
I am trying to read millions of records from SQL Server and Oracle RDBMS.

SQL Server: `select Id as LinkedColumn,CompareColumn from Source order by LinkedColumn`
Oracle: `select Id as LinkedColumn,CompareColumn from Target order by LinkedColumn`

These databases reside on different servers, and I want to read the data from another server.

The idea is to keep the database servers as free as possible.

I am thinking of reading these millions of records in chunks, as opposed to using a data reader, which executes the whole query on the database; the database server then holds the result set open while the data reader reads the records one by one.

This takes a huge amount of time when multiple jobs are running, because of the load on the database server.

Records in Source : 12377200

Records in Target : 12266800

Because of the `order by`, it is taking way too much time.

So is there any way to execute the query on the database and somehow get the data directly into my server's memory (a DataTable, list, array, etc.) without putting load on the database server?

My code below takes way too much time (more than 2 hours just to read the data from source and target) for 24 million records.

Code:

public void Test(SqlConnection srcCon, SqlConnection tgtCon)
{
    int srcChunkSize = 1000;
    int srcCurCount = 1;
    int tgtChunkSize = 1000;
    int tgtCurCount = 1;
    bool srcBreak = false;
    bool tgtBreak = false;

    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();

    // One adapter per connection; the SELECT text is swapped in on every iteration.
    var da1 = new SqlDataAdapter(null, srcCon);
    var da2 = new SqlDataAdapter(null, tgtCon);
    da1.SelectCommand.CommandTimeout = 0;
    da2.SelectCommand.CommandTimeout = 0;

    while (true)
    {
        var srcDt = new DataTable();
        var tgtDt = new DataTable();

        // Fetch the next chunk from Source using ORDER BY ... OFFSET ... FETCH paging.
        if (!srcBreak)
        {
            string srcQuery = "select Id as LinkedColumn,CompareColumn from Source order by LinkedColumn" +
                              " OFFSET ((" + srcCurCount + " - 1) * " + srcChunkSize + ") ROWS FETCH NEXT " + srcChunkSize + " ROWS ONLY;";
            da1.SelectCommand.CommandText = srcQuery;
            srcDt = GetDatatable(da1);
        }

        // Fetch the next chunk from Target the same way.
        if (!tgtBreak)
        {
            string tgtQuery = "select Id as LinkedColumn,CompareColumn from Target order by LinkedColumn" +
                              " OFFSET ((" + tgtCurCount + " - 1) * " + tgtChunkSize + ") ROWS FETCH NEXT " + tgtChunkSize + " ROWS ONLY;";
            da2.SelectCommand.CommandText = tgtQuery;
            tgtDt = GetDatatable(da2);
        }

        // An empty chunk means that side is exhausted.
        if (srcDt.Rows.Count == 0) srcBreak = true;
        srcCurCount++;

        if (tgtDt.Rows.Count == 0) tgtBreak = true;
        tgtCurCount++;

        if (srcBreak && tgtBreak) break;
    }

    stopwatch.Stop();
    string elapsed = stopwatch.Elapsed.ToString(@"d\.hh\:mm\:ss");
    Console.WriteLine(elapsed);
}

private DataTable GetDatatable(SqlDataAdapter da)
{
    DataTable dt = new DataTable();
    da.Fill(dt);
    return dt;
}
I Love Stackoverflow
    I don't see how reading the whole thing into memory at once (what you are doing now) will put less load on the database compared to reading it one by one with a data reader. – Evk Feb 15 '18 at 07:07
  • @Evk The problem with the data reader is that when I call ExecuteReader, the result set is maintained in a database stream and read one by one, so the load stays on the database server until I have finished reading the whole result set. Please correct me if I am wrong – I Love Stackoverflow Feb 15 '18 at 07:13
  • Your chunks of 1000 records are too small. Make sure you have indexes on these tables that cover the selected columns - the index should be on the `Id` column and include the `CompareColumn` (see the sketch after these comments). – Zohar Peled Feb 15 '18 at 07:41
  • @ZoharPeled It is taking way too much time even though I have an index on the Id column. What chunk size do you recommend then? – I Love Stackoverflow Feb 15 '18 at 07:43
  • for 12.3 million records, I would suggest at least 100000 records as a buffer. Selecting 1000 records means you have to run your select statement 12300 times... – Zohar Peled Feb 15 '18 at 07:48
  • @ZoharPeled Previously I set 100000 records as the buffer size, but when I was running 2 programs in parallel (that means 400000 on the network buffer) I was getting a transport layer error – I Love Stackoverflow Feb 15 '18 at 07:50
  • what about 10000 then? I've tested on SSMS the time it takes to select 1,000,000 records from a table with 9.7 million records using `offset...fetch` with different buffer sizes: 1,000 records, 10,000 records and 100,000 records. The 1,000 records buffer didn't complete the select even after 9 minutes. The 10,000 records buffer completed the select in 129.5 seconds, and the 100,000 buffer completed in 128.8. (practically no difference between the last two). – Zohar Peled Feb 15 '18 at 09:26
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/165221/discussion-between-user-and-zohar-peled). – I Love Stackoverflow Feb 15 '18 at 12:34
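
Picking up the suggestions from the comments above, here is a minimal sketch of the covering index plus a larger chunk size. This is only a sketch: the index name is made up, the INCLUDE syntax is SQL Server only (Oracle would need a composite index instead), and it assumes srcCon is already open as in the Test method.

    // Run once against the SQL Server source (IX_Source_Id_CompareColumn is a hypothetical name).
    // A covering index on Id that includes CompareColumn lets the
    // ORDER BY ... OFFSET ... FETCH paging be served from the index alone.
    using (var idxCmd = new SqlCommand(
        "CREATE NONCLUSTERED INDEX IX_Source_Id_CompareColumn ON Source (Id) INCLUDE (CompareColumn);",
        srcCon))
    {
        idxCmd.CommandTimeout = 0;
        idxCmd.ExecuteNonQuery();
    }

    // And raise the chunk size so far fewer round trips are needed:
    int srcChunkSize = 10000;   // was 1000
    int tgtChunkSize = 10000;   // was 1000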

2 Answers


It's difficult to provide a good answer without knowing what your ultimate goal is. For example, the best answer might be to perform your task in a stored procedure instead of in C#.

However, simply to speed up your process, you can use parallel programming. Here is a good starting point: MSDN's parallel processing page. A sketch of reading the two tables concurrently follows below.
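
Since the slow part is two independent full-table reads, a minimal sketch of running them in parallel could look like the following. This is not the poster's code: the ReadAll helper, the connection strings, and the use of Task.Run are assumptions; only the query text is taken from the question.

    using System.Data;
    using System.Data.SqlClient;
    using System.Threading.Tasks;

    // Sketch: read Source and Target concurrently, one connection per task.
    static DataTable ReadAll(string connectionString, string query)
    {
        var dt = new DataTable();
        using (var con = new SqlConnection(connectionString))
        using (var da = new SqlDataAdapter(query, con))
        {
            da.SelectCommand.CommandTimeout = 0;
            da.Fill(dt);   // Fill opens and closes the connection itself
        }
        return dt;
    }

    static async Task<(DataTable src, DataTable tgt)> ReadBothAsync(string srcCs, string tgtCs)
    {
        var srcTask = Task.Run(() => ReadAll(srcCs,
            "select Id as LinkedColumn,CompareColumn from Source order by LinkedColumn"));
        var tgtTask = Task.Run(() => ReadAll(tgtCs,
            "select Id as LinkedColumn,CompareColumn from Target order by LinkedColumn"));

        await Task.WhenAll(srcTask, tgtTask);
        return (srcTask.Result, tgtTask.Result);
    }

Note that this only overlaps the two reads; each result set still ends up fully in your server's memory, which is what the question asks for.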

Also, consider taking a step back and looking at your architecture. You may benefit from spinning up a graph database and doing a batch export/import out of Oracle. Graph databases are far more efficient at complex mass relational queries.


SqlDataReader:

• Holds the connection open until we are finished (don’t forget to close it!)
• Can typically only be iterated over once
• Is not as useful for updating back to the database

On the other hand, it:

• Only has one record in memory at a time rather than an entire result set (this can be huge)
• Is about as fast as we can get for that one iteration
• Allows us to start processing results sooner (once the first record is available)
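
For illustration only, a minimal SqlDataReader loop over the question's Source query might look like this (it assumes System.Data.SqlClient and an already-open srcCon, as in the question's Test method; the row processing is just a placeholder):

    // Sketch: stream rows one at a time; only the current row is held client-side.
    using (var cmd = new SqlCommand(
        "select Id as LinkedColumn,CompareColumn from Source order by LinkedColumn", srcCon))
    {
        cmd.CommandTimeout = 0;
        using (SqlDataReader reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                var linked = reader["LinkedColumn"];
                var compare = reader["CompareColumn"];
                // ... process the row here; nothing else is buffered on this side ...
            }
        }   // the connection stays busy until the reader is closed
    }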

SqlDataAdapter/DataSet

• Lets us close the connection as soon as it’s done loading data, and may even close it for us automatically
• All of the results are available in memory
• We can iterate over it as many times as we need, or even look up a specific record by index
• Has some built-in facilities for updating back to the database

At the cost of:

• Much higher memory use
• We wait until all the data is loaded before using any of it
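
The adapter-based counterpart, again only a sketch over the same query and connection, pulls the whole result set into a DataTable in one go, trading memory for a quickly released connection:

    // Sketch: load everything into memory, then the connection is free again.
    var dt = new DataTable();
    using (var da = new SqlDataAdapter(
        "select Id as LinkedColumn,CompareColumn from Source order by LinkedColumn", srcCon))
    {
        da.SelectCommand.CommandTimeout = 0;
        da.Fill(dt);   // all rows are now in dt; iterate it as often as needed
    }
    Console.WriteLine(dt.Rows.Count);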

You can see the original post here: https://medium.com/exam-70-487/data-reader-vs-data-adapter-ede63a5f771e

thanks.

T.Ted