0

I have a table with millions of rows of logging data. I want to move the data to text files, with each day's worth of data going into its own text file. I'm in a .NET environment. What is an efficient way to achieve this?

I want to use parallel processing because we have beefy servers with many cores. Some choices I can think of are:

  1. Have parallel data readers, each querying a portion of the data. How do I manage the total number of connections with this approach? I would also have to avoid disrupting normal usage for the users. The other problem I can see with this approach is managing my own threads and setting an upper limit, whereas Parallel.ForEach would be much simpler.

  2. Producer-consumer pattern: One thread reads the data and queues it in memory. Multiple writers consume the data from memory and write it out to text files. (A rough sketch of this option is below.)

I'm open to PetaPoco/NPoco. Ideally I want to use Parallel.ForEach without complicating the threading code too much.
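
A rough sketch of option 2, assuming SQL Server and a hypothetical LogEntries table with LoggedAt and Message columns. Reading in day order means only one day's queue and writer task are active at a time:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.IO;
using System.Threading.Tasks;

class LogExporter
{
    static void Main()
    {
        var writers = new List<Task>();
        BlockingCollection<string> queue = null;
        DateTime? currentDay = null;

        using (var conn = new SqlConnection("Server=.;Database=Logs;Integrated Security=true"))
        using (var cmd = new SqlCommand(
            "SELECT LoggedAt, Message FROM LogEntries ORDER BY LoggedAt", conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    var stamp = reader.GetDateTime(0);
                    if (stamp.Date != currentDay)
                    {
                        queue?.CompleteAdding();   // previous day is done; its writer drains and exits
                        queue = new BlockingCollection<string>(boundedCapacity: 10000);
                        var q = queue;
                        var day = stamp.Date;
                        // One writer task per day, so no two threads ever touch the same file.
                        writers.Add(Task.Run(() =>
                        {
                            using (var file = new StreamWriter($"{day:yyyy-MM-dd}.txt"))
                                foreach (var line in q.GetConsumingEnumerable())
                                    file.WriteLine(line);
                        }));
                        currentDay = stamp.Date;
                    }
                    queue.Add($"{stamp:o}\t{reader.GetString(1)}");   // blocks if the writer falls behind
                }
            }
        }

        queue?.CompleteAdding();
        Task.WaitAll(writers.ToArray());
    }
}
```

The bounded queue keeps memory use flat: the single reader stalls whenever a writer falls more than 10,000 lines behind.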

MD Luffy
  • 536
  • 6
  • 18
  • 1
    This sounds like a job for BCP (https://msdn.microsoft.com/en-us/library/ms162802(v=sql.130).aspx). You can have a SQL Agent job that exports the logs on a daily basis. BCP is very efficient at exporting data, and you can use queries to select the data to export. You can also consider adding table partitioning into the mix. Here is [one example](https://www.mssqltips.com/sqlservertip/2780/archiving-sql-server-data-using-partitioning/) of what can be done with partitions. – PHeiberg Aug 19 '16 at 20:43
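
For illustration, a daily bcp export might look like the following; the server name, database, table, and column names are placeholders:

```
bcp "SELECT LoggedAt, Message FROM Logs.dbo.LogEntries WHERE LoggedAt >= '2016-08-18' AND LoggedAt < '2016-08-19'" queryout "2016-08-18.txt" -c -S myserver -T
```

A SQL Agent job would run this once per day with the date range substituted in.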

1 Answer

0

Parallel processing helps when there is a lot of computation involved. Here, however, the work is mainly I/O. A hard disk can only write to one file at a time, so multithreading will not bring the hoped-for speed gain. On the contrary, it could reduce speed, since the disk head may be forced to move back and forth between the different files being written.
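
A single reader feeding a single writer keeps the disk streaming sequentially. A minimal sketch, assuming the same hypothetical LogEntries table with LoggedAt and Message columns:

```csharp
using System;
using System.Data.SqlClient;
using System.IO;

class SequentialExporter
{
    static void Main()
    {
        StreamWriter file = null;
        DateTime currentDay = DateTime.MinValue;

        using (var conn = new SqlConnection("Server=.;Database=Logs;Integrated Security=true"))
        using (var cmd = new SqlCommand(
            "SELECT LoggedAt, Message FROM LogEntries ORDER BY LoggedAt", conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    var stamp = reader.GetDateTime(0);
                    if (stamp.Date != currentDay)
                    {
                        file?.Dispose();                           // close the finished day's file
                        file = new StreamWriter($"{stamp:yyyy-MM-dd}.txt");
                        currentDay = stamp.Date;
                    }
                    file.WriteLine($"{stamp:o}\t{reader.GetString(1)}");
                }
            }
        }
        file?.Dispose();
    }
}
```

Because the rows arrive ordered by timestamp, the StreamWriter is swapped exactly once per day, and every write is an append to the current file.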

Olivier Jacot-Descombes
  • 104,806
  • 13
  • 138
  • 188