I have to read many existing CSV files on an external drive and combine them in sequence (the ordering is critical), with a restore point, writing to output.csv in a different path on the same external drive. For example, A.CSV, B.CSV, and so on are appended to Output.csv. I am always appending to output.csv, but there is a high probability that an IO operation fails. Say that, while writing B.CSV after A.CSV, B.CSV contains the characters A to Z and an IO exception happens after writing M: when I rerun the program, it should reprocess B.CSV and append only N to Z to Output.csv.

In my business case output.csv will grow to several GBs, though each source file is 3-5 MB at most, so I do not want to reprocess it from the start; I want to resume writing from where it failed. I am keeping the file names in a database table and updating each file's status from "Processing" to "Processed". Thanks, and I am looking for your input.

    using var fs = new FileStream(file, FileMode.Open, FileAccess.Read);
    using var reader = new StreamReader(fs, Encoding.Default);
    // _filewriter is a StreamWriter held open on output.csv
    _filewriter.Write(Environment.NewLine + reader.ReadToEnd());
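For reference, here is a minimal sketch of the restore-point approach I have in mind, copying raw bytes in chunks instead of ReadToEnd() so the whole source never has to sit in memory. LoadCheckpoint/SaveCheckpoint are hypothetical stand-ins for my database table; the idea is to record the source offset and the output length together, and on a rerun truncate output.csv back to the last committed length before continuing. (The Environment.NewLine separator between files is omitted for brevity; it would be written, and counted in the checkpoint, before the copy loop.)

    using System;
    using System.IO;

    public static class ResumableMerge
    {
        // Restore point: how far into the source file we got, and how long
        // output.csv was at that same moment. In practice this would live in
        // the database table next to the file's status; the two methods below
        // are hypothetical stubs for that.
        public record Checkpoint(long SourceOffset, long OutputLength);

        static Checkpoint? LoadCheckpoint(string sourceFile) => null;      // read from DB
        static void SaveCheckpoint(string sourceFile, Checkpoint cp) { }   // write to DB

        public static void AppendWithResume(string sourceFile, string outputFile)
        {
            using var output = new FileStream(outputFile, FileMode.OpenOrCreate, FileAccess.Write);

            // No checkpoint yet: start at the beginning of the source and the
            // current end of output.csv. Otherwise, roll output.csv back to the
            // last committed length, discarding any partially written bytes.
            var cp = LoadCheckpoint(sourceFile) ?? new Checkpoint(0, output.Length);
            output.SetLength(cp.OutputLength);
            output.Seek(cp.OutputLength, SeekOrigin.Begin);

            using var source = new FileStream(sourceFile, FileMode.Open, FileAccess.Read);
            source.Seek(cp.SourceOffset, SeekOrigin.Begin);

            var buffer = new byte[1 << 20];  // 1 MB chunks; a 3-5 MB source needs only a few
            int read;
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
                output.Flush(flushToDisk: true);  // force the chunk out to the external drive
                SaveCheckpoint(sourceFile, new Checkpoint(source.Position, output.Position));
            }
        }
    }

Because each checkpoint records the source offset and the output length at the same moment, truncating output.csv back to the saved length on restart keeps the two streams byte-aligned even if the failure happened mid-row, and only the last uncommitted chunk is ever rewritten.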
  • Why would it fail? Why put so much effort into engineering a resume function when modern hard disk failure is so unlikely, and drives are so quick that simply redoing the whole process will take less time than building the resume logic? There are also faster ways of appending files than the approach you've got there. – Caius Jard Mar 01 '22 at 17:01
  • A few details that would be nice to cover in your question: why is it desirable to have one huge file instead of multiple smaller ones? Are the "input" files changing in between "reprocessing" attempts? What have you tried so far? – Xerillio Mar 01 '22 at 17:01
  • The multiple CSV files are being generated by micro-services, which independently consume data through Kafka topics and write them out; another service then merges them and drops the result for another ERP system to process. The ERP system processes only a single file per account, for reporting purposes. – TechiRA Mar 01 '22 at 21:26
  • To answer Caius Jard: the ERP system is a legacy system which takes only one file per account. Secondly, it takes hours to write, say, 6-8 GB of data, and when it fails at the tail we have to run the whole process again; it may break at any point, so I think automated resume with restore points is a good option, but I am open to any other idea too. – TechiRA Mar 01 '22 at 21:31
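Putting the pieces together with the status table mentioned in the question, the overall driver might look roughly like this. GetPendingFilesInOrder and SetStatus are hypothetical placeholders for the database access, and AppendWithResume is the sketch above:

    static void ProcessAll(string outputFile)
    {
        // GetPendingFilesInOrder() and SetStatus() are hypothetical stand-ins
        // for the existing database table of file names and statuses.
        foreach (var file in GetPendingFilesInOrder())
        {
            SetStatus(file, "Processing");
            // Picks up from the saved restore point if the last run died mid-file.
            ResumableMerge.AppendWithResume(file, outputFile);
            SetStatus(file, "Processed");
        }
    }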

0 Answers