I need to read a text file (10 MB) and convert it to .csv. Here is the relevant portion of the code:

string DirPathForm = System.IO.Path.GetDirectoryName(System.Reflection.Assembly.GetEntryAssembly().Location);
string[] lines = File.ReadAllLines(DirPathForm + @"\file.txt");

Some parts of the text file follow a fixed pattern, so I replace them directly like this:

string[] lines1 = lines.Select(x => x.Replace("abc[", "ab,")).ToArray();
Array.Clear(lines, 0, lines.Length);
lines = lines1.Select(x => x.Replace("] CDE  ", ",")).ToArray();

Other parts do not follow a fixed pattern, so I cannot use Replace on them directly. How can I remove the bracket, the numbers, and the whitespace in those parts? For example, given:

string[] lines = {
    "a]  773  b",
    "e] 1597  t",
    "z]    0  c"
};

to get the result below:

string[] result = {
    "a,b",
    "e,t",
    "z,c"
};

Note: the removed characters need to be replaced by a single ",".

rcSilva
  • This can easily be achieved by using regex – Mocas Mar 29 '20 at 23:05
  • `ReadAllLines` is an extremely inefficient way of reading a large file. There is code to parse a large file in one pass without buffering at [Extremely Large File Parse](https://stackoverflow.com/questions/26247952/). The pattern that answer is looking for is different from what you are looking for, but it can be easily changed. – Dour High Arch Mar 29 '20 at 23:16
  • Thanks! I am still testing, but the regex @"\]+\s+([0-9]|\s)+([0-9]|\s)" is working – rcSilva Mar 30 '20 at 02:03

1 Answer

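First of all, you should avoid ReadAllLines for a large file. It loads every line into memory at once, and the chained Select(...).ToArray() calls then create additional full copies of the data. Instead, read the lines one by one in a loop (for example with File.ReadLines or a StreamReader) and write each converted line out as you go.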
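
A minimal sketch of the line-by-line approach, assuming the two literal replacements from the question are all that each line needs. The names StreamingConvert, ConvertLine and ConvertFile are made up for illustration, and the input and output paths are passed as arguments rather than taken from the question:

```csharp
using System;
using System.IO;
using System.Linq;

static class StreamingConvert
{
    // Hypothetical helper: applies the two literal replacements from the question to one line.
    public static string ConvertLine(string line) =>
        line.Replace("abc[", "ab,").Replace("] CDE  ", ",");

    // File.ReadLines yields lines lazily, and File.WriteAllLines consumes the
    // sequence as it writes, so the whole file is never held in memory at once.
    public static void ConvertFile(string inputPath, string outputPath)
    {
        File.WriteAllLines(outputPath,
            File.ReadLines(inputPath).Select(line => ConvertLine(line)));
    }

    static void Main(string[] args)
    {
        if (args.Length < 2)
        {
            Console.WriteLine("usage: <input.txt> <output.csv>");
            return;
        }
        ConvertFile(args[0], args[1]);
    }
}
```

The input and output paths must be different files, because the input is still being read while the output is written.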

Secondly, you can definitely use a regex to turn the lines in your first example into the form shown in your second one.
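
As a sketch of one way to do it, assuming each line has the shape shown in the question (a closing bracket, optional whitespace, one or more digits, optional whitespace), a single Regex.Replace call can swap that whole run for a comma. The class name BracketCleaner and the method Clean are made up for illustration, and the pattern is a simpler one than the regex from the comments:

```csharp
using System;
using System.Text.RegularExpressions;

static class BracketCleaner
{
    // Matches "]" followed by optional whitespace, one or more digits,
    // and optional trailing whitespace.
    static readonly Regex Separator = new Regex(@"\]\s*\d+\s*", RegexOptions.Compiled);

    // Replaces every matched run with a single comma.
    public static string Clean(string line) => Separator.Replace(line, ",");

    static void Main()
    {
        string[] lines = { "a]  773  b", "e] 1597  t", "z]    0  c" };
        foreach (string line in lines)
            Console.WriteLine(Clean(line)); // prints a,b then e,t then z,c
    }
}
```

If some lines can contain a bracket that is not followed by digits, they are left unchanged by this pattern, so check the real file for such cases before relying on it.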