I have some CSV files that use a pipe (`|`) as the delimiter.

I need every row of the file to be on a single line, but some of my files look like this:

0001|Some text|Some longer text that \r\n has new lines on it|1234

This means that the first row is now spread over two lines.

These are files of about 5000 rows that now have around 12000 lines. The columns I need to fix are always the second and the third; the first and the fourth are always numbers.

I need to replace the \r\n inside those columns with a single space, so that each row takes only one line.

How can I do this in C#?

Uwe Keim
sant016
  • What is the file encoding? – NicoRiff Jan 05 '17 at 14:19
  • How many lines do you see if you open it with a text editor like Notepad? – NicoRiff Jan 05 '17 at 14:20
  • Possible duplicate of http://stackoverflow.com/questions/4140723/how-to-remove-new-line-characters-from-a-string – Joe Jan 05 '17 at 14:22
  • \n alone is only a line feed; on Windows a true newline is \r\n. So if the line really contains only \n and you read it with a StringReader, you will get these in a single line, so you can read line by line, use Replace("\n", "") and write it to another file. – Gusman Jan 05 '17 at 14:22
  • Is there always a pipe at the beginning of a new data line? – Zohar Peled Jan 05 '17 at 14:25
  • Is the first "line" really in two different lines, or is it just the string `\n` in it? – Tim Schmelter Jan 05 '17 at 14:33
  • @NicoRiff Encoding is UTF8. And I see around 12000 lines. – sant016 Jan 05 '17 at 14:34
  • @Gusman you're right. This is Windows, so it's \r\n. – sant016 Jan 05 '17 at 14:34
  • @ZoharPeled now corrected – sant016 Jan 05 '17 at 14:35
  • @TimSchmelter yes, it is in two lines – sant016 Jan 05 '17 at 14:35
  • @Joe the difference is that I *cannot* replace the new line character at the end of the row, only in the columns on it. – sant016 Jan 05 '17 at 14:38
  • Does every row always have the same number of columns? I mean, if you split by the pipe char, will you always get the same number of strings? – NicoRiff Jan 05 '17 at 14:40
  • @NicoRiff If I got it right, you're asking about the number of columns. Yes, it is always the same. – sant016 Jan 05 '17 at 14:49
  • How do you know it is the end of a row? **When you ask a question, you should provide all necessary information.** You have to tell us if you always have 4 fields delimited by a | and the first and last one are always numbers. – Phil1970 Jan 05 '17 at 14:51
  • Open the file with File.ReadAllText() into a string. If the \r\n is always in your third column, you can split the string by pipes and then, in a for statement, always increment by 3 and remove the \r\n from that string. If that is right, you will always be working with your "3rd column". – NicoRiff Jan 05 '17 at 14:54
  • @Phil1970 I'm sorry. Now edited. – sant016 Jan 05 '17 at 14:58
  • @NicoRiff And how do I write it back to the file so it overwrites? – sant016 Jan 05 '17 at 14:59
  • When you have your string[] perfectly clean, then you do a String.Join("|", yourStringArray); – NicoRiff Jan 05 '17 at 15:01
  • and then save again – NicoRiff Jan 05 '17 at 15:02
  • Personally, I would vote for using an existing, ready-to-use, well-tested CSV parser library (like e.g. [this one](http://www.filehelpers.net/)) instead of reinventing the wheel (which obviously will be error-prone). – Uwe Keim Jan 05 '17 at 15:03
  • @UweKeim: The trouble is the CSV file is wrong. A CSV parser library will consider each line to be a new record which is exactly the problem that the OP is having. What would have fixed it would have been to use a proper CSV library to generate the file in the first place but I assume that ship has already sailed. – Chris Jan 05 '17 at 15:11
  • @sant016 you will have to use a StreamWriter to do the work. – NicoRiff Jan 05 '17 at 15:15

2 Answers

If the newlines you want to replace are always in the third column, you can do the following: split the string, replace the newline in every third column, then rejoin the string:

using System.IO;

// Read the whole file; fields are separated by '|', rows by "\r\n".
string text = File.ReadAllText(@"C:\users\sjors\desktop\in.txt");

string[] values = text.Split('|');

for (int i = 0; i < values.Length; i++)
{
    // Every third chunk (index 2, 5, 8, ...) is the third column.
    if ((i + 1) % 3 == 0)
        values[i] = values[i].Replace("\r\n", " ");
}

// Rejoin with the delimiter; Join adds no trailing |, so nothing needs trimming.
File.WriteAllText(@"C:\users\sjors\desktop\out.txt", string.Join("|", values));
Sjors Ottjes
  • Turns out I tested it wrong, I'll update it in a second. Edit: fixed – Sjors Ottjes Jan 05 '17 at 15:02
  • What if it were the fourth or some other column? (I have more files with the same problem) – sant016 Jan 05 '17 at 15:20
  • @sant016 You can do this same trick with every column, just change the `(i + 1) % 3` line. The % is a modulus operator. If the last column on a line has extra newlines you can replace all of them and then add an extra `\r\n` at the end of that column – Sjors Ottjes Jan 05 '17 at 15:22
  • @SjorsOttjes Everything worked perfectly. The only problem was that `(i+1) % 3` was not working well for me; because there are always four columns, I needed `(i+2) % 4` to move between columns. Thank you so much! – sant016 Jan 06 '17 at 20:07
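
For anyone wondering where the modulus in this answer comes from: after `Split('|')`, the last column of one row and the first column of the next row end up in the same chunk (they are only separated by the row's own `\r\n`), so for rows without a trailing pipe the pattern repeats every `columnCount - 1` chunks. Below is a minimal sketch of that idea for arbitrary columns. `PipeCsvFixer`, `CleanColumns` and its parameters are hypothetical names, not part of the answer, and the sketch assumes that no field contains a literal `|` and that the first and last columns never contain line breaks.

```csharp
using System;

static class PipeCsvFixer
{
    // Hypothetical helper (an assumption, not from the answer above).
    // columnCount:    number of columns per row (4 in the question).
    // columnsToClean: 1-based column numbers whose embedded "\r\n" become a space.
    public static string CleanColumns(string text, int columnCount, params int[] columnsToClean)
    {
        if (columnCount < 2)
            throw new ArgumentException("At least two columns are required.", nameof(columnCount));

        string[] values = text.Split('|');

        // The last column of a row shares one chunk with the first column
        // of the next row, so the pattern repeats every columnCount - 1 chunks.
        int period = columnCount - 1;

        for (int i = 0; i < values.Length; i++)
        {
            // 1-based column of this chunk; 1 is the shared first/last-column chunk.
            int column = i % period + 1;

            // Never touch the shared chunk: its "\r\n" is the real row separator.
            if (column != 1 && Array.IndexOf(columnsToClean, column) >= 0)
                values[i] = values[i].Replace("\r\n", " ");
        }

        return string.Join("|", values);
    }
}
```

For the question's layout this would be called as `PipeCsvFixer.CleanColumns(text, 4, 2, 3)`, with the result written back using `File.WriteAllText`. If every row also ends with a trailing `|`, the period becomes the column count itself; that would be consistent with the `(i+2) % 4` reported in the comment above, but that is only a guess about those files.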

You can try this simple code:

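        string so = System.IO.File.ReadAllText(@"C:/yourPath/yourOldCSV.CSV");

        string[] arr = so.Split('|');

        // Every third chunk, starting at index 2, is the 3rd column
        for (int i = 2; i < arr.Length; i = i + 3)
        {
            // Replace the embedded line break with a space, as the question asks
            arr[i] = arr[i].Replace("\r\n", " ");
        }

        // Rejoin with the pipe delimiter so the columns stay separated
        string res = string.Join("|", arr);

        System.IO.File.WriteAllText("C:/yourPath/yourNewCSV.CSV", res);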
NicoRiff
  • The only problem is that I need every record on one line, and this way the last column of one record will get mixed with the first column of the next. Thank you! – sant016 Jan 06 '17 at 20:08
  • It shouldn't, since the last column isn't edited and your "\r\n" remains. – NicoRiff Jan 06 '17 at 20:11
  • If you split by `|`, eventually you will get the last column joined with the first one. If you replace `\r\n` with nothing, then the first column and the last one will be mixed. – sant016 Jan 06 '17 at 21:15
  • You're right, I didn't see the increment on the `for`. Thanks. – sant016 Jan 10 '17 at 13:03
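
As a complement to the split-and-rejoin approach in both answers, here is a rough sketch of a different technique: read the file line by line and keep gluing physical lines together until the row contains the expected number of pipes. It does not depend on which column is broken. `RowMerger` and `MergeRows` are hypothetical names, and the sketch assumes that every row has exactly `columnCount` fields, that no field contains a literal `|`, and that the last column never contains a line break (the question says it is always a number).

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

static class RowMerger
{
    // Hypothetical helper: merges physical lines into logical rows.
    public static IEnumerable<string> MergeRows(IEnumerable<string> lines, int columnCount)
    {
        var pending = new StringBuilder();
        int pipes = 0;

        foreach (string line in lines)
        {
            // A continuation line is joined to the previous part with a single space.
            if (pending.Length > 0)
                pending.Append(' ');
            pending.Append(line);
            pipes += line.Count(c => c == '|');

            // A complete row has columnCount - 1 delimiters.
            if (pipes >= columnCount - 1)
            {
                yield return pending.ToString();
                pending.Clear();
                pipes = 0;
            }
        }

        // An incomplete trailing row is emitted as-is rather than dropped.
        if (pending.Length > 0)
            yield return pending.ToString();
    }
}
```

A possible call would be `File.WriteAllLines(outPath, RowMerger.MergeRows(File.ReadLines(inPath), 4));`, where `inPath` and `outPath` are placeholder variables. They should be different files, because `File.ReadLines` reads lazily while the output is being written.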