I am trying to read a text file line by line and create one line from multiple lines until the line read in has \r\n at the end. My data looks like this:

BusID|Comment1|Text\r\n
1010|"Cuautla, Inc. d/b/a 3 Margaritas VIII\n
State Lic. #40428210000   City Lic.#4042821P\n
9/26/14      9/14/14 - 9/13/15    $175.00\n
9/20/00    9/14/00 - 9/13/01    $575.00 New License"\r\n
1020|"7-Eleven Inc., dba 7-Eleven Store #20638\n
State Lic. #24111110126; City Lic. #2411111126P\n
SEND ISSUED LICENSES TO DALLAS, TX\r\n

I want the data to look like this:

BusID|Comment1|Text\r\n
1010|"Cuautla, Inc. d/b/a 3 Margaritas VIII State Lic. #40428210000   City Lic.#4042821P 9/26/14      9/14/14 - 9/13/15    $175.00 9/20/00    9/14/00 - 9/13/01    $575.00 New License"\r\n
1020|"7-Eleven Inc., dba 7-Eleven Store #20638 State Lic. #24111110126; City Lic. #2411111126P SEND ISSUED LICENSES TO DALLAS, TX\r\n

My code is like this:

FileStream fsFileStream = new FileStream(strInputFileName, FileMode.Open,
    FileAccess.Read, FileShare.ReadWrite);

using (StreamReader srStreamRdr = new StreamReader(fsFileStream))
{
    while ((strDataLine = srStreamRdr.ReadLine()) != null && !blnEndOfFile)
    {
        //code evaluation here
    }
}

I have tried:

if (strDataLine.EndsWith(Environment.NewLine))
{
    blnEndOfLine = true;
}

and

if (strDataLine.Contains(Environment.NewLine))
{
    blnEndOfLine = true;
}

These do not see anything at the end of the string variable. Is there a way for me to tell the true end of line so I can combine these rows into one row? Should I be reading the file differently?

Steve
Cass

3 Answers

If what you have posted is exactly what's in the file, meaning the \r\n sequences are literally written out as text, you can use the following to unescape them:

strDataLine = strDataLine.Replace("\\r", "\r").Replace("\\n", "\n");

Strings are immutable, so the result of Replace has to be assigned back. After that you can use Environment.NewLine for your comparison, as in:

if (strDataLine.Replace("\\r", "\r").Replace("\\n", "\n").EndsWith(Environment.NewLine))
{
    blnEndOfLine = true;
}
StfBln

You cannot use the ReadLine method of StreamReader, because it treats every kind of newline, both \r\n and \n, as a line terminator and removes it from the input. The reader returns only the text of the line, so you will never know whether the characters removed were \r\n or just \n.

If the file is not really big, you can load everything into memory and split it into separate lines yourself:
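A small check makes this visible. The sketch below is only an illustration: it uses a StringReader as an in-memory stand-in for the file (ReadLine on a StreamReader recognizes the same terminators), and the class name ReadLineDemo and the shortened sample text are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ReadLineDemo
{
    // Collects every string that ReadLine returns for the given text.
    public static List<string> ReadAllLines(string text)
    {
        var result = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            // ReadLine returns null once the end of the text is reached
            while ((line = reader.ReadLine()) != null)
            {
                result.Add(line);
            }
        }
        return result;
    }
}
```

Calling ReadLineDemo.ReadAllLines("1010|\"first\nsecond\"\r\n1020|x\r\n") returns three strings: 1010|"first, then second", then 1020|x. None of them carries a \r or \n, so the information about which terminator ended each line is gone.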

// Load everything in memory
string fileData = File.ReadAllText(@"D:\temp\myData.txt");

// Split on the \r\n (I don't use Environment.NewLine because it
// follows the OS conventions, and that could be wrong in this context)
string[] lines = fileData.Split(new string[] { "\r\n"}, StringSplitOptions.RemoveEmptyEntries);

// Now replace the remaining \n with a space 
lines = lines.Select(x => x.Replace("\n", " ")).ToArray();

foreach(string s in lines)
   Console.WriteLine(s);
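As a worked example of the same idea, here is a minimal sketch that wraps the split and replace steps in a method and puts the \r\n back at the end of every record, so the result has the shape asked for in the question. The names RecordJoiner and Normalize are invented for this illustration and are not part of any library.

```csharp
using System;
using System.Linq;

public static class RecordJoiner
{
    // Splits on \r\n (the real record terminator), turns each remaining \n
    // into a space, and terminates every record with \r\n again.
    public static string Normalize(string fileData)
    {
        string[] records = fileData
            .Split(new[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(r => r.Replace("\n", " "))
            .ToArray();

        return string.Concat(records.Select(r => r + "\r\n"));
    }
}
```

For the input "BusID|Comment1|Text\r\n1010|\"A\nB\nC\"\r\n" the method returns "BusID|Comment1|Text\r\n1010|\"A B C\"\r\n". Like the code above, it holds the whole file in memory, so it only suits files that fit comfortably in RAM.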

EDIT
If your file is really big (3.5GB, as you say) then you cannot load everything into memory; you need to process it in blocks. Fortunately, StreamReader provides a method called ReadBlock that allows us to implement code like this:

// Where we store the lines loaded from file
List<string> lines = new List<string>();

// Read a block of 10MB
char[] buffer = new char[1024 * 1024 * 10];
bool lastBlock = false;
string leftOver = string.Empty;

// Start the streamreader
using (StreamReader reader = new StreamReader(@"D:\temp\localtext.txt"))
{
    // We exit when the last block is reached
    while (!lastBlock)
    {
        // Read 10MB
        int loaded = reader.ReadBlock(buffer, 0, buffer.Length);

        // Exit if we have no more blocks to read (EOF)
        if(loaded == 0) break;

        // If we get fewer characters than the block size then
        // we are on the last block
        lastBlock = (loaded != buffer.Length);

        // Create the string from the buffer
        string temp = new string(buffer, 0, loaded);

        // prepare the working string adding the remainder from the 
        // previous loop
        string current = leftOver + temp;

        // Search the last \r\n
        int lastNewLinePos = temp.LastIndexOf("\r\n");

        if (lastNewLinePos > -1)
        {
             // Prepare the working string
             current = leftOver + temp.Substring(0, lastNewLinePos + 2);

             // Save the incomplete parts for the next loop
             leftOver = temp.Substring(lastNewLinePos + 2);
        }
        // Process the lines
        AddLines(current, lines);
    }
}

void AddLines(string current, List<string> lines)
{
    var splitted = current.Split(new string[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
    lines.AddRange(splitted.Select(x => x.Replace("\n", " ")).ToList());
}

This code assumes that your file always ends with a \r\n and that you always get a \r\n inside a block of 10MB of text. More tests are needed with your actual data.
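If those two assumptions are a concern, a variation along the following lines could relax them. This is only a sketch, not tested against the real data: it searches the combined text (left-over plus new block) so that a \r\n split across two blocks is still found, carries the whole text forward when a block contains no \r\n, and emits a final record that lacks a trailing \r\n. It also yields records one at a time instead of collecting them in a list, which matters for a multi-gigabyte file. The names BlockRecordReader and ReadRecords are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BlockRecordReader
{
    // Lazily yields one record per \r\n-terminated chunk of the input,
    // with embedded \n characters replaced by spaces.
    public static IEnumerable<string> ReadRecords(TextReader reader, int blockSize)
    {
        char[] buffer = new char[blockSize];
        string leftOver = string.Empty;
        int loaded;

        while ((loaded = reader.ReadBlock(buffer, 0, buffer.Length)) > 0)
        {
            // Search the combined text so that a \r\n split across
            // two blocks is still recognized
            string current = leftOver + new string(buffer, 0, loaded);
            int start = 0;
            int pos;

            while ((pos = current.IndexOf("\r\n", start, StringComparison.Ordinal)) >= 0)
            {
                // Skip empty records, like StringSplitOptions.RemoveEmptyEntries
                if (pos > start)
                {
                    yield return current.Substring(start, pos - start).Replace("\n", " ");
                }
                start = pos + 2;
            }

            // Carry the unterminated tail over to the next block
            leftOver = current.Substring(start);
        }

        // Emit a final record that has no trailing \r\n
        if (leftOver.Length > 0)
        {
            yield return leftOver.Replace("\n", " ");
        }
    }
}
```

To use it, open the input with a StreamReader and the output with a StreamWriter, loop over BlockRecordReader.ReadRecords(reader, 10 * 1024 * 1024), and write each record followed by "\r\n". Only the current block and the unfinished record are held in memory at any time.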

Steve
  • This works great for the file I am currently using! Thank you. Do you know what the file size limitations would be? We can have some rather big files, like 3.5 gig. Any ideas on how to do this on big files? – Cass Mar 11 '17 at 19:44
  • That's too big to load with File.ReadAllText. At this point you need some specialized code that loads a chunk of the file into memory, processes the lines as explained above, and starts again with the next chunk. – Steve Mar 11 '17 at 20:06
  • As for the ideal size, a lot depends on how much memory you have available. I would stay with blocks of 100MB at a time – Steve Mar 11 '17 at 20:08
  • This process will read and write the file however the \n and \r\n are not found in the string. I believe they are lost when using StreamReader. – Cass Mar 12 '17 at 02:18
  • Yes, they are no longer in the returned strings. \n is replaced by the code, and \r\n is removed when you split. However, if you need to reintroduce the \r\n (for example, to display the lines in a multiline textbox) then you can use string.Join("\r\n", lines); – Steve Mar 12 '17 at 07:55
  • Your code that uses the Split command works great for small files. But for the big files, if I use StreamReader I have no way to tell if the line read ends in \n or \r\n. Those do not exist in the line returned from StreamReader. Looking at my data above, lines 2, 3, 4 and 5 need to become 1 row. How can I do that with StreamReader? – Cass Mar 12 '17 at 18:22
  • I don't use ReadLine, but ReadBlock. This method loads a block of characters without looking at individual lines. I work on this block to find the final \r\n. Everything before the final \r\n is processed immediately, and each record found there is added to the lines list. Everything after the final \r\n is passed to the next loop as a prefix for the next block, and so on until I reach the end of the file – Steve Mar 12 '17 at 20:09
  • Does the ReadBlock retain the \n and \r\n in the data? I thought I tried this method but did not get the \r\n. – Cass Mar 12 '17 at 20:16
  • ReadBlock doesn't look at the content of the buffer it loads. It returns everything as characters, so no, it doesn't remove the \r\n from your data. Can you post a download link to a sample of your file? – Steve Mar 12 '17 at 20:36
  • How do I post a download of the file? Never done that – Cass Mar 12 '17 at 22:30
  • You should get a free account on DropBox, Google Drive or Microsoft One Drive and then upload your file there. After that you will be able to give permissions to the uploaded file and take a shortcut link that you can post here – Steve Mar 12 '17 at 23:33
  • My apologies, I did not understand your code so I did not use it correctly. I just recopied your code and only changed it to write out a file. It worked perfectly. Thank you so much! Sorry for being a pain. – Cass Mar 13 '17 at 00:50
You can just read all the text by calling File.ReadAllText(path) and parse it in the following way:

string input = File.ReadAllText(your_file_path);
string output = string.Empty;
input.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
    .Skip(1).ToList()
    .ForEach(x =>
    {
        output += x.EndsWith("\\r\\n") ? x + Environment.NewLine
                                       : x.Replace("\\n", " ");
    });
Akash KC