2

I have several text files with many blank lines between blocks of text. I would like to normalize them, but there is no pattern to the number of newlines between the texts. For example:

Text




Some text








More text




More

more

What I want is to replace any run of more than X newlines with Y newlines. For example, 5 sequential newlines become 2, and 10 become 3.

My current problem is that I don't know how to identify which lines I need to normalize.

I know I could count the newlines using split, or by checking whether each line is empty, etc. But perhaps there is a simple regex or a better approach to solve this problem?

Deduplicator
Guapo

2 Answers

1
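using System.Collections.Generic;
using System.IO;

// Keeps at most `size` consecutive blank lines; all other lines pass through unchanged.
List<string> Normalize(string fileName, int size)
{
    List<string> result = new List<string>();
    int blanks = 0; // consecutive blank lines seen so far

    foreach (var line in File.ReadAllLines(fileName))
    {
        if (line.Trim() == "")
        {
            // Emit a blank line only while the current run is below the limit.
            if (blanks++ < size)
                result.Add("");
        }
        else
        {
            blanks = 0; // a non-blank line ends the run
            result.Add(line);
        }
    }
    return result;
}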
Bob Webster
0

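Here's one way:

// Requires: using System; using System.IO; using System.Text;
string sText = File.ReadAllText(@"c:\file.txt");
sText = removeLines(sText);

// Drops every empty line and rejoins the remaining lines with "\r\n".
public string removeLines(string sData) {
    // sDelim was not defined in the original snippet; splitting on both
    // Windows and Unix line endings is an assumption.
    string[] sDelim = { "\r\n", "\n" };
    string[] sArray = sData.Split(sDelim, StringSplitOptions.RemoveEmptyEntries);
    StringBuilder builder = new StringBuilder();
    foreach (string value in sArray)
    {
        builder.Append(value);
        builder.Append("\r\n");
    }
    return builder.ToString();
}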

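Or a one-liner using regular expressions:

// Requires: using System.IO; using System.Text.RegularExpressions;
// Collapses every run of \r and \n characters into a single \n.
string sText = File.ReadAllText(@"c:\file.txt");
sText = Regex.Replace(sText, "[\r\n]+", "\n");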
Shai
  • They are not `\r\n`; they are simple sequential `\n`, and I want to remove them only when they exceed a certain number of sequential newlines, as I mentioned. I am not a pro with regex, but would something like `[\n]+{,3}` work? – Guapo May 14 '12 at 05:01
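
The idea in the comment above, touching only runs longer than a given length, could be sketched with a bounded quantifier such as `\n{4,}` ("four or more newlines"). The following is a minimal sketch, not a definitive implementation. It assumes the text uses bare `\n` line endings as stated in the comment, and the names `NewlineNormalizer`, `CapNewlines` and `MapNewlineRuns` are hypothetical, introduced only for illustration. The thresholds in `MapNewlineRuns` (5 becomes 2, 10 becomes 3) are taken from the example in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NewlineNormalizer
{
    // Hypothetical helper: any run of more than `max` '\n' characters
    // is replaced by exactly `max` of them; shorter runs are left alone.
    public static string CapNewlines(string text, int max)
    {
        // \n{N,} matches a run of at least N newlines; here N = max + 1.
        string pattern = @"\n{" + (max + 1) + ",}";
        return Regex.Replace(text, pattern, new string('\n', max));
    }

    // Hypothetical helper using the thresholds from the question:
    // runs of 10 or more become 3 newlines, runs of 5 to 9 become 2,
    // and runs shorter than 5 are not matched at all.
    public static string MapNewlineRuns(string text)
    {
        return Regex.Replace(text, @"\n{5,}",
            m => m.Length >= 10 ? "\n\n\n" : "\n\n");
    }
}
```

For example, `NewlineNormalizer.CapNewlines(text, 3)` would leave runs of up to three `\n` untouched and shorten anything longer to three. If the files might contain `\r\n` after all, the pattern would need to be adapted, for instance to `(\r?\n){4,}`.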