5

There has to be a better way to do this. I just want to split a long string into lines of at most 60 characters without breaking words. A line doesn't have to be exactly 60 characters long; it just has to stay within 60.

The code below is what I have and it works but I'm thinking there's a better way. Anybody?

Edit: modified to use StringBuilder and fixed the problem of removing a repeated word. I'd also rather not use regex, because I think it would be less efficient than what I have now.

public static List<String> FormatMe(String Message)
{
    Int32 MAX_WIDTH = 60;
    List<String> Line = new List<String>();
    String[] Words;

    Message = Message.Trim();
    Words = Message.Split(" ".ToCharArray());

    StringBuilder s = new StringBuilder();
    foreach (String Word in Words)
    {
        s.Append(Word + " ");
        if (s.Length > MAX_WIDTH)
        {
            s.Replace(Word, "", 0, s.Length - Word.Length);
            Line.Add(s.ToString().Trim());
            s = new StringBuilder(Word + " ");
        }
    }

    if (s.Length > 0)
        Line.Add(s.ToString().Trim());

    return Line;
}

Thanks

Dmitrii Lobanov
Tigran
  • Are you looking for a more efficient algorithm, or for a method that is friendlier for the next coder to read? – Nate Nov 05 '09 at 03:04
  • Any reason you are not using Generics? – Nate Nov 05 '09 at 03:07
  • (1) Your code does not work as you expect: s.Replace(Word, "") replaces not only the last occurrence but every match of Word in the string, including partial ones. (2) With s += ... you end up creating too many temporary string objects, because strings are immutable in C#. Try StringBuilder or the String.Join() method. – Chansik Im Nov 05 '09 at 04:31
  • I'm looking for something that's more efficient. Chansik Im, thanks for pointing that out. I fixed it, and decided to use StringBuilder while I'm at it. – Tigran Nov 13 '09 at 01:43
  • How would I use Generics to make this better? – Tigran Nov 13 '09 at 01:44
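The Replace problem raised in the comments above can be reproduced in isolation. The following is only a minimal sketch: the class and method names (ReplaceDemo, RemoveWithReplace, RemoveByTruncating) are made up for illustration. It assumes the builder always ends with the word that was just appended plus one trailing space, as in the question's loop. The first method mirrors the range-limited Replace call from the question; the second is one possible alternative that shortens the builder instead of searching it.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class ReplaceDemo
{
    // Mirrors the question's call: a range-limited Replace after the word
    // (plus a trailing space) has been appended. The range stops before the
    // end of the appended word, so only EARLIER occurrences can match.
    public static string RemoveWithReplace(string line, string word)
    {
        var sb = new StringBuilder(line);
        sb.Replace(word, "", 0, sb.Length - word.Length);
        return sb.ToString();
    }

    // Alternative (an assumption, not the asker's code): drop the word that
    // was just appended, and its trailing space, by shortening the builder.
    public static string RemoveByTruncating(string line, string word)
    {
        var sb = new StringBuilder(line);
        sb.Length -= word.Length + 1;
        return sb.ToString();
    }
}
```

For the input "the cat the " and the word "the", RemoveWithReplace returns " cat the " (the first "the" is removed and the one just appended is kept), while RemoveByTruncating returns "the cat ". Checking the length before appending, so that nothing has to be removed afterwards, avoids the problem altogether.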

7 Answers

7

Another (now TESTED) sample, very similar to Keith's approach:

static void Main(string[] args)
{
    const Int32 MAX_WIDTH = 60;

    int offset = 0;
    string text = Regex.Replace(File.ReadAllText("oneline.txt"), @"\s{2,}", " ");
    List<string> lines = new List<string>();
    while (offset < text.Length)
    {
        int index = text.LastIndexOf(" ", 
                         Math.Min(text.Length, offset + MAX_WIDTH));
        string line = text.Substring(offset,
            (index - offset <= 0 ? text.Length : index) - offset );
        offset += line.Length + 1;
        lines.Add(line);
    }
}

I ran that on this file with all line breaks manually replaced by " ".

Rubens Farias
  • String.LastIndexOf(string value, int startIndex). How do you guarantee text.Substring(offset, text.LastIndexOf(" ", MAX_WIDTH)) returns the string less than 60 characters? – Chansik Im Nov 05 '09 at 05:59
  • I changed for LastIndexOf(" ", offset+MAX_WIDTH); now should work (still untested) – Rubens Farias Nov 05 '09 at 08:58
  • @Rubens, the change does not guarantee to return the string less than 60 chars because you did not change 'text'. Suppose you have 10K string 'text' that ends with " ", text.LastIndexOf(" ", offset + MAX_WIDTH) will return the index of the last " ", not within 60 char limit. – Chansik Im Nov 05 '09 at 17:39
  • @Chansik Im, now I tested it; please, take a look again; ty – Rubens Farias Nov 05 '09 at 18:29
  • @Rubens, I just grabbed the HTML source of this web page and ran your code on it. Your method seemed to work fine until it hit an http link longer than 60 chars (hopefully the OP is not expecting hyperlinks there, and the OP does not clearly specify how to handle this situation either: an incomplete requirement :) ). Also, even if the last word fits within the 60-char limit, you may exclude it from the line, because you are seeking the last index of " ". BTW, I think this is a decent solution, especially with a large string object. – Chansik Im Nov 06 '09 at 00:33
  • @Rubens, I was wrong about String.LastIndexOf() in my previous comment. Sorry :) – Chansik Im Nov 06 '09 at 00:37
  • @Chansik, yw! ty for all your feedbacks! – Rubens Farias Nov 06 '09 at 02:07
  • Your latest code still has a problem if the string contains a word with more characters than MAX_WIDTH. For example, set MAX_WIDTH = 10 and use the string "The code snippet below illustrates the solution discussed above". The last line will be longer than MAX_WIDTH. – dalmate May 19 '22 at 18:29
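The two edge cases raised in these comments (a final word that still fits, and a word longer than the limit) could be handled along the following lines. This is a sketch rather than a tested replacement for the answer's code: the class and method names (IndexWrap, Wrap) are invented for illustration, and it assumes the text is already single-spaced, as the Regex.Replace call in the answer arranges. A word longer than maxWidth is not broken; it is placed on a line of its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, for illustration only.
public static class IndexWrap
{
    // Assumes single-spaced input. Words longer than maxWidth are kept
    // whole on their own (overlong) line.
    public static List<string> Wrap(string text, int maxWidth)
    {
        var lines = new List<string>();
        int offset = 0;
        while (offset < text.Length)
        {
            // Whatever is left fits on one line: keep the last word with it.
            if (text.Length - offset <= maxWidth)
            {
                lines.Add(text.Substring(offset));
                break;
            }

            // Search backwards for a space, but only inside the window
            // [offset, offset + maxWidth], not the whole string.
            int index = text.LastIndexOf(' ', offset + maxWidth, maxWidth + 1);
            if (index <= offset)
            {
                // No usable break point: the word is longer than maxWidth,
                // so run on to the next space (or the end of the text).
                index = text.IndexOf(' ', offset + maxWidth);
                if (index < 0) index = text.Length;
            }

            lines.Add(text.Substring(offset, index - offset));
            offset = index + 1; // skip the space that ended the line
        }
        return lines;
    }
}
```

With maxWidth = 10 and the sentence from the last comment, this yields eight lines: "The code", "snippet", "below", "illustrates", "the", "solution", "discussed", "above". Only "illustrates" (11 characters) exceeds the limit, because it is a single word. Like the answer's code, it tracks an offset into the original string instead of repeatedly shortening it.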
1

Try this:

const Int32 MAX_WIDTH = 60;

string text = "...";
List<string> lines = new List<string>();
StringBuilder line = new StringBuilder();
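foreach (Match word in Regex.Matches(text, @"\S+", RegexOptions.ECMAScript))
{
    // Flush the current line if appending this word would push it past MAX_WIDTH.
    // The line.Length > 0 guard avoids emitting an empty line for an overlong first word.
    if (line.Length > 0 && word.Value.Length + line.Length + 1 > MAX_WIDTH)
    {
        lines.Add(line.ToString().TrimEnd());
        line.Length = 0;
    }
    line.Append(word.Value).Append(' ');
}

// Flush whatever is left as the final line ('word' is out of scope here).
if (line.Length > 0)
    lines.Add(line.ToString().TrimEnd());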

Please, also check this out: How do I use a regular expression to add linefeeds?

Rubens Farias
1

Inside a Regular expression, the Match Evaluator function (an anonymous method) does the grunt work and stores the newly sized lines into a StringBuilder. We don't use the return value of Regex.Replace method because we're just using its Match Evaluator function as a feature to accomplish line breaking from inside the regular expression call - just for the heck of it, because I think it's cool.

using System;
using System.Text;
using System.Text.RegularExpressions;

strInput is the string whose text you want to break into lines.

int MAX_LEN = 60;
StringBuilder sb = new StringBuilder();
int bmark = 0; //bookmark position

Regex.Replace(strInput, @".*?\b\w+\b.*?", 
    delegate(Match m) {
        if (m.Index - bmark + m.Length + m.NextMatch().Length > MAX_LEN 
                || m.Index == bmark && m.Length >= MAX_LEN) {
            sb.Append(strInput.Substring(bmark, m.Index - bmark + m.Length).Trim() + Environment.NewLine);
            bmark = m.Index + m.Length;
        } return null;
    }, RegexOptions.Singleline);

if (bmark != strInput.Length) // last portion
    sb.Append(strInput.Substring(bmark));

string strModified = sb.ToString(); // get the real string from builder

It's also worth noting the second condition in the if expression in the Match Evaluator m.Index == bmark && m.Length >= MAX_LEN is meant as an exceptional condition in case there is a word longer than 60 chars (or longer than the set max length) - it will not be broken down here but just stored on one line by itself - I guess you might want to create a second formula for that condition in the real world to hyphenate it or something.

John K
  • The regular expression might need tweaking to handle punctuation correctly, but I'm done with this example and moving on - it's flexible and provides the gist of another way to accomplish line breaking - the core logic (inside the anonymous method) is only a few lines which is shorter than the core logic the author posted so I think this is a viable answer. It's all the supporting stuff around it that makes it larger. – John K Nov 05 '09 at 04:54
  • You can just as easily substitute List.Add(..) in place of StringBuilder.Append(..) if you want to capture each text line into a collection element or array element. – John K Nov 05 '09 at 05:08
1

Another one ...

public static string SplitLongWords(string text, int maxWordLength)
{
    var reg = new Regex(@"\S{" + (maxWordLength + 1) + ",}");
    bool replaced;
    do
    {
        replaced = false;
        text = reg.Replace(text, (m) =>
        {
            replaced = true;
            return m.Value.Insert(maxWordLength, " ");                    
        });
    } while (replaced);

    return text;
}
Dmitrii Lobanov
nevaram
0

I would start with saving the length of the original string. Then, start backwards, and just subtract, since odds are that I would get below 60 faster by starting at the last word and going back than building up.

Once I know how long, then just use StringBuilder and build up the string for the new string.

James Black
0
List<string> lines = new List<string>();
while (message.Length > 60) {
  int idx = message.LastIndexOf(' ', 60);
  lines.Add(message.Substring(0, idx));
  message = message.Substring(idx + 1, message.Length - (idx + 1));
}
lines.Add(message);

You might need to modify a bit to handle multiple spaces, words with >60 chars in them, ...

Keith Randall
  • message = message.Substring(idx + 1, message.Length - (idx + 1)) <-- may create too many temp strings, especially if the original message is large. It's not highly likely, but imagine the original string is 86KB, which is large enough to be allocated on the large object heap, and you are creating temp strings that each shrink by only around 60 chars. Yes, that is an extreme case, though :) – Chansik Im Nov 05 '09 at 05:14
  • All depends on whether Substring actually makes a copy or whether it just keeps pointers into the parent string (I don't know what C# does in this regard). Rubens' answer avoids this problem by keeping a starting index instead of substringing explicitly. – Keith Randall Nov 05 '09 at 15:02
  • In C#, strings are immutable, so I think Substring will return a new string object instead of keeping a pointer into the parent string. Regarding Rubens' answer, I also thought the same way you do, but realized that LastIndexOf(char value, int startIndex) will return the last index of the value char in the string while searching from startIndex, not up to startIndex. Hence, neither yours nor Rubens' may work as expected. – Chansik Im Nov 05 '09 at 17:51
  • Java has immutable strings and does what I described (see http://www.docjar.com/html/api/java/lang/String.java.html). You're wrong about LastIndexOf as well ("azzzza".LastIndexOf('a', 3) == 0). – Keith Randall Nov 05 '09 at 19:16
  • Yes.. I was wrong about LastIndexOf indeed. That's why I came to the post to update comment. But, you still get wrong about substring, even with Java. According to the link you provide, substring "..Returns a new string that is a substring of this string" in Java. – Chansik Im Nov 06 '09 at 00:07
  • It returns a new String, but it references the same char[] array as the original string. – Keith Randall Nov 06 '09 at 01:05
  • As such, substring runs in O(1) time and uses O(1) space, regardless of the size of the substring selected. – Keith Randall Nov 06 '09 at 01:06
  • Good to know Java handles it differently. Thanks for sharing your knowledge. Unfortunately, it seems not how C# implements String.Substring() (at least, as of now). See the following SO discussion @ http://stackoverflow.com/questions/1082442/why-does-net-create-new-substrings-instead-of-pointing-into-existing-strings – Chansik Im Nov 06 '09 at 03:06
0

I tried the original solution and found that it didn't quite work. I've modified it slightly to make it work. It now works for me and solves a problem I had. Thanks. Jim.

public static List<String> FormatMe(String message)
    {
        int maxLength = 10;
        List<String> Line = new List<String>();
        String[] words;

        message = message.Trim();
        words = message.Split(" ".ToCharArray());

        StringBuilder sentence = new StringBuilder();
        foreach (String word in words)
        {
            if((sentence.Length + word.Length) <= maxLength)
            {
                sentence.Append(word + " ");

            }
            else
            {
                Line.Add(sentence.ToString().Trim());
                sentence = new StringBuilder(word + " ");
            }
        }

        if (sentence.Length > 0)
            Line.Add(sentence.ToString().Trim());

        return Line;
    }

    private void btnSplitText_Click(object sender, EventArgs e)
    {
        List<String> Line = new List<string>();
        string message = "The quick brown fox jumps over the lazy dog.";
        Line = FormatMe(message);
    }