7

Does anyone know a good algorithm to word wrap an input string to a specified number of lines rather than a set width? Basically, I want to achieve the minimum width for X lines.

e.g. "I would like to be wrapped into two lines"
goes to
"I would like to be
wrapped into two lines"

"I would like to be wrapped into three lines"
goes to
"I would like to
be wrapped into
three lines"

New lines are inserted as required. I can find other word-wrap questions, but they all have a known width and insert as many lines as needed to fit that width. I am after the opposite.

Answers are preferable in a .NET language, but any language would be helpful. If there is a framework way to do this that I am not aware of, please let me know.

Edit: I have since found Algorithm to divide text into 3 evenly-sized groups. I think its accepted answer is the solution to my problem, but I am having difficulty understanding it. Any chance someone could convert it to C# or VB.NET?

PeteT
  • Are you after an optimal solution, or is a near-optimal solution good enough? And is hyphenation allowed? – Timo Jun 21 '11 at 14:03
  • Ideally optimal, but I would welcome a near-optimal answer. I can see a greedy way to do it, using an array of words and the total length divided by the number of lines as the value to break on. However, I can see it won't always give the best result, which may be what you are thinking of too. I am trying to avoid hyphenation. – PeteT Jun 21 '11 at 15:50
  • Assuming the text is 30 characters long and should cover 3 lines, an optimal solution would be 3 lines of length 10. We could then, for example, search for the solution with the highest number of lines of length 10, or for the solution with the minimum difference from 10. This is better shown with 100 characters and 10 lines: is 10, 10, 10, 10, ..., 10, 19, 1 a good solution, because 8 of its lines are exactly 10, compared with the next one, which never hits 10 exactly? Or is 11, 9, 11, 9, ..., 11, 9 better, because the maximum difference is 1, compared to 9 in the first example? – user unknown Jun 21 '11 at 23:01
  • Ah, maybe I made an assumption. Yes, the favoured solution would be 11, 9, 11, 9, ...; the idea is to keep the longest line to a minimum. – PeteT Jun 21 '11 at 23:16
  • I converted the Python solution to C#. See my answer. It was fun :) – Petar Ivanov Jun 24 '11 at 10:12
  • I converted the code to Swift. See here: https://stackoverflow.com/questions/5059956/algorithm-to-divide-text-into-3-evenly-sized-groups?nah=1#28822505 – Wizard of Kneup Jan 21 '18 at 05:07
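To make the two criteria from this comment thread concrete, here is a small Python sketch (the helper names max_deviation and squared_deviation are hypothetical) that scores the two candidate layouts from the 100-character, 10-line example: the worst single deviation from the ideal length, which is what PeteT asks to minimize, and the sum of squared deviations, which is the cost the dynamic-programming answers below use.

```python
def max_deviation(lengths, target):
    # worst single line: the "keep the longest line to a minimum" criterion
    return max(abs(x - target) for x in lengths)


def squared_deviation(lengths, target):
    # raggedness-style cost with exponent 2, as in the minimum raggedness answers
    return sum((x - target) ** 2 for x in lengths)


a = [10] * 8 + [19, 1]   # eight perfect lines, then one long and one short
b = [11, 9] * 5          # never exactly 10, but never far off

print(max_deviation(a, 10), max_deviation(b, 10))          # 9 1
print(squared_deviation(a, 10), squared_deviation(b, 10))  # 162 10
```

Both scores prefer the 11, 9, 11, 9, ... layout here, but the two criteria can disagree: with a target of 10 and four lines totalling 40, [12, 12, 8, 8] has the smaller worst deviation (2 vs 3), while [13, 9, 9, 9] has the smaller squared sum (12 vs 16).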

8 Answers

6

You can solve this problem using dynamic programming; cf. the minimum raggedness algorithm. I used some of the information you added when you edited your post with Algorithm to divide text into 3 evenly-sized groups.


Notations:

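Let your text be document = "word1 word2 ... wordp"

n = number of lines required

LineWidth = len(document)/n (the code below refines this to (len(document) - (n - 1))/n, because the n - 1 spaces that are replaced by line breaks take up no width)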


Cost function:

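First, define the cost of putting words i to j - 1 (words[i:j] in Python) on the same line. You can take the same one as on Wikipedia, with P = 2 for example:

$$\mathrm{cost}(i, j) = \Big( \mathrm{LineWidth} - \big( \max(j - i - 1,\, 0) + \textstyle\sum_{m=i}^{j-1} \mathrm{len}(\mathrm{word}_m) \big) \Big)^{P}$$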

It represents the distance between the target length of a line and its actual length (the word lengths plus one space between each pair of words), raised to the power P.
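The cost function can be written out in a few lines of Python. This is a sketch: make_cost is a hypothetical helper name, and it uses the refined target width from the code in this answer, (total width - (n - 1)) / n, rather than len(document)/n. For the example sentence used below it reproduces the stage-0 costs (121, 49, 1, 16, ...).

```python
def make_cost(text, n, p=2):
    """Return cost(i, j) for putting words[i:j] on one line:
    (LineWidth - actual width) ** p."""
    words = text.split()
    cum = [0]                                  # cum[k] = total length of words[0:k]
    for w in words:
        cum.append(cum[-1] + len(w))
    total = cum[-1] + len(words) - 1           # word lengths plus single spaces
    line_width = (total - (n - 1)) / n         # the n - 1 break spaces are dropped

    def cost(i, j):
        actual = max(j - i - 1, 0) + (cum[j] - cum[i])
        return (line_width - actual) ** p

    return cost


cost = make_cost("Just testing to see how this works.", 3)
print([cost(0, j) for j in range(4)])  # [121.0, 49.0, 1.0, 16.0]
```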

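The total cost of the optimal solution can then be defined with the following recurrence, where F_k(j) is the smallest total cost of placing words 0 to j - 1 on lines 0 to k:

$$F_0(j) = \mathrm{cost}(0, j), \qquad F_k(j) = \min_{0 \le i \le j} \big( F_{k-1}(i) + \mathrm{cost}(i, j) \big) \quad (1 \le k \le n - 1)$$

The optimal total cost is F_{n-1}(p), i.e. when the last line ends with the last word.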


Solving the problem:

I took the code from the link you gave and changed it so that you can see what the program is doing.

  1. At stage k you add words to line k.
  2. Then you look at the optimal cost of having words i to j on line k.
  3. Once you've gone from the first line to the last, you take the smallest cost in the last step and you have your optimal result:

Here is the result from the code:

D=minragged('Just testing to see how this works.')

number of words: 7
------------------------------------
stage : 0
------------------------------------
word i to j in line 0       TotalCost (f(j))
------------------------------------
i= 0 j= 0           121.0
i= 0 j= 1           49.0
i= 0 j= 2           1.0
i= 0 j= 3           16.0
i= 0 j= 4           64.0
i= 0 j= 5           144.0
i= 0 j= 6           289.0
i= 0 j= 7           576.0
------------------------------------
stage : 1
------------------------------------
word i to j in line 1       TotalCost (f(j))
------------------------------------
i= 0 j= 0           242.0
i= 0 j= 1           170.0
i= 0 j= 2           122.0
i= 0 j= 3           137.0
i= 0 j= 4           185.0
i= 0 j= 5           265.0
i= 0 j= 6           410.0
i= 0 j= 7           697.0
i= 1 j= 2           65.0
i= 1 j= 3           50.0
i= 1 j= 4           58.0
i= 1 j= 5           98.0
i= 1 j= 6           193.0
i= 1 j= 7           410.0
i= 2 j= 4           26.0
i= 2 j= 5           2.0
i= 2 j= 6           17.0
i= 2 j= 7           122.0
i= 3 j= 7           80.0
------------------------------------
stage : 2
------------------------------------
word i to j in line 2       TotalCost (f(j))
------------------------------------
i= 0 j= 7           818.0
i= 1 j= 7           531.0
i= 2 j= 7           186.0
i= 3 j= 7           114.0
i= 4 j= 7           42.0
i= 5 j= 7           2.0
reversing list
------------------------------------
Just testing        12
to see how      10
this works.         11
  • Therefore the best choice is to have words 5 to 7 on the last line (cf. stage 2),
  • then words 2 to 5 on the second line (cf. stage 1),
  • then words 0 to 2 on the first line (cf. stage 0).

Reverse this and you get:

Just testing          12
to see how          10
this works.          11

Here is the code that prints the reasoning (in Python; sorry, I don't use C#, but someone has since translated the code into C#):

def minragged(text, n=3):
    P = 2
    words = text.split()
    cumwordwidth = [0]
    # cumwordwidth[-1] is the last element
    for word in words:
        cumwordwidth.append(cumwordwidth[-1] + len(word))
    totalwidth = cumwordwidth[-1] + len(words) - 1  # len(words) - 1 spaces
    linewidth = float(totalwidth - (n - 1)) / float(n)  # n - 1 line breaks

    print("number of words:", len(words))

    def cost(i, j):
        """
        cost of a line words[i], ..., words[j - 1] (words[i:j])
        """
        actuallinewidth = max(j - i - 1, 0) + (cumwordwidth[j] - cumwordwidth[i])
        return (linewidth - float(actuallinewidth)) ** P

    # Print the reasoning stage by stage; the lines are recovered in
    # reverse order at the end and then reversed.
    F = {}  # F[stage][j] = [total cost of words[0:j] on lines 0..stage, start of line `stage`]

    for stage in range(n):
        print("------------------------------------")
        print("stage :", stage)
        print("------------------------------------")
        print("word i to j in line", stage, "\t\tTotalCost (f(j))")
        print("------------------------------------")

        if stage == 0:
            # first line: words[0:j] for every possible j
            F[stage] = []
            i = 0
            for j in range(i, len(words) + 1):
                print("i=", i, "j=", j, "\t\t\t", cost(i, j))
                F[stage].append([cost(i, j), 0])
        elif stage == (n - 1):
            # last line: it has to end with the last word, so only j = len(words)
            F[stage] = [[float('inf'), 0] for i in range(len(words) + 1)]
            j = len(words)
            for i in range(len(words) + 1):
                if F[stage - 1][i][0] + cost(i, j) < F[stage][j][0]:  # min cost (cf. F formula)
                    F[stage][j][0] = F[stage - 1][i][0] + cost(i, j)
                    F[stage][j][1] = i
                    print("i=", i, "j=", j, "\t\t\t", F[stage][j][0])
        else:
            # middle lines: try every start i and end j
            F[stage] = [[float('inf'), 0] for i in range(len(words) + 1)]
            for i in range(len(words) + 1):
                for j in range(i, len(words) + 1):
                    if F[stage - 1][i][0] + cost(i, j) < F[stage][j][0]:
                        F[stage][j][0] = F[stage - 1][i][0] + cost(i, j)
                        F[stage][j][1] = i
                        print("i=", i, "j=", j, "\t\t\t", F[stage][j][0])

    print('reversing list')
    print("------------------------------------")
    listWords = []
    a = len(words)
    for k in range(n - 1, 0, -1):  # reverse loop from n-1 to 1
        listWords.append(' '.join(words[F[k][a][1]:a]))
        a = F[k][a][1]
    listWords.append(' '.join(words[0:a]))
    listWords.reverse()

    for line in listWords:
        print(line, '\t\t', len(line))

    return listWords
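For reuse, here is a sketch of the same recurrence without the tracing output (minragged_quiet is a hypothetical name, not part of the original answer). Each row F[k] stores, for every j, the best total cost of words[0:j] on lines 0..k together with the start index of line k, and the lines are recovered by walking those back-pointers from the last line.

```python
def minragged_quiet(text, n=3):
    """Minimum-raggedness split of text into n lines (squared cost, P = 2)."""
    words = text.split()
    m = len(words)
    cum = [0]                       # cum[k] = total length of words[0:k]
    for w in words:
        cum.append(cum[-1] + len(w))
    line_width = (cum[-1] + m - 1 - (n - 1)) / n

    def cost(i, j):
        actual = max(j - i - 1, 0) + (cum[j] - cum[i])
        return (line_width - actual) ** 2

    # F[k][j] = (best total cost of words[0:j] on lines 0..k, start index of line k)
    F = [[(cost(0, j), 0) for j in range(m + 1)]]
    for k in range(1, n):
        F.append([min((F[k - 1][i][0] + cost(i, j), i) for i in range(j + 1))
                  for j in range(m + 1)])

    # walk the back-pointers from the last line (which must end at word m) to the first
    lines, j = [], m
    for k in range(n - 1, 0, -1):
        i = F[k][j][1]
        lines.append(" ".join(words[i:j]))
        j = i
    lines.append(" ".join(words[:j]))
    return lines[::-1]


print(minragged_quiet("Just testing to see how this works."))
# ['Just testing', 'to see how', 'this works.']
```

Note that the squared cost does not always minimize the longest line, which is the criterion PeteT prefers in the comments: for "I would like to be wrapped into three lines" it returns "I would like" / "to be wrapped" / "into three lines" (cost about 8.67, longest line 16) rather than the question's 15/15/11 layout (cost about 10.67).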
Ricky Bobby
  • You don't need dynamic programming. You are just trying to get the minimal width for a fixed number of lines. Greedy together with a binary search is more than enough to resolve this issue. Look at either my answer or btilly's. – Mikola Jun 24 '11 at 19:46
  • Your answer has a lower complexity than mine (my program is O(n^2)), and you're right that dynamic programming is not the only way to do it. I tried to explain the code PeteT put in his edit, as it seemed to be a problem. Thanks for the advice; I will have a more careful look at btilly's answer and at yours. – Ricky Bobby Jun 24 '11 at 22:04
  • It is not working for: >>> smallest_width(['a', 'b', 'c', 'dad', 'e', 'f'], 3) gives ['a b', 'c dad', 'e f'], and >>> smallest_width(['a', 'b', 'cad', 'd', 'e', 'f'], 3) gives ['a b', 'cad', 'd e f'] # 9+9 = 18. In the latter case we should wrap as ['a b', 'cad d', 'e f'] # 4+4 = 8 – excitoon Sep 05 '21 at 10:06
5

Here is the accepted solution from Algorithm to divide text into 3 evenly-sized groups converted to C#:

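// Requires: using System; using System.Collections.Generic;
static List<string> Minragged(string text, int n = 3)
{
    // split on any whitespace and drop empty entries, like Python's text.split()
    var words = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);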

    var cumwordwidth = new List<int>();
    cumwordwidth.Add(0);

    foreach (var word in words)
        cumwordwidth.Add(cumwordwidth[cumwordwidth.Count - 1] + word.Length);

    var totalwidth = cumwordwidth[cumwordwidth.Count - 1] + words.Length - 1;

    var linewidth = (double)(totalwidth - (n - 1)) / n;

    var cost = new Func<int, int, double>((i, j) =>
    {
        var actuallinewidth = Math.Max(j - i - 1, 0) + (cumwordwidth[j] - cumwordwidth[i]);
        return (linewidth - actuallinewidth) * (linewidth - actuallinewidth);
    });

    var best = new List<List<Tuple<double, int>>>();

    var tmp = new List<Tuple<double, int>>();
    best.Add(tmp);
    tmp.Add(new Tuple<double, int>(0.0f, -1));
    foreach (var word in words)
        tmp.Add(new Tuple<double, int>(double.MaxValue, -1));

    for (int l = 1; l < n + 1; ++l)
    {
        tmp = new List<Tuple<double, int>>();
        best.Add(tmp);
        for (int j = 0; j < words.Length + 1; ++j)
        {
            var min = new Tuple<double, int>(best[l - 1][0].Item1 + cost(0, j), 0);
            for (int k = 0; k < j + 1; ++k)
            {
                var loc = best[l - 1][k].Item1 + cost(k, j);
                if (loc < min.Item1 || (loc == min.Item1 && k < min.Item2))
                    min = new Tuple<double, int>(loc, k);
            }
            tmp.Add(min);
        }
    }

    var lines = new List<string>();
    var b = words.Length;

    for (int l = n; l > 0; --l)
    {
        var a = best[l][b].Item2;
        lines.Add(string.Join(" ", words, a, b - a));
        b = a;
    }

    lines.Reverse();
    return lines;
}
Petar Ivanov
  • Thanks for this; reading Python is difficult for me as I am not familiar with it yet. I may try to learn it. I am going to leave the bounty open over the weekend in case anyone else wants to improve the list of answers, but I will probably award it to you. – PeteT Jun 25 '11 at 18:37
  • Just as a note for anyone using this in the future: if you're not on a .NET 4 project, you can replace the Tuple with a KeyValuePair. – PeteT Jun 25 '11 at 18:41
  • Instead of that foreach that doesn't actually use the item, just use this (needs using System.Linq): tmp.AddRange(Enumerable.Repeat(new Tuple<double, int>(double.MaxValue, -1), words.Length)); – Jamie Aug 10 '23 at 01:36
4

There was a discussion about this exact problem (though it was phrased in a different way) at http://www.perlmonks.org/?node_id=180276.

In the end, the best solution was to do a binary search through all possible widths to find the smallest width that wound up with no more than the desired number of columns. If there are n items and the average width is m, then you'll need O(log(n) + log(m)) passes to find the right width, each of which takes O(n) time, for O(n * (log(n) + log(m))) overall. This is probably fast enough that there's no need to be cleverer.
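A minimal Python sketch of that binary search, assuming integer word widths and single-character spaces (lines_needed and min_width are hypothetical names): a greedy pass counts how many lines a given width needs, and the search narrows the width range between the longest word and the whole text on one line.

```python
def lines_needed(widths, width, sep=1):
    """Greedy pass: number of lines needed at a given maximum line width."""
    lines, cur = 1, 0
    for w in widths:
        if w > width:
            return float("inf")            # this word can never fit
        if cur and cur + sep + w > width:
            lines, cur = lines + 1, w      # start a new line with this word
        else:
            cur += (sep if cur else 0) + w
    return lines


def min_width(words, n):
    widths = [len(w) for w in words]
    lo = max(widths)                       # no line can be narrower than the longest word
    hi = sum(widths) + len(widths) - 1     # everything on one line always fits
    while lo < hi:                         # O(log(n * m)) passes, each O(n)
        mid = (lo + hi) // 2
        if lines_needed(widths, mid) <= n:
            hi = mid
        else:
            lo = mid + 1
    return lo


print(min_width("I would like to be wrapped into three lines".split(), 3))  # 15
```

Greedy filling uses the fewest possible lines for a fixed width, and the line count only goes down as the width grows, which is why searching on the width gives the exact minimum longest line.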

If you wish to be clever, you can create an array of word counts, and cumulative lengths of the words. Then use binary searches on this data structure to figure out where the line breaks are. Creating this data structure is O(n), and it makes all of the passes to figure out the right width be O(log(n) * (log(n) + log(m))) which for reasonable lengths of words is dominated by your first O(n) pass.
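The cumulative-length idea can be sketched like this in Python (make_keys, lines_needed_fast and min_width_fast are hypothetical names). With key[k] equal to the width of the first k words plus k separators, the line words[start:end] fits in width W exactly when key[end] <= W + key[start] + sep, so bisect_right finds each break directly instead of scanning word by word.

```python
from bisect import bisect_right
from itertools import accumulate


def make_keys(widths, sep=1):
    # key[k] = total width of words[0:k] plus k separators (strictly increasing)
    cum = [0] + list(accumulate(widths))
    return [c + sep * k for k, c in enumerate(cum)]


def lines_needed_fast(keys, width, max_lines, sep=1):
    """Count lines by jumping from break to break with binary search.
    Stops as soon as max_lines is exceeded, so one check costs
    O(max_lines * log(n)) after the O(n) precomputation."""
    n_words = len(keys) - 1
    start, lines = 0, 0
    while start < n_words:
        # furthest end such that keys[end] - keys[start] - sep <= width
        end = bisect_right(keys, width + keys[start] + sep) - 1
        if end == start:
            return float("inf")            # the next word alone is wider than the line
        start, lines = end, lines + 1
        if lines > max_lines:
            return lines
    return lines


def min_width_fast(words, n, sep=1):
    widths = [len(w) for w in words]
    keys = make_keys(widths, sep)
    lo, hi = max(widths), keys[-1] - sep   # longest word .. everything on one line
    while lo < hi:
        mid = (lo + hi) // 2
        if lines_needed_fast(keys, mid, n, sep) <= n:
            hi = mid
        else:
            lo = mid + 1
    return lo


print(min_width_fast("I would like to be wrapped into three lines".split(), 3))  # 15
```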

If the widths of words can be floating point, you'll need to do something more clever with the binary searches, but you are unlikely to need that particular optimization.

btilly
4

btilly has the right answer here, but just for fun I decided to code up a solution in python:

def wrap_min_width(words, n):
    """Greedily wrap words into lines of at most n characters."""
    r, l = [], ""
    for w in words:
        # the separating space counts toward the width as well
        if l and len(l) + 1 + len(w) > n:
            r, l = r + [l], ""
        l += (" " if len(l) > 0 else "") + w
    return r + [l]

def min_lines(phrase, lines):
    words = phrase.split()
    # no line can be narrower than the longest word,
    # and a single line holding everything always fits
    lo, hi = max(len(w) for w in words), len(" ".join(words))
    while lo < hi:
        mid = lo + (hi - lo) // 2
        v = wrap_min_width(words, mid)
        if len(v) > lines:
            lo = mid + 1
        else:
            hi = mid
    return lo, "\n".join(wrap_min_width(words, lo))

Now this still may not be exactly what you want: if the words fit in fewer than n lines at that minimal width, it returns the wrapping with fewer lines instead. (Of course you can always add extra empty lines, but that is a bit silly.) If I run it on your test case, here is what I get:

Case: "I would like to be wrapped into three lines", 3 lines

Result: 15 chars/line

I would like to
be wrapped into
three lines

Mikola
0

I converted the C# accepted answer to JavaScript for something I was working on. Posting it here might save someone a few minutes of doing it themselves.

function WrapTextWithLimit(text, n) {
    var words = text.toString().trim().split(/\s+/);
    var cumwordwidth = [0];
    words.forEach(function(word) {
        cumwordwidth.push(cumwordwidth[cumwordwidth.length - 1] + word.length);
    });
    var totalwidth = cumwordwidth[cumwordwidth.length - 1] + words.length - 1;
    var linewidth = (totalwidth - (n - 1.0)) / n;
    var cost = function(i, j) {
        var actuallinewidth = Math.max(j - i - 1, 0) + (cumwordwidth[j] - cumwordwidth[i]);
        return (linewidth - actuallinewidth) * (linewidth - actuallinewidth);
    };
    var best = [];
    var tmp = [];
    best.push(tmp);
    tmp.push([0.0, -1]);
    words.forEach(function(word) {
        tmp.push([Number.MAX_VALUE, -1]);
    });
    for (var l = 1; l < n + 1; ++l)
    {
        tmp = [];
        best.push(tmp);
        for (var j = 0; j < words.length + 1; ++j)
        {
            var min = [best[l - 1][0][0] + cost(0, j), 0];
            for (var k = 0; k < j + 1; ++k)
            {
                var loc = best[l - 1][k][0] + cost(k, j);
                if (loc < min[0] || (loc === min[0] && k < min[1])) {
                    min = [loc, k];
                }
            }
            tmp.push(min);
        }
    }
    var lines = [];
    var b = words.length;
    for (var p = n; p > 0; --p) {
        var a = best[p][b][1];
        lines.push(words.slice(a, b).join(' '));
        b = a;
    }
    lines.reverse();
    return lines;
}
0

This solution improves on Mikola's.

It's better because:

  1. It doesn't use strings. It doesn't need to build or concatenate strings, only an array of their widths, so it's faster, and you can use it with any kind of "element" as long as you know the widths.
  2. wrap_min_width did unnecessary work: it kept going even after it had passed the point of failure, and it built the strings unnecessarily.
  3. The "separator width" is an adjustable parameter.
  4. It calculates the min width, which is really what you want.
  5. It fixes some bugs.

This is written in JavaScript:

 // For testing calcMinWidth

var formatString = function (str, nLines) {

    var words = str.split(" ");
    var elWidths = words.map(function (s, i) {
        return s.length;
    });

    var width = calcMinWidth(elWidths, 1, nLines, 0.1);

    var format = function (width)
    {
        var lines = [];
        var curLine = null;
        var curLineLength = 0;

        for (var i = 0; i < words.length; ++i) {
            var word = words[i];
            var elWidth = elWidths[i];

            // break before this word if it would not fit together with its separating space
            if (i > 0 && curLineLength + 1 + elWidth > width)
            {
                lines.push(curLine.join(" "));
                curLine = [word];
                curLineLength = elWidth;
                continue;
            }

            if (i === 0)
                curLine = [word];
            else
            {
                curLineLength += 1;
                curLine.push(word);
            }

            curLineLength += elWidth;
        }

        if (curLine !== null)
            lines.push(curLine.join(" "));

        return lines.join("\n");
    };

    return format(width);
};

var calcMinWidth = function (elWidths, separatorWidth, lines, tolerance)
{
    var testFit = function (width)
    {
        var nCurLine = 1;
        var curLineLength = 0;

        for (var i = 0; i < elWidths.length; ++i) {
            var elWidth = elWidths[i];

            // an element wider than the line can never fit
            if (elWidth > width)
                return false;

            // width added to the current line, counting the separator before it
            var needed = (i > 0 ? separatorWidth : 0) + elWidth;

            if (curLineLength + needed > width)
            {
                if (++nCurLine > lines)
                    return false;

                curLineLength = elWidth;
                continue;
            }

            curLineLength += needed;
        }

        return true;
    };


    var hi = 0;
    var lo = null;

    for (var i = 0; i < elWidths.length; ++i) {
        var elWidth = elWidths[i];

        if (i > 0)
            hi += separatorWidth;

        hi += elWidth;

        if (lo === null || elWidth > lo)
            lo = elWidth;
    }

    if (lo === null)
        lo = 0;

    while (hi - lo > tolerance)
    {
        var guess = (hi + lo) / 2;

        if (testFit(guess))
            hi = guess;
        else
            lo = guess;
    }

    return hi;
};
N73k
0

I just thought of an approach: write a function that accepts two parameters: 1. the string, 2. the number of lines.

Get the length of the string (String.Length in C#) and divide it by the number of lines; let's say the result is n.

Now loop over the characters of the string (using str[i]) and insert a line break ("\r\n", or Environment.NewLine in C#) after every nth character.

While looping, keep a temporary buffer holding the current word, and reset it whenever you reach a blank character. If you reach an nth position while the buffer is not empty (i.e. in the middle of a word), insert the line break after that word instead.
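A rough Python sketch of this approach (wrap_by_length is a hypothetical name): it aims for a break every len(text) / n_lines characters and moves each break forward to the end of the word it lands in. It returns a list of lines rather than inserting "\r\n", and it is a heuristic, not a minimum-width solution.

```python
def wrap_by_length(text, n_lines):
    """Heuristic: break roughly every len(text) / n_lines characters,
    pushing each break forward to the next space so no word is cut."""
    target = len(text) / n_lines
    lines, start = [], 0
    for k in range(1, n_lines):
        pos = max(int(k * target), start)
        space = text.find(" ", pos)        # end of the word the break falls in
        if space == -1:
            break                          # no space left: the rest stays on one line
        lines.append(text[start:space])
        start = space + 1
    lines.append(text[start:])
    return lines


print(wrap_by_length("I would like to be wrapped into three lines", 3))
# ['I would like to', 'be wrapped into', 'three lines']
```

On the question's three-line example it happens to give the expected 15/15/11 layout, but on the two-line example it produces "I would like to be wrapped" / "into two lines" (longest line 26), while "I would like to be" / "wrapped into two lines" only needs 22.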

R3D3vil
0

I'll assume you're trying to minimize the maximum line width when the string is split with n breaks. This can be done using dynamic programming or recursion with memoization: there are O(words(str) * n) subproblems, each trying O(words(str)) split points, so about O(words(str)^2 * n) time and O(words(str) * n) space.

The recurrence would look like this, where the string has been split into words:

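import math

INFINITY = math.inf

def wordwrap(remaining_words, n):
    """Smallest possible longest-line width for remaining_words with n more breaks."""
    if n > 0 and len(remaining_words) == 0:
        return INFINITY  # we haven't chopped enough lines

    if n == 0:
        return len(' '.join(remaining_words))  # rest of the string

    best = INFINITY
    for i in range(1, len(remaining_words)):
        # split here: remaining_words[:i] is one line, the rest gets n - 1 breaks
        first_line = len(' '.join(remaining_words[:i]))
        best = min(max(wordwrap(remaining_words[i:], n - 1), first_line), best)

    return best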
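The recurrence above recomputes the same suffixes many times. A memoized sketch in Python (wrap_min_max is a hypothetical name) caches on (start index, lines left), uses prefix sums so each line width is O(1), and also records where the first line ends so the lines themselves can be rebuilt. It takes a number of lines instead of a number of breaks and requires at least one word per line.

```python
from functools import lru_cache


def wrap_min_max(text, n_lines):
    """Split text into n_lines lines so that the longest line is as short as possible."""
    words = text.split()
    if not 1 <= n_lines <= len(words):
        raise ValueError("need between 1 and len(words) lines")
    cum = [0]                                  # cum[k] = total length of words[0:k]
    for w in words:
        cum.append(cum[-1] + len(w))

    def width(i, j):                           # width of the line words[i:j]
        return cum[j] - cum[i] + (j - i - 1)

    @lru_cache(maxsize=None)
    def best(start, lines_left):
        # (smallest possible longest line for words[start:], end of its first line)
        if lines_left == 1:
            return (width(start, len(words)), len(words))
        result = (float("inf"), -1)
        # leave at least one word for each of the remaining lines
        for end in range(start + 1, len(words) - lines_left + 2):
            rest, _ = best(end, lines_left - 1)
            result = min(result, (max(width(start, end), rest), end))
        return result

    lines, start = [], 0
    for left in range(n_lines, 0, -1):
        _, end = best(start, left)
        lines.append(" ".join(words[start:end]))
        start = end
    return lines


print(wrap_min_max("I would like to be wrapped into three lines", 3))
# ['I would like to', 'be wrapped into', 'three lines']
```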
dfb