
After a few busy nights my head isn't working so well, but this needs to be fixed yesterday, so I'm asking the more refreshed community of SO.

I've got a series of numbers. For example:

1, 5, 7, 13, 3, 3, 4, 1, 8, 6, 6, 6

I need to split this series into three parts so that the sums of the three parts are as close to each other as possible. The order of the numbers needs to be maintained, so the first part must consist of the first X numbers, the second of the next Y numbers, and the third of whatever is left.

What would be the algorithm to do this?

(Note: the actual problem is to arrange text paragraphs of differing heights into three columns. Paragraphs must maintain their order (of course) and may not be split in half. The columns should be as equal in height as possible.)

Vilx-
  • Duplicate question? http://stackoverflow.com/questions/3009146/splitting-values-into-groups-evenly – kan Oct 13 '11 at 08:43
  • Close, but that one allows rearranging of values. I think my case should be simpler, but the algorithm mentioned there isn't useful here. – Vilx- Oct 13 '11 at 08:44
  • Three parts - is this the requirement, or only an example? – Lior Kogan Oct 13 '11 at 08:49
  • Requirement. But I think that a proper algorithm should be able to handle an arbitrary number of columns. Still, if you must treat 3 as a special value to get it done, by all means do it. – Vilx- Oct 13 '11 at 08:57

5 Answers


First, we'll need to define the goal better:

Suppose the partial sums are A1, A2, A3. We are trying to minimize |A-A1|+|A-A2|+|A-A3|, where A is the average: A=(A1+A2+A3)/3.

Multiplying by 3 (which doesn't change where the minimum is) and substituting 3A=A1+A2+A3, we are trying to minimize |A2+A3-2A1|+|A1+A3-2A2|+|A1+A2-2A3|.

Let S denote the sum (which is constant): S=A1+A2+A3, so A3=S-A1-A2.

We're trying to minimize:

|A2+S-A1-A2-2A1|+|A1+S-A1-A2-2A2|+|A1+A2-2S+2A1+2A2| = |S-3A1|+|S-3A2|+|3A1+3A2-2S|

Denoting this function as f, we can try every pair of cut points with two nested loops, O(n^2), and keep track of the minimum:

Something like:

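def best_split(items):
    S = sum(items)
    best = None
    for x in range(1, len(items)):
        A1 = sum(items[0:x])                 # sum(Item[0]..Item[x-1])
        for y in range(x + 1, len(items)):   # y > x keeps the middle part non-empty
            A2 = sum(items[x:y])             # sum(Item[x]..Item[y-1])
            f = abs(S - 3*A1) + abs(S - 3*A2) + abs(3*A1 + 3*A2 - 2*S)
            if best is None or f < best[0]:  # new minimum found: keep x, y
                best = (f, x, y)
    return best  # (f, x, y): parts are items[:x], items[x:y], items[y:]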
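A quick numeric check of the algebra, as a small Python sketch (the helper names f and deviation are illustrative, not from the answer). Since A3 = S-A1-A2, the last term |3A1+3A2-2S| equals |S-3A3|, so f is exactly three times the original objective |A-A1|+|A-A2|+|A-A3|. On the sample series, the split [1, 5, 7, 13] | [3, 3, 4, 1, 8] | [6, 6, 6] has part sums 26, 19 and 18:

```python
def f(A1, A2, S):
    # reduced objective from the answer, with A3 eliminated via A3 = S - A1 - A2
    return abs(S - 3 * A1) + abs(S - 3 * A2) + abs(3 * A1 + 3 * A2 - 2 * S)


def deviation(parts):
    # original objective: total distance of the three part sums from their average A
    A = sum(parts) / 3
    return sum(abs(A - a) for a in parts)


A1, A2, A3 = 26, 19, 18
S = A1 + A2 + A3                      # 63, so the average A is 21
print(f(A1, A2, S))                   # 30
print(3 * deviation([A1, A2, A3]))    # 30.0
```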
Lior Kogan
  • Well, this is simple. And I see how this could be adapted to another "cost function", similar to Knuth's algorithm. Not efficient, but improvements can be made. On the other hand, I'll rarely (if ever) get over 20 groups anyway, so maybe this is even the best in terms of maintainability. – Vilx- Oct 13 '11 at 09:17
  • The above algorithm is actually a brute-force O(n^3): n^2 for the two loops and n for the summation in the inner loop. – vikas368 Oct 16 '11 at 14:42
  • @vikas368: Actually not. You just need to add a single item in each iteration. I wrote it this way only for clarity. – Lior Kogan Oct 16 '11 at 16:16
  • Okay. If you do the iterative sum then it's O(n^2), agreed. Thanks for clarifying. – vikas368 Oct 16 '11 at 17:48
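Following Lior Kogan's clarification, here is a minimal Python sketch of the O(n^2) variant: the two loops are the same, but A1 and A2 are updated by adding a single item per iteration instead of being re-summed. It assumes all three parts must be non-empty and uses the reduced cost |S-3A1|+|S-3A2|+|3A1+3A2-2S|; the function name best_split_incremental is just illustrative.

```python
def best_split_incremental(items):
    """Return (cost, x, y) for the best cut; parts are items[:x], items[x:y], items[y:]."""
    S = sum(items)
    best = None                                 # stays None if there are fewer than 3 items
    A1 = 0
    for x in range(1, len(items) - 1):          # first part: items[0..x-1]
        A1 += items[x - 1]                      # add a single item instead of re-summing
        A2 = 0
        for y in range(x + 1, len(items)):      # second part: items[x..y-1]
            A2 += items[y - 1]
            cost = abs(S - 3 * A1) + abs(S - 3 * A2) + abs(3 * A1 + 3 * A2 - 2 * S)
            if best is None or cost < best[0]:  # new minimum found: keep x, y
                best = (cost, x, y)
    return best


items = [1, 5, 7, 13, 3, 3, 4, 1, 8, 6, 6, 6]
cost, x, y = best_split_incremental(items)
print(cost, items[:x], items[x:y], items[y:])
```

On the sample series this picks x = 4 and y = 9, i.e. [1, 5, 7, 13] | [3, 3, 4, 1, 8] | [6, 6, 6] with sums 26, 19 and 18.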

Find the total sum and the cumulative (running) sums of the series.

Let a = sum/3.

Then locate the entries of the cumulative sum nearest to a and 2*a; those two positions divide your list into three nearly equal parts.
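A minimal Python sketch of this prefix-sum approach, assuming a non-empty list of non-negative heights (so the cumulative sums are sorted and can be binary-searched with bisect); split_by_prefix is a hypothetical name. Because each cut is chosen independently, this is a fast heuristic rather than an exhaustive search: for example, on [4, 4, 1, 4, 4] it gives sums 4, 9, 4, while 8, 5, 4 is closer to even. For very skewed inputs it can also leave a part empty.

```python
from bisect import bisect_left
from itertools import accumulate


def split_by_prefix(items):
    """Cut where the cumulative sums are nearest to S/3 and 2S/3."""
    prefix = list(accumulate(items))        # prefix[k] = items[0] + ... + items[k]
    S = prefix[-1]
    cuts = []
    for target in (S / 3, 2 * S / 3):
        k = bisect_left(prefix, target)     # first cumulative sum >= target
        # step back if the previous cumulative sum is at least as close to the target
        if k > 0 and (k == len(prefix) or target - prefix[k - 1] <= prefix[k] - target):
            k -= 1
        cuts.append(k + 1)                  # the part ends after items[k]
    x, y = cuts
    return items[:x], items[x:y], items[y:]


print(split_by_prefix([1, 5, 7, 13, 3, 3, 4, 1, 8, 6, 6, 6]))
# ([1, 5, 7, 13], [3, 3, 4, 1, 8], [6, 6, 6])
```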

vikas368

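Let's say p is your array of paragraph heights and n is the number of paragraphs:

#include <math.h>

/* Greedy split of p[0..n-1] into three columns.
   On return, indexes[0] and indexes[1] are the starting indexes of the 2nd and 3rd sequence. */
void split3(const int p[], int n, int indexes[2])
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += p[i];

    double len = sum / 3.0;         // it is the average value
    int currlen = 0;                // height of the column being filled
    int used = 0;                   // height of the columns already closed
    int j = 0;
    indexes[0] = indexes[1] = n;    // fallback if a cut is never made
    for (int i = 0; i < n && j < 2; i++)
    {
        currlen = currlen + p[i];
        if (currlen > len)
        {
            // check which one is closer to the average value: with or without p[i]
            if ((currlen - len) < fabs((currlen - p[i]) - len))
            {   // p[i] stays in this column; the next one starts at i + 1
                indexes[j++] = i + 1;
                used += currlen;
                currlen = 0;
            }
            else
            {   // p[i] starts the next column
                indexes[j++] = i;
                used += currlen - p[i];
                currlen = p[i];
            }
            len = (sum - used) / 2.0;   // optional: new average height from the remaining lengths
        }
    }
}

You will get the starting indexes of the 2nd and 3rd sequences in indexes.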

Vilx-
Zaphood

I believe that this can be solved with a dynamic programming algorithm for line breaking invented by Donald Knuth for use in TeX.

Aasmund Eldhuset
  • Interesting, but that algorithm relies on a known maximum line size. My columns don't have a limit; they just need to be as close to each other as possible, to give an aesthetically pleasing result. – Vilx- Oct 13 '11 at 09:05
  • I think that algorithm is for breaking a sequence of numbers into any number of segments, each of whose sum is at most some given k and as similar in size to each other as possible. What we want here is to break the sequence into a fixed number of segments (3) that are as similar in size to each other as possible, which is slightly different. But it could still be useful to try setting k = sum/3 or thereabouts. – j_random_hacker Oct 13 '11 at 09:09
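As j_random_hacker points out, the goal here is a fixed number of segments rather than a fixed maximum width. A compact dynamic program handles that directly and also covers the arbitrary number of columns mentioned in the question's comments. The Python sketch below illustrates that idea; it is not Knuth's TeX line-breaking algorithm itself. It assumes 1 <= k <= len(items), scores each part by the squared difference between its sum and the average S/k (a least-raggedness style cost), and runs in O(k·n^2). The name split_k is hypothetical.

```python
def split_k(items, k):
    """Split items into k consecutive, non-empty parts, minimizing the sum of
    squared differences between each part's total and the average total / k."""
    n = len(items)
    prefix = [0]
    for h in items:
        prefix.append(prefix[-1] + h)      # prefix[j] = sum(items[:j])
    target = prefix[n] / k
    INF = float("inf")
    # best[c][j]: minimal cost of putting items[:j] into c parts
    # back[c][j]: start index of the last of those c parts
    best = [[INF] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):      # last part is items[i:j], never empty
                cost = best[c - 1][i] + (prefix[j] - prefix[i] - target) ** 2
                if cost < best[c][j]:
                    best[c][j] = cost
                    back[c][j] = i
    # follow the back-pointers from the full sequence to recover the parts
    parts, j = [], n
    for c in range(k, 0, -1):
        i = back[c][j]
        parts.append(items[i:j])
        j = i
    return parts[::-1]


print(split_k([1, 5, 7, 13, 3, 3, 4, 1, 8, 6, 6, 6], 3))
```

With k = 3 on the sample series this yields [[1, 5, 7, 13], [3, 3, 4, 1, 8], [6, 6, 6]]. Squaring the differences penalizes one very tall column more than several slightly uneven ones.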

Following Aasmund Eldhuset's answer: I previously answered a closely related question on SO.

Word wrap to X lines instead of maximum width (Least raggedness)

This algorithm doesn't rely on a maximum line size; it just finds an optimal cut into a given number of lines.

I modified it to work with your problem:

L = [1, 5, 7, 13, 3, 3, 4, 1, 8, 6, 6, 6]

def minragged(words, n=3):
    P = 2
    cumwordwidth = [0]
    # cumwordwidth[-1] is the last element
    for word in words:
        cumwordwidth.append(cumwordwidth[-1] + word)
    # paragraphs are not separated by spaces, so the ideal height of a column is total / n
    linewidth = float(cumwordwidth[-1]) / n

    print("number of words:", len(words))

    def cost(i, j):
        """
        cost of a line words[i], ..., words[j - 1] (words[i:j])
        """
        actuallinewidth = cumwordwidth[j] - cumwordwidth[i]
        return (linewidth - float(actuallinewidth)) ** P

    # print the reasoning, then rebuild the lines by walking back through F
    F = {}  # Total cost function

    for stage in range(n):
        print("------------------------------------")
        print("stage :", stage)
        print("------------------------------------")
        print("word i to j in line", stage, "\t\tTotalCost (f(j))")
        print("------------------------------------")

        if stage == 0:
            F[stage] = []
            i = 0
            for j in range(i, len(words) + 1):
                print("i=", i, "j=", j, "\t\t\t", cost(i, j))
                F[stage].append([cost(i, j), 0])
        elif stage == (n - 1):
            F[stage] = [[float('inf'), 0] for _ in range(len(words) + 1)]
            j = len(words)
            for i in range(len(words) + 1):
                if F[stage - 1][i][0] + cost(i, j) < F[stage][j][0]:  # calculating min cost (cf f formula)
                    F[stage][j][0] = F[stage - 1][i][0] + cost(i, j)
                    F[stage][j][1] = i
                    print("i=", i, "j=", j, "\t\t\t", F[stage][j][0])
        else:
            F[stage] = [[float('inf'), 0] for _ in range(len(words) + 1)]
            for i in range(len(words) + 1):
                for j in range(i, len(words) + 1):
                    if F[stage - 1][i][0] + cost(i, j) < F[stage][j][0]:
                        F[stage][j][0] = F[stage - 1][i][0] + cost(i, j)
                        F[stage][j][1] = i
                        print("i=", i, "j=", j, "\t\t\t", F[stage][j][0])

    print('reversing list')
    print("------------------------------------")
    listWords = []
    a = len(words)
    for k in range(n - 1, 0, -1):  # reverse loop from n-1 to 1
        listWords.append(words[F[k][a][1]:a])
        a = F[k][a][1]
    listWords.append(words[0:a])
    listWords.reverse()

    for line in listWords:
        print(line, '\t\t', sum(line))

    return listWords

print(minragged(L))

The result I get is:

[1, 5, 7, 13]       26
[3, 3, 4, 1, 8]         19
[6, 6, 6]       18
[[1, 5, 7, 13], [3, 3, 4, 1, 8], [6, 6, 6]]

Hope it helps

Community
Ricky Bobby
  • Uff, Python. Not one of the languages I'm very familiar with. Will take a while to gnaw through. I'm tempted to start with Lior Kogan's solution, throw in a different cost function and a couple of optimizations to reduce the loop count. Since my series will usually be short (20 items is a big one), a quadratic algorithm isn't all that bad. But in the meantime, have an upvote! :) – Vilx- Oct 13 '11 at 09:36
  • @Vilx- I tried to write an algorithm that follows the dynamic program for least raggedness step by step, so it shouldn't be very difficult to understand. You can also find a lot of versions of this code (including one in C#) at the link at the top of my answer. – Ricky Bobby Oct 13 '11 at 09:41
  • Thank you. C# is my thing indeed. :) – Vilx- Oct 13 '11 at 10:19
  • Heh. Delightful. As I'm writing my own solution based on Lior Kogan's, but with a few optimizations, I'm slowly arriving at and understanding the solution proposed by you. :) – Vilx- Oct 13 '11 at 12:22