So, here's the plan: I'm using TCPDF to generate PDF documents containing a table. I'm generating an HTML table in PHP which I pass to TCPDF. However, TCPDF makes each column's width equal, which is a problem, as the content length in each column is quite different. The solution is to set the width attribute on the table's <td>s, but I can't quite work out the perfect way to do so. This is what I'm currently doing:

  1. I generate an array called $maxColumnSizes in which I store the maximum number of letters per column.
  2. I generate an array called $averageSizes in which I store the average number of letters per column.

So, below you see an example calculation. Column 0 has 8 letters on average and 26 letters at most; column 4 has 10 letters on average and 209 letters at most: [image: example calculation]
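For readers who want to reproduce the two arrays, here is a minimal sketch of how they could be built. The helper name `columnLengthStats` and the sample rows are made up for illustration, and `strlen` counts bytes rather than letters, so multi-byte text would need a different length function.

```php
<?php
// Hypothetical helper (not from the question): per-column maximum and
// average cell length, measured with strlen(), i.e. in bytes.
function columnLengthStats(array $rows): array
{
    $lengths = [];
    foreach ($rows as $row) {
        // array_values() normalises the keys so columns are numbered 0, 1, 2, ...
        foreach (array_values($row) as $col => $cell) {
            $lengths[$col][] = strlen((string) $cell);
        }
    }

    $maxColumnSizes = [];
    $averageSizes = [];
    foreach ($lengths as $col => $colLengths) {
        $maxColumnSizes[$col] = max($colLengths);
        $averageSizes[$col] = array_sum($colLengths) / count($colLengths);
    }

    return [$maxColumnSizes, $averageSizes];
}

// Made-up sample data: two rows, two columns.
$rows = [
    ['id', 'short'],
    ['1234', 'a considerably longer cell'],
];
list($maxColumnSizes, $averageSizes) = columnLengthStats($rows);
```

With this sample data the second column has a maximum of 26 and an average of 15.5, which already shows how far the two figures can drift apart.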

So, here's the problem: I can't think of the "right" way to combine this information to get the "perfect" column widths. If I ignore the $maxColumnSizes array and set the column widths based on the $averageSizes, the table looks quite good, except for the one row where column 4 has 209 characters. As column 4 is pretty narrow, that row becomes extremely tall in order to fit the 209 characters.

To sum it up: How do I calculate the "perfect" table column width (given the table data)?

Notes:

  • "perfect width" for me means that the whole table's height is as small as possible.
  • I currently do not take letter-widths into account (I do not differentiate between the width of an i and a w)
  • As I have access to all the data, I can also make any other calculations needed. The two arrays I mention above I only used in my first tries.
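The "as small as possible" goal from the notes can be turned into a number to compare candidate widths against. The following is only a rough sketch under loud assumptions: every character is equally wide, a full table row fits `$charsPerLine` characters (an invented parameter), and text wraps exactly at the column capacity. The function name `estimateTableLines` is hypothetical.

```php
<?php
// Hypothetical cost function: rough number of text lines a table needs when
// each column receives a percentage of a fixed per-row character budget.
function estimateTableLines(array $rows, array $percentages, int $charsPerLine = 100): int
{
    $totalLines = 0;
    foreach ($rows as $row) {
        $rowLines = 1; // even an empty row takes one line
        foreach (array_values($row) as $col => $cell) {
            // Characters that fit on one line of this column (at least 1).
            $capacity = max(1, (int) floor($charsPerLine * $percentages[$col] / 100));
            $cellLines = (int) ceil(strlen((string) $cell) / $capacity);
            // A row is as tall as its tallest cell.
            $rowLines = max($rowLines, $cellLines);
        }
        $totalLines += $rowLines;
    }

    return $totalLines;
}

// Made-up data: one ordinary row and one row with a 200-character cell.
$rows = [
    [str_repeat('a', 10), str_repeat('b', 10)],
    [str_repeat('a', 10), str_repeat('b', 200)],
];
$equal = estimateTableLines($rows, [50, 50]);
$skewed = estimateTableLines($rows, [20, 80]);
```

In this example the equal split needs 5 lines and the 20/80 split needs 4, so a candidate set of widths can be judged by how low this estimate gets.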

EDIT

Based on the comment, I added another calculation, $maxColumnSize / $averageColumnSize: [image: updated calculation]

Christian
  • Quick thought: You may get the best results by determining the ratio of total characters per row divided by the characters per cell in that row. And re-running your average-sizes based on that. Because a high number of characters in a single cell is only bad if there are very little characters in other cells of the same row. – ontrack Jun 24 '14 at 19:36
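One possible reading of the comment above, sketched in PHP: instead of averaging raw lengths, average each cell's share of its own row. This is an interpretation and not the commenter's code; the function name `averageRowShares` is made up, and lengths are again measured in bytes with `strlen`.

```php
<?php
// Hypothetical sketch: a cell's weight is its share of the characters in its
// own row, and a column's width is the average of those shares in percent.
function averageRowShares(array $rows): array
{
    $shareSums = [];
    $rowCount = 0;
    foreach ($rows as $row) {
        $lengths = array_map(function ($cell) {
            return strlen((string) $cell);
        }, array_values($row));

        $rowTotal = array_sum($lengths);
        if ($rowTotal === 0) {
            continue; // an empty row says nothing about relative widths
        }
        foreach ($lengths as $col => $length) {
            $shareSums[$col] = ($shareSums[$col] ?? 0) + $length / $rowTotal;
        }
        $rowCount++;
    }

    $percentages = [];
    foreach ($shareSums as $col => $sum) {
        $percentages[$col] = 100 * $sum / $rowCount;
    }

    return $percentages;
}

// Made-up data: shares are 25/75 in the first row and 50/50 in the second.
$percentages = averageRowShares([
    ['aa', 'bbbbbb'],
    ['cccc', 'dddd'],
]);
```

A long cell then only pulls its column wider when the other cells in the same row are short, which is the effect the comment describes.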

2 Answers

This is rather subjective, but to take a stab at an algorithm:

// Following two functions taken from this answer:
// http://stackoverflow.com/a/5434698/697370

// Function to calculate square of value - mean
function sd_square($x, $mean) { return pow($x - $mean,2); }

// Function to calculate the sample standard deviation (uses sd_square)
function sd($array) {
    $count = count($array);
    $mean = array_sum($array) / $count;
    // square root of the sum of squares divided by N-1
    return sqrt(array_sum(array_map("sd_square", $array, array_fill(0, $count, $mean))) / ($count - 1));
}

// For any column...
$colMaxSize = /** from your table **/;
$colAvgSize = /** from your table **/;

$stdDeviation = sd(/** array of lengths for your column**/);
$coefficientVariation = $stdDeviation / $colAvgSize;

if($coefficientVariation > 0.5 && $coefficientVariation < 1.5) {
    // The average width of the column is close to the standard deviation
    // In this case I would just make the width of the column equal to the 
    // average.
} else {
    // There is a large variance in your dataset (really small values and 
    // really large values in the same set).
    // What to do here? I would base the width off of the max size, perhaps 
    // using (int)($colMaxSize / 2) or (int)($colMaxSize / 3) to fix long entries 
    // to a given number of lines.
}

There's a PECL extension that gives you the stats_standard_deviation function, but it is not bundled with PHP by default. You can also play around with the 0.5 and 1.5 values above until you get something that looks 'just right'.
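To make the idea easier to try out, here is a self-contained sketch of the same branch structure applied to made-up column lengths. The names `sampleStdDev` and `suggestColumnWidth` and the `$divisor` parameter are invented for this example, and returning 0.0 for a single value is an assumption. Note that, as in the snippet above, columns with a coefficient of variation of 0.5 or less also fall into the max-based branch.

```php
<?php
// Sample standard deviation (divides by N - 1), written out as a loop.
function sampleStdDev(array $values): float
{
    $n = count($values);
    if ($n < 2) {
        return 0.0; // assumption: a single value has no spread
    }
    $mean = array_sum($values) / $n;
    $sumSquares = 0.0;
    foreach ($values as $value) {
        $sumSquares += ($value - $mean) ** 2;
    }

    return sqrt($sumSquares / ($n - 1));
}

// Hypothetical wrapper: average width when the coefficient of variation is
// between 0.5 and 1.5, otherwise the maximum divided by $divisor.
function suggestColumnWidth(array $lengths, int $divisor = 2): float
{
    $avg = array_sum($lengths) / count($lengths);
    $max = max($lengths);
    $cv = $avg > 0 ? sampleStdDev($lengths) / $avg : 0.0;

    if ($cv > 0.5 && $cv < 1.5) {
        return (float) $avg;
    }

    return (float) (int) ($max / $divisor);
}

// Made-up columns: one moderately spread, one with a single 209-letter outlier.
$steady = suggestColumnWidth([2, 10, 18]);
$skewed = suggestColumnWidth([10, 10, 10, 10, 10, 10, 10, 10, 10, 209]);
```

The first column has a coefficient of variation of 0.8 and gets its average, 10. The second one exceeds 1.5 and gets (int)(209 / 2) = 104, which illustrates how strongly a single outlier drives the max-based branch.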

Jeff Lambert
  • Thanks for your answer! I can't use any non-standard extensions. I just updated my question with your code. However, I still have problems with the `else` part. `$colMaxSize / 2` or similar does not work, as it doesn't take into account how much the maximum differs from the average. This would give the fourth column in my example a width of 100, which is way too much. I've currently done something different, but I don't know if that's a good way ;-) – Christian Jun 24 '14 at 20:42
  • when there is a high variance, there's definitely a tradeoff between accommodating either the higher or lower values, but only you can decide what is 'correct' for your purposes. best of luck! – Jeff Lambert Jun 24 '14 at 21:18

Based on @watcher's answer, I came up with the following code. It works great in my test cases. I also made a GitHub repository with my code, as it is far more readable there than here on StackOverflow.

<?php
/**
 * A simple class to auto-calculate the "perfect" column widths of a table.
 * Copyright (C) 2014 Christian Flach
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * This is based on my question at StackOverflow:
 * http://stackoverflow.com/questions/24394787/how-to-calculate-the-perfect-column-widths
 *
 * Thank you "watcher" (http://stackoverflow.com/users/697370/watcher) for the initial idea!
 */

namespace Cmfcmf;

class ColumnWidthCalculator
{
    /**
     * @var array
     */
    private $rows;

    /**
     * @var bool
     */
    private $html;

    /**
     * @var bool
     */
    private $stripTags;

    /**
     * @var int
     */
    private $minPercentage;

    /**
     * @var callable|null
     */
    private $customColumnFunction;

    /**
     * @param array $rows An array of rows, where each row is an array of cells containing the cell content.
     * @param bool  $html Whether or not the rows contain html content. This will call html_entity_decode.
     * @param bool  $stripTags Whether or not to strip tags (only if $html is true).
     * @param int   $minPercentage The minimum percentage each column must be wide.
     * @param callable|null $customColumnFunction A custom function to transform a cell's value before its length is measured.
     */
    public function __construct(array $rows, $html = false, $stripTags = false, $minPercentage = 3, $customColumnFunction = null)
    {
        $this->rows = $rows;
        $this->html = $html;
        $this->stripTags = $stripTags;
        $this->minPercentage = $minPercentage;
        $this->customColumnFunction = $customColumnFunction;
    }

    /**
     * Calculate the column widths.
     *
     * @return array
     *
     * Explanation of return array:
     * - $columnSizes[$colNumber]['percentage'] The calculated column width in percents.
     * - $columnSizes[$colNumber]['calc'] The calculated column width in letters.
     *
     * - $columnSizes[$colNumber]['max'] The maximum column width in letters.
     * - $columnSizes[$colNumber]['avg'] The average column width in letters.
     * - $columnSizes[$colNumber]['raw'] An array of all the column widths of this column in letters.
     * - $columnSizes[$colNumber]['stdd'] The calculated standard deviation in letters.
     *
     * INTERNAL
     * - $columnSizes[$colNumber]['cv'] The calculated standard deviation / the average column width in letters.
     * - $columnSizes[$colNumber]['stdd/max'] The calculated standard deviation / the maximum column width in letters.
     */
    public function calculateWidths()
    {
        $columnSizes = array();

        foreach ($this->rows as $row) {
            foreach ($row as $key => $column) {
                if (isset($this->customColumnFunction)) {
                    $column = call_user_func_array($this->customColumnFunction, array($column));
                }
                $length = $this->strWidth($this->html ? html_entity_decode($this->stripTags ? strip_tags($column) : $column) : $column);

                $columnSizes[$key]['max'] = !isset($columnSizes[$key]['max']) ? $length : ($columnSizes[$key]['max'] < $length ? $length : $columnSizes[$key]['max']);

                // Sum up the lengths in `avg` for now. See below where it is converted to the actual average.
                $columnSizes[$key]['avg'] = !isset($columnSizes[$key]['avg']) ? $length : $columnSizes[$key]['avg'] + $length;
                $columnSizes[$key]['raw'][] = $length;
            }
        }

        // Calculate the actual averages.
        $columnSizes = array_map(function ($columnSize) {
            $columnSize['avg'] = $columnSize['avg'] / count($columnSize['raw']);

            return $columnSize;
        }, $columnSizes);

        foreach ($columnSizes as $key => $columnSize) {
            $colMaxSize = $columnSize['max'];
            $colAvgSize = $columnSize['avg'];

            $stdDeviation = $this->sd($columnSize['raw']);
            $coefficientVariation = $stdDeviation / $colAvgSize;

            $columnSizes[$key]['cv'] = $coefficientVariation;
            $columnSizes[$key]['stdd'] = $stdDeviation;
            $columnSizes[$key]['stdd/max'] = $stdDeviation / $colMaxSize;

            // $columnSizes[$key]['stdd/max'] < 0.3 is here for no mathematical reason, it's been found by trying stuff
            if(($columnSizes[$key]['stdd/max'] < 0.3 || $coefficientVariation == 1) && ($coefficientVariation == 0 || ($coefficientVariation > 0.6 && $coefficientVariation < 1.5))) {
                // The average width of the column is close to the standard deviation
                // In this case I would just make the width of the column equal to the
                // average.
                $columnSizes[$key]['calc'] = $colAvgSize;
            } else {
                // There is a large variance in the dataset (really small values and
                // really large values in the same set).
                // Do some magic! (There is no mathematical rule behind that line, it's been created by trying different combinations.)
                if ($coefficientVariation > 1 && $columnSizes[$key]['stdd'] > 4.5 && $columnSizes[$key]['stdd/max'] > 0.2) {
                    $tmp = ($colMaxSize - $colAvgSize) / 2;
                } else {
                    $tmp = 0;
                }

                $columnSizes[$key]['calc'] = $colAvgSize + ($colMaxSize / $colAvgSize) * 2 / abs(1 - $coefficientVariation);
                $columnSizes[$key]['calc'] = $columnSizes[$key]['calc'] > $colMaxSize ? $colMaxSize - $tmp : $columnSizes[$key]['calc'];
            }
        }

        $totalCalculatedSize = 0;
        foreach ($columnSizes as $columnSize) {
            $totalCalculatedSize += $columnSize['calc'];
        }

        // Convert calculated sizes to percentages.
        foreach ($columnSizes as $key => $columnSize) {
            $columnSizes[$key]['percentage'] = 100 / ($totalCalculatedSize / $columnSize['calc']);
        }

        // Make sure every column is at least $this->minPercentage percent wide (3 by default).
        if ($this->minPercentage > 0) {
            foreach ($columnSizes as $key => $columnSize) {
                if ($columnSize['percentage'] < $this->minPercentage) {
                    // That's how many percent we need to steal.
                    $neededPercents = ($this->minPercentage - $columnSize['percentage']);

                    // Steal some percents from the column with the $coefficientVariation nearest to one and being big enough.
                    $lowestDistance = 9999999;
                    $stealKey = null;
                    foreach ($columnSizes as $k => $val) {
                        // This is the distance from the actual $coefficientVariation to 1.
                        $distance = abs(1 - $val['cv']);
                        if ($distance < $lowestDistance
                            && $val['calc'] - $neededPercents > $val['avg'] /* This line is here due to whatever reason :/ */
                            && $val['percentage'] - $this->minPercentage >= $neededPercents /* Make sure the column we steal from would still be wider than $this->minPercentage percent after stealing. */
                        ) {
                            $stealKey = $k;
                            $lowestDistance = $distance;
                        }
                    }
                    if (!isset($stealKey)) {
                        // Dang it! We could not get something reliable here. Fallback to stealing from the largest column.
                        $max = -1;
                        foreach ($columnSizes as $k => $val) {
                            if ($val['percentage'] > $max) {
                                $stealKey = $k;
                                $max = $val['percentage'];
                            }
                        }
                    }
                    $columnSizes[$stealKey]['percentage'] = $columnSizes[$stealKey]['percentage'] - $neededPercents;

                    $columnSizes[$key]['percentage'] = $this->minPercentage;
                }
            }
        }

        return $columnSizes;
    }

    /**
     * Function to calculate standard deviation.
     * http://stackoverflow.com/a/5434698/697370
     *
     * @param $array
     *
     * @return float
     */
    protected function sd($array)
    {
        if (count($array) == 1) {
            // Return 1 if we only have one value.
            return 1.0;
        }
        // Function to calculate square of value - mean
        $sd_square = function ($x, $mean) { return pow($x - $mean,2); };

        // square root of the sum of squares divided by N-1
        return sqrt(array_sum(array_map($sd_square, $array, array_fill(0,count($array), (array_sum($array) / count($array)) ) ) ) / (count($array)-1) );
    }


    /**
     * Helper function to get the (approximate) width of a string. A normal character counts as 1, short characters
     * count as 0.4 and long characters count as 1.3.
     * The minimum width returned is 1.
     *
     * @param $text
     *
     * @return float
     */
    protected function strWidth($text)
    {
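        $smallCharacters = array('!', 'i', 'f', 'j', 'l', ',', ';', '.', ':', '-', '|',
            ' ', /* normal whitespace */
            "\xC2", /* first byte of a UTF-8 non-breaking space */
            "\xA0", /* second byte of a UTF-8 non-breaking space */
            "\n",
            "\r",
            "\t",
            "\0",
            "\x0B" /* vertical tab */
        );
        // Note: count_chars() below works on single bytes, so the em dash and 'ß'
        // never match if this file is saved as UTF-8.
        $bigCharacters = array('w', 'm', '—', 'G', 'ß', '@');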

        $width = strlen($text);
        foreach (count_chars($text, 1) as $i => $val) {
            if (in_array(chr($i), $smallCharacters)) {
                $width -= (0.6 * $val);
            }
            if (in_array(chr($i), $bigCharacters)) {
                $width += (0.3 * $val);
            }
        }
        if ($width < 1) {
            $width = 1;
        }

        return (float)$width;
    }
}

That's it! $columnSizes[$colNumber]['percentage'] now contains a well-fitting ("perfect") width for each column.
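As a hedged illustration of the last step, the percentages could be written into the width attribute of each <td> roughly like this. The helper `renderTableWithWidths` is hypothetical, the `$columnSizes` array is hand-written in the shape that calculateWidths() returns, and `htmlspecialchars` assumes the cells hold plain text rather than HTML.

```php
<?php
// Hypothetical helper: build an HTML table whose cells carry percentage widths.
// $columnSizes is assumed to have the shape returned by calculateWidths().
function renderTableWithWidths(array $rows, array $columnSizes): string
{
    $html = '<table>';
    foreach ($rows as $row) {
        $html .= '<tr>';
        foreach ($row as $col => $cell) {
            $width = round($columnSizes[$col]['percentage'], 2);
            // Escape the cell text so stray < or & cannot break the markup.
            $html .= '<td width="' . $width . '%">' . htmlspecialchars((string) $cell) . '</td>';
        }
        $html .= '</tr>';
    }

    return $html . '</table>';
}

// Made-up data standing in for real rows and calculated sizes.
$rows = [['id', 'a <long> description']];
$columnSizes = [['percentage' => 20.0], ['percentage' => 80.0]];
$html = renderTableWithWidths($rows, $columnSizes);
```

The resulting string is what would then be handed to TCPDF's writeHTML().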

Christian
  • I was looking for something similar in my javascript project and decided to port the logic. Works well! In case anyone is interested: https://www.npmjs.com/package/column-widths – Ruben Stolk Feb 22 '17 at 22:29