Creating a table of contents in php

Question

I am looking to create a very simple, very basic nested table of contents in PHP which collects all the h1-h6 headings and indents them appropriately. This means that if I have something like:

<h1>content</h1>
<h2>more content</h2>

I should get:

content
    more content.

I know it will be CSS that creates the indents, and that's fine, but how do I create a table of contents with working links to the content on the page?

Apparently it's hard to grasp what I am asking for...

I am asking for a function that reads an HTML document, pulls out all the h1-h6 headings, and builds a table of contents from them.

You're going to have to specify your question more; as it stands it doesn't have much to do with php or programming, just formatting. For instance, what form does the corpus take that requires a TOC? How do you plan to parse it out into pages? How are the pages arranged such that you can link to them? These are necessary for figuring out how to make a TOC creator functional. — Nathaniel Ford, Mar 26 '13 at 02:31
Simple answer: create a function that can process the DOM as a tree (starting with the body node) and build an array of H1 nodes, each containing an array of H2 nodes and so on. Simple answer, not a simple solution. You will then, of course have to worry about its display. — J.D. Pace, Mar 26 '13 at 02:50
@BlackHatShadow: How can you perform that type of extraction with CSS? — J.D. Pace, Mar 26 '13 at 02:51
`"I know it will be css that creates the indents..."` the working links can be under tag with same property of the ` ` and ` ` tag. or you can use `-` tags. — Mark, Mar 26 '13 at 02:59
It's not clear what you are asking. Do you just want to link to an anchor on the same page? `Title` then in the TOC `link`. — Octopus, Mar 26 '13 at 06:47
I don't understand the point of creating this in `PHP`. You can check jQuery plugins for an easy TOC: http://www.jquery4u.com/plugins/5-jquery-table-content-toc-plugins/ — Muhammad Haseeb Khan, Mar 26 '13 at 18:40

score 6 · Answer 1 · answered Oct 10 '21 at 11:09

I used this package; it's easy and straightforward to use.

https://github.com/caseyamcl/toc

Install via Composer by including the following in your composer.json file:

{
    "require": {
        "caseyamcl/toc": "^3.0"
    }
}

Or, drop the src folder into your application and use a PSR-4 autoloader to include the files.

Usage

This package contains two main classes:

TOC\MarkupFixer: Adds id anchor attributes to any H1...H6 tags that do not already have any (you can specify which header tag levels to use at runtime)

TOC\TocGenerator: Generates a Table of Contents from HTML markup

Basic Example:

$myHtmlContent = <<<END
    <h1>This is a header tag with no anchor id</h1>
    <p>Lorum ipsum doler sit amet</p>
    <h2 id='foo'>This is a header tag with an anchor id</h2>
    <p>Stuff here</p>
    <h3 id='bar'>This is a header tag with an anchor id</h3>
END;

$markupFixer  = new TOC\MarkupFixer();
$tocGenerator = new TOC\TocGenerator();

// This ensures that all header tags have `id` attributes so they can be used as anchor links
$htmlOut  = "<div class='content'>" . $markupFixer->fix($myHtmlContent) . "</div>";

// This generates the Table of Contents in HTML
$htmlOut .= "<div class='toc'>" . $tocGenerator->getHtmlMenu($myHtmlContent) . "</div>";

echo $htmlOut;

This produces the following output:

<div class='content'>
    <h1 id="this-is-a-header-tag-with-no-anchor-id">This is a header tag with no anchor id</h1>
    <p>Lorum ipsum doler sit amet</p>
    <h2 id="foo">This is a header tag with an anchor id</h2>
    <p>Stuff here</p>
    <h3 id="bar">This is a header tag with an anchor id</h3>
</div>
<div class='toc'>
    <ul>
        <li class="first last">
        <span></span>
            <ul class="menu_level_1">
                <li class="first last">
                    <a href="#foo">This is a header tag with an anchor id</a>
                    <ul class="menu_level_2">
                        <li class="first last">
                            <a href="#bar">This is a header tag with an anchor id</a>
                        </li>
                    </ul>
                </li>
            </ul>
        </li>
    </ul>
</div>
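If pulling in a package is not an option, the id-adding step can be approximated with PHP's built-in DOM extension. The following is only a minimal sketch of the idea, not the package's implementation: `add_heading_ids` and the `toc-wrapper` id are hypothetical names, and the slug rule (lowercase, non-alphanumerics collapsed to hyphens, numeric suffix on duplicates) is an assumption.

```php
<?php
// Hypothetical helper, not part of caseyamcl/toc:
// give every h1-h6 that has no id attribute a slug-style id.
function add_heading_ids(string $html): string
{
    if (trim($html) === '') {
        return $html;                       // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    libxml_use_internal_errors(true);       // collect parser warnings instead of printing them
    // Wrap the fragment so there is one known root to serialize from later.
    $doc->loadHTML('<div id="toc-wrapper">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);

    // Remember ids that already exist so generated ones stay unique.
    $used = [];
    foreach ($xpath->query('//*[@id]') as $node) {
        $used[$node->getAttribute('id')] = true;
    }

    $query = '//*[self::h1 or self::h2 or self::h3 or self::h4 or self::h5 or self::h6]';
    foreach ($xpath->query($query) as $heading) {
        if ($heading->getAttribute('id') !== '') {
            continue;                       // keep ids the author already set
        }
        // Assumed slug rule: runs of non-alphanumerics become a single hyphen.
        $slug = strtolower(trim(preg_replace('/[^A-Za-z0-9]+/', '-', $heading->textContent), '-'));
        if ($slug === '') {
            $slug = 'section';
        }
        $id = $slug;
        for ($i = 2; isset($used[$id]); $i++) {
            $id = $slug . '-' . $i;         // foo, foo-2, foo-3, ...
        }
        $used[$id] = true;
        $heading->setAttribute('id', $id);
    }

    // Serialize only the wrapper's children, i.e. the original fragment.
    $wrapper = $xpath->query('//div[@id="toc-wrapper"]')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

This sketch assumes the input is an HTML fragment rather than a full document, and non-ASCII input may need an explicit charset declaration before parsing.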

AbcAeffchen · Accepted Answer · 2019-04-01T07:53:26.750

For this you just have to search for the heading tags in the HTML code.

I wrote two functions (PHP 5.4.x).

The first one returns an array that contains the data for the table of contents. For each headline, the data is only the headline text itself, the id of the tag (if you want to use anchors), and a sub-table of contents.

function get_headlines($html, $depth = 1)
{
    if($depth > 6)      // HTML only defines <h1> to <h6>
        return [];

    $headlines = explode('<h' . $depth, $html);

    unset($headlines[0]);       // contains only text before the first headline

    if(count($headlines) == 0)
        return [];

    $toc = [];      // will contain the (sub-) toc

    foreach($headlines as $headline)
    {
        list($hl_info, $temp) = explode('>', $headline, 2);
        // $hl_info contains attributes of <hi ... > like the id.
        list($hl_text, $sub_content) = explode('</h' . $depth . '>', $temp, 2);
        // $hl_text contains the headline
        // $sub_content may contain other <hi>-tags
        $id = '';
        if(strlen($hl_info) > 0 && ($id_tag_pos = stripos($hl_info,'id')) !== false)
        {
            // only double-quoted ids (id="...") are recognized
            $id_start_pos = strpos($hl_info, '"', $id_tag_pos);
            // search for the closing quote *after* the opening one
            $id_end_pos = ($id_start_pos === false) ? false : strpos($hl_info, '"', $id_start_pos + 1);
            if($id_end_pos !== false)
                $id = substr($hl_info, $id_start_pos + 1, $id_end_pos - $id_start_pos - 1);
        }

        $toc[] = [  'id' => $id,
                    'text' => $hl_text,
                    'sub_toc' => get_headlines($sub_content, $depth + 1)
                ];

    }

    return $toc;
}

The second returns a string that formats the toc with HTML.

function print_toc($toc, $link_to_htmlpage = '', $depth = 1)
{
    if(count($toc) == 0)
        return '';

    $toc_str = '';

    if($depth == 1)
        $toc_str .= '<h1>Table of Content</h1>';

    foreach($toc as $headline)
    {
        $toc_str .= '<p class="headline' . $depth . '">';
        if($headline['id'] != '')
            $toc_str .= '<a href="' . $link_to_htmlpage . '#' . $headline['id'] . '">';

        $toc_str .= $headline['text'];
        $toc_str .= ($headline['id'] != '') ? '</a>' : '';
        $toc_str .= '</p>';

        $toc_str .= print_toc($headline['sub_toc'], $link_to_htmlpage, $depth+1);
    }

    return $toc_str;
}

Both functions are far from perfect, but they work fine in my tests. Feel free to improve them.

Notice: get_headlines is not a parser, so it does not cope with broken HTML code and simply fails on it. It also only works with lowercase <hi> tags.
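For input that cannot be trusted to be tidy, the same idea can be built on PHP's DOM extension instead of string splitting. What follows is a rough sketch rather than a drop-in replacement: `collect_headings` and `render_toc` are hypothetical names, and the output is a nested `<ul>` rather than the `<p class="headlineN">` markup used above.

```php
<?php
// Collect all h1-h6 elements in document order as rows of level, id and text.
function collect_headings(string $html): array
{
    if (trim($html) === '') {
        return [];                          // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);       // collect parser warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    $rows  = [];
    $xpath = new DOMXPath($doc);
    $query = '//*[self::h1 or self::h2 or self::h3 or self::h4 or self::h5 or self::h6]';
    foreach ($xpath->query($query) as $h) {
        $rows[] = [
            'level' => (int) substr($h->nodeName, 1),   // "h2" -> 2
            'id'    => $h->getAttribute('id'),          // '' when the heading has no id
            'text'  => trim($h->textContent),
        ];
    }
    return $rows;
}

// Render the rows as nested <ul> lists, using a stack of currently open levels.
function render_toc(array $rows): string
{
    $html = '';
    $open = [];
    foreach ($rows as $row) {
        // Close every list that is deeper than the current heading.
        while ($open && end($open) > $row['level']) {
            $html .= '</li></ul>';
            array_pop($open);
        }
        if ($open && end($open) === $row['level']) {
            $html .= '</li>';               // sibling at the same level
        } else {
            $html .= '<ul>';                // first entry of a deeper (or the first) list
            $open[] = $row['level'];
        }
        $text = htmlspecialchars($row['text']);
        $html .= '<li>' . ($row['id'] !== ''
            ? '<a href="#' . htmlspecialchars($row['id']) . '">' . $text . '</a>'
            : $text);
    }
    // Close whatever is still open.
    foreach ($open as $unused) {
        $html .= '</li></ul>';
    }
    return $html;
}
```

A call such as `echo render_toc(collect_headings($html));` prints the list. Headings without an id are listed as plain text, since there is nothing to link to.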

The notice that this isn't (using) a real parser is important. It may work on various nicely formed HTML, but constructing edge cases that break its assumptions is very easy, so I would not recommend using this function. I wrote a [similar warning here.](https://alanhogan.com/html-myths#regex-html) — Alan H., Apr 27 '22 at 07:03

score -1 · Answer 3 · answered Nov 11 '22 at 19:18

How about this (although it can only do one H level) ...

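function getTOC(string $html, int $level=1) {
    $toc="";
    $n=0;
    $html1="";

    $safety=1000;   // upper bound on the number of headings processed
    while ( $safety-->0 ) {

        $html0=strtolower($html);       // match tags case-insensitively
        $x=strpos($html0, "<h$level");
        if ( $x===false ) break;        // no more headings at this level

        $y=strpos($html0, "</h$level>", $x);
        if ( $y===false ) break;        // unclosed heading: stop rather than guess

        $part=strip_tags(substr($html, $x, $y-$x));

        $toc  .="<a href='#head$n'>$part</a>\n";
        // copy everything up to and including the closing tag (5 characters, e.g. "</h1>")
        $html1.=substr($html,0,$x)."<a name='head$n'></a>".substr($html, $x, $y-$x+5)."\n";
        $html=substr($html, $y+5);
        $n++;
    }
    $html1.=$html;
    $html=$toc."\n<HR>\n".$html1;
    return $html;
}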

This will create a basic list of links:

$html="<html><body>";
$html.="<h1>Heading 1a</h1>One Two Three";
$html.="<h2>heading 2a</h2>Four Five Six";
$html.="<h1 class='something'>Heading 1b</h1>Seven Eight Nine";
$html.="<h2>heading 2b</h2>Ten Eleven Twelve";
$html.="</body></html>";


echo getTOC($html, 1);

gives...

<a href='#head0'>Heading 1a</a>
<a href='#head1'>Heading 1b</a>

<HR>
<html><body><a name='head0'></a><h1>Heading 1a</h1>
One Two Three<h2>heading 2a</h2>Four Five Six<a name='head1'></a><h1 class='something'>Heading 1b</h1>
Seven Eight Nine<h2>heading 2b</h2>Ten Eleven Twelve</body></html>

See https://onlinephp.io/c/fceb0 for a running example

Using string pattern matching is [absolutely not](https://alanhogan.com/html-myths#regex-html) a robust way to handle HTML input! Please do not use this code — Alan H., Jan 07 '23 at 03:50

score -2 · Answer 4 · answered Nov 25 '19 at 11:20

This function returns the string with a table of contents appended, for h2 tags only. 100% tested code.

function toc($str){

        // toc just after first image in content
        // Assumption: the <img> pattern and the <section> wrapper are reconstructed from the
        // comment above and from the getElementsByTagName('section') lookup further down.
        $html = preg_replace('/<img[^>]+\>/i', '$0 <section>In This Article</section>', $str, 1);

        $doc = new DOMDocument();
        $doc->loadHTML($html);

        // create document fragment
        $frag = $doc->createDocumentFragment();
        // create initial list
        $frag->appendChild($doc->createElement('ul'));
        $head = $frag->firstChild;
        $xpath = new DOMXPath($doc);
        $last = 1;

        // get all H2 elements (extend the XPath expression to cover other levels)
        $tagChek = array();
        foreach ($xpath->query('//*[self::h2]') as $headline) {
            // get level of current headline
            sscanf($headline->tagName, 'h%u', $curr);
            array_push($tagChek,$headline->tagName);

            // move head reference if necessary
            if ($curr < $last) {
                // move upwards
                for ($i=$curr; $i<$last; $i++) {
                    $head = $head->parentNode->parentNode;
                }
            } elseif ($curr > $last && $head->lastChild) {
                // move downwards and create new lists
                for ($i=$last; $i<$curr; $i++) {
                    $head->lastChild->appendChild($doc->createElement('ul'));
                    $head = $head->lastChild->lastChild;
                }
            }
            $last = $curr;

            // add list item
            $li = $doc->createElement('li');
            $head->appendChild($li);
            $a = $doc->createElement('a', $headline->textContent);
            $head->lastChild->appendChild($a);

            // build ID
            $levels = array();
            $tmp = $head;
            // walk subtree up to fragment root node of this subtree
            while (!is_null($tmp) && $tmp != $frag) {
                $levels[] = $tmp->childNodes->length;
                $tmp = $tmp->parentNode->parentNode;
            }
            $id = 'sect'.implode('.', array_reverse($levels));
            // set destination
            $a->setAttribute('href', '#'.$id);
            // add anchor to headline
            $a = $doc->createElement('a');
            $a->setAttribute('name', $id);
            $a->setAttribute('id', $id);
            $headline->insertBefore($a, $headline->firstChild);
        }

        // append fragment to document
        $section = $doc->getElementsByTagName('section')->item(0);
        if(!empty($tagChek) && $section !== null):
            $section->appendChild($frag);
            return $doc->saveHTML();
        else:
            return $str;    // no h2 headings, or no image to place the TOC after
        endif;

    }
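The "just after the first image" step does not strictly need a regular expression; once the document is parsed, the first `<img>` can be looked up through the DOM. A minimal sketch, assuming an already loaded `DOMDocument`; the helper name `insert_section_after_first_image` and the `<p>` wrapper around the label are hypothetical choices, not part of the function above.

```php
<?php
// Hypothetical helper: create a <section> directly after the first <img>
// and return it, or return null when the document contains no image.
function insert_section_after_first_image(DOMDocument $doc): ?DOMElement
{
    $img = $doc->getElementsByTagName('img')->item(0);
    if ($img === null) {
        return null;                        // nothing to anchor the TOC to
    }

    $section = $doc->createElement('section');
    $section->appendChild($doc->createElement('p', 'In This Article'));

    // insertBefore() with a null reference node appends at the end of the parent,
    // so this also works when the image is the last child.
    $img->parentNode->insertBefore($section, $img->nextSibling);

    return $section;
}
```

The returned element can then receive the generated list via `appendChild`. Note that if the image sits inside a `<p>`, the section ends up inside that paragraph as well, which may not be what you want.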

Using an actual HTML parser here is very good! Using a [regex](https://alanhogan.com/html-myths#regex-html) to find a character sequence that may or may not be the end of an image tag is not! — Alan H., Apr 27 '22 at 07:07
