For this you have just to search for the tags in the HTML code.
I wrote two functions (PHP 5.4.x).
The first one returns an array, that contains the data of the table of contents. The data is is only the headline it self, the id of the tag (if you want to use anchors) and a sub-table of content.
function get_headlines($html, $depth = 1)
{
if($depth > 7)
return [];
$headlines = explode('<h' . $depth, $html);
unset($headlines[0]); // contains only text before the first headline
if(count($headlines) == 0)
return [];
$toc = []; // will contain the (sub-) toc
foreach($headlines as $headline)
{
list($hl_info, $temp) = explode('>', $headline, 2);
// $hl_info contains attributes of <hi ... > like the id.
list($hl_text, $sub_content) = explode('</h' . $depth . '>', $temp, 2);
// $hl contains the headline
// $sub_content contains maybe other <hi>-tags
$id = '';
if(strlen($hl_info) > 0 && ($id_tag_pos = stripos($hl_info,'id')) !== false)
{
$id_start_pos = stripos($hl_info, '"', $id_tag_pos);
$id_end_pos = stripos($hl_info, '"', $id_start_pos);
$id = substr($hl_info, $id_start_pos, $id_end_pos-$id_start_pos);
}
$toc[] = [ 'id' => $id,
'text' => $hl_text,
'sub_toc' => get_headlines($sub_content, $depth + 1)
];
}
return $toc;
}
The second returns a string that formats the toc with HTML.
function print_toc($toc, $link_to_htmlpage = '', $depth = 1)
{
if(count($toc) == 0)
return '';
$toc_str = '';
if($depth == 1)
$toc_str .= '<h1>Table of Content</h1>';
foreach($toc as $headline)
{
$toc_str .= '<p class="headline' . $depth . '">';
if($headline['id'] != '')
$toc_str .= '<a href="' . $link_to_htmlpage . '#' . $headline['id'] . '">';
$toc_str .= $headline['text'];
$toc_str .= ($headline['id'] != '') ? '</a>' : '';
$toc_str .= '</p>';
$toc_str .= print_toc($headline['sub_toc'], $link_to_htmlpage, $depth+1);
}
return $toc_str;
}
Both functions are far away from being perfect, but they work fine in my tests. Feel free to improve them.
Notice: get_headlines
is not a parser, so it does not work on broken HTML code and just crashes. It also only works with lowercase <hi>
-tags.
content
` – Andreas Wong Mar 26 '13 at 02:30` and `
– Mark Mar 26 '13 at 02:59` tag. or you can use `
-- ` tags.
Title
` then in the TOC `link`. – Octopus Mar 26 '13 at 06:47