
I used to run a phpBB forum for our class in school, but we have now graduated and the forum isn't used anymore. I want to remove the phpBB installation, but there is a lot written in the forum that is fun to read now and then.

I wonder if there is an easy way to convert the phpBB forum to some kind of static archive page that anyone can browse and read, instead of having the full phpBB installation.

I guess I could create some kind of converter myself using the database tables, but I wonder if something like that already exists.

Zeta Two
  • Hi Zeta Two, could you share what you did in the end? Did you create your static forum archive? – automaciej May 04 '16 at 22:23
  • @automatthias If I remember correctly, I wrote a small script that converted the content into just two tables, topics and posts, and then printed them much like Sephrial suggested. Unfortunately, when I last looked at this, the script didn't work with newer versions of phpBB. – Zeta Two May 09 '16 at 11:34

3 Answers


I just used wget to archive a phpBB2 forum completely. Things might be a bit different for phpBB3 or newer versions, but the basic approach is probably still useful.

I first populated a file with session cookies (to prevent phpBB from putting sid= in links), then did the actual mirror. This used wget 1.20, since 1.18 messed up --adjust-extension for non-HTML files (e.g. GIFs).

wget https://example.com/forum/  --save-cookies cookies \
    --keep-session-cookies
wget https://example.com/forum/  --load-cookies cookies \
     --page-requisites --convert-links  --mirror --no-parent --reject-regex \
     '([&?]highlight=|[&?]order=|posting.php[?]|privmsg.php[?]|search.php[?]|[&?]mark=|[&?]view=|viewtopic.php[?]p=)' \
     --rejected-log=rejected.log -o wget.log --server-response \
     --adjust-extension --restrict-file-names=windows

This tells wget to recursively mirror the entire site, including page requisites (CSS and images). It rejects (skips) certain URLs, mostly because they are no longer useful in a static site (e.g. search) or are just slightly different or even identical views of the same content (e.g. viewtopic.php?p=... just returns the topic containing the given post, so there is no need to mirror that topic once per post). The --adjust-extension option makes wget add .html to dynamically generated HTML pages, and --restrict-file-names=windows makes it replace (among other things) the ? with a @, so you can put the result on a web server without that server chopping the URLs at the ? (which normally starts the query parameters).
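
For illustration (these file names are hypothetical, not taken from the original mirror), a topic URL ends up on disk roughly like this:

viewtopic.php?t=123  ->  viewtopic.php@t=123.html

with --restrict-file-names=windows turning the ? into a @ and --adjust-extension appending .html.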

Matthijs Kooijman
  • Works like a charm on an old version of phpBB3! Keep in mind that you might need to add `--wait=5 --random-wait` in order not to overload the server. Also note that this approach does not archive hotlinked images from external websites. – joppiesaus Oct 15 '19 at 12:11
  • I used this without the `--adjust-extension` and `--restrict-file-names` options to preserve incoming links. I used the following nginx directive to be able to serve the resulting files as HTML files including the query arguments: `location ~ ^/forum/ { types { } default_type "text/html"; try_files $uri?$args =404; }` – pfrenssen Mar 01 '21 at 06:54
  • If you have a large forum like I do, you may also want to add --show-progress to show some kind of progress. – thetwopct Sep 20 '22 at 06:12

You could write a quick PHP script to query the database and generate a flat HTML file.

...
<body>
    <table>
        <tr>
            <th>Topic</th>
            <th>Author</th>
            <th>Content</th>
        </tr>

        <?php
        // Query the forum database table ($db is assumed to be an open PDO
        // connection; the table and column names here are only illustrative)
        foreach ($db->query('SELECT topic, author, content FROM tblComment') as $row) {
            echo "
            <tr>
                <td>{$row['topic']}</td>
                <td>{$row['author']}</td>
                <td>{$row['content']}</td>
            </tr>
            ";
        }
        ?>

    </table>
</body>
...

Or you could get a little fancier and generate an HTML file for each subject, and build an index.html page that has links to all the HTML pages created, but I don't think you'll find anything that does what you need.
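
A rough, untested sketch of that fancier approach (everything here is an assumption: the PDO connection details, the output file names, and the phpBB table and column names may all need adjusting for your installation and phpBB version):

<?php
// Rough, untested sketch: the connection details and the phpBB table/column
// names below (phpbb_topics, phpbb_posts, topic_title, post_text, post_time)
// are assumptions and may differ between phpBB versions.
$db = new PDO('mysql:host=localhost;dbname=phpbb', 'user', 'password');

$index = "<html><body><h1>Forum archive</h1><ul>\n";

foreach ($db->query('SELECT topic_id, topic_title FROM phpbb_topics') as $topic) {
    $file  = "topic_{$topic['topic_id']}.html";
    $title = htmlspecialchars($topic['topic_title']);
    $page  = "<html><body><h1>$title</h1>\n";

    // Collect all posts belonging to this topic, oldest first
    $posts = $db->prepare('SELECT post_text FROM phpbb_posts WHERE topic_id = ? ORDER BY post_time');
    $posts->execute([$topic['topic_id']]);
    foreach ($posts as $post) {
        $page .= "<div>" . $post['post_text'] . "</div>\n";
    }

    file_put_contents($file, $page . "</body></html>");
    $index .= "<li><a href=\"$file\">$title</a></li>\n";
}

file_put_contents('index.html', $index . "</ul></body></html>");

Running it once writes one topic_<id>.html file per topic plus an index.html linking to them all, which can then be uploaded anywhere as static files.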

Sephrial
  • Yeah, that will probably be rather easy anyway. If anyone wants the code I can link to it here later. – Zeta Two Oct 20 '10 at 18:17

Another option would be to use a website copier such as http://www.httrack.com/ to save all the generated HTML pages, which can later be served from the server.
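
For example (the forum URL and output directory here are placeholders), a basic HTTrack run from the command line might look like this:

httrack "https://example.com/forum/" -O ./forum-archive

The -O option sets the local output directory; the resulting static files can then be uploaded to any web server.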

Collector