
I've seen the suggestion of using Mylyn WikiText to convert wiki pages to HTML in this question, but from reading the front page of the site alone I'm not sure it's what I'm looking for, so I'll investigate it further. I would prefer a Trac plugin so I could initiate the conversion from within the wiki options, but all the plugins at Trac-Hacks export single pages only, whereas I want to dump all formatted pages in one go.

So is there an existing Trac plugin or stand-alone application that meets my requirements? If not, where would you point me to start implementing that functionality myself?

spade78

2 Answers


You may find some useful information in the comments for this ticket on trac-hacks. One user reports using the wget utility to create a mirror copy of the wiki as if it were a normal website. Another user reports using the XmlRpc plugin to extract HTML versions of any given wiki page, but this method would probably require you to create a script to interface with the plugin. The poster didn't provide any example code, unfortunately, but the XmlRpc Plugin page includes a decent amount of documentation and samples to get you started.
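
For example, a mirror of the rendered pages might be pulled with something along these lines (the URL is a placeholder for your own instance, and an authenticated wiki would need extra options such as cookies or credentials):

$ wget --mirror --convert-links --page-requisites --no-parent http://mytrac.mydomain.com/wiki/

The --no-parent flag keeps the crawl from wandering outside the /wiki/ path, and --convert-links rewrites the links so the copy can be browsed locally.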

If you have access to a command line on the server hosting Trac, you can use the trac-admin command like:

trac-admin /path/to/trac wiki export <wiki page name>

to retrieve a plain-text version of the specified wiki page. You would then have to convert the wiki markup to HTML yourself, but there are tools available to do that.
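
If your Trac version supports it, the wiki dump subcommand writes every page out as a plain-text file in one go, which is closer to the "all pages at once" requirement (the paths here are placeholders):

$ trac-admin /path/to/trac wiki dump /tmp/wiki-dump

The dumped files still contain wiki markup, so the conversion to HTML remains a separate step.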

bta
  • +1 for wget which is just enough for my needs at the moment. I'll come back to this answer and try out the other suggestions if I need ideas for anything more sophisticated. Thanks. – spade78 Dec 14 '10 at 22:06

For our purposes, we wanted to export each wiki page individually, without the header/footer and other instance-specific content, and the XML-RPC interface was a good fit. Here's the Python 3.6+ script I created to export the whole wiki into HTML files in the current directory. Note that this technique doesn't rewrite any hyperlinks, so they will resolve absolutely to the original site.

import os
import xmlrpc.client
import getpass
import urllib.parse


def add_auth(url):
    # Derive the credentials: the realm defaults to the host name unless
    # overridden via TRAC_REALM, and the username comes from the environment.
    host = urllib.parse.urlparse(url).netloc
    realm = os.environ.get('TRAC_REALM', host)
    username = getpass.getuser()
    password = None
    try:
        # Use a stored credential if the optional keyring package is installed.
        import keyring
        password = keyring.get_password(realm, username)
    except ImportError:
        pass
    if not password:
        # Fall back to an interactive prompt; enter a blank password to skip auth.
        password = getpass.getpass(f"password for {username}@{realm}: ")

    if password:
        # Embed the credentials in the URL for HTTP basic authentication.
        url = url.replace('://', f'://{username}:{password}@')

    return url


def main():
    trac_url = add_auth(os.environ['TRAC_URL'])
    # Ensure a trailing slash so urljoin appends the XML-RPC path instead of
    # replacing the last path segment of the Trac URL.
    rpc_url = urllib.parse.urljoin(trac_url.rstrip('/') + '/', 'login/xmlrpc')
    trac = xmlrpc.client.ServerProxy(rpc_url)

    for page in trac.wiki.getAllPages():
        # Hierarchical page names (e.g. 'Dev/Notes') become nested directories.
        filename = f'{page}.html'.lstrip('/')
        dirname = os.path.dirname(filename)
        if dirname:
            os.makedirs(dirname, exist_ok=True)
        with open(filename, 'w', encoding='utf-8') as f:
            f.write(trac.wiki.getPageHTML(page))


if __name__ == '__main__':
    main()

This script requires only Python 3.6 or later, so download and save it to an export-wiki.py file, then set the TRAC_URL environment variable and invoke the script. For example, on Unix:

$ TRAC_URL=http://mytrac.mydomain.com python3.6 export-wiki.py

It will prompt for a password. If no password is required, just hit Enter to bypass authentication. If a different username is needed, also set the USER environment variable, as shown below. Keyring support is also available but optional.
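
For instance, to run the export as a different account (the username and URL here are just placeholders):

$ TRAC_URL=http://mytrac.mydomain.com USER=alice python3.6 export-wiki.py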

Jason R. Coombs