How do I export one or more Confluence spaces to PDF based on a search of all available spaces? Information is scarce, so I am making this a Q&A to help others.

I have read through a maze of API deprecations, replacements, and problem reports. As far as I can tell, as of 2023 Confluence still does not allow PDF export through a modern RESTful API, only through its long-unsupported SOAP API.

Some of the more useful content I have read includes:

https://jira.atlassian.com/browse/CONFSERVER-9901
https://community.atlassian.com/t5/Confluence-questions/RPC-Confluence-export-fails-with-TYPE-PDF/qaq-p/269310
https://developer.atlassian.com/server/confluence/remote-api-specification-for-pdf-export/

The following SO example, How to export a Confluence "Space" to PDF using remote API, is similar to what is needed, but it does not search spaces, which has required a different endpoint since sometime before June 2015. Using Ruby or PHP would also mean introducing a new language to my team, and we prefer to stick with C#, Python, and, in emergency conditions, Java.

Charles Burns

1 Answer

The following Python script was tested using Python 3.11 with Confluence Server 7.19. It is written to be short, not perfect, so feel free to modify as needed.

Python 3 Code

# Saves one or more Confluence spaces to PDF files. On-prem installs only. SOAP API must be enabled/unblocked
# Be sure to: pip install zeep and change the URL and YOUR_KEY_FILTER_HERE below
# Charles Burns (https://stackoverflow.com/users/161816/charles-burns), February 2023

import shutil
import logging
from getpass import getpass
from datetime import datetime, timezone
from requests import Session
from requests.auth import HTTPBasicAuth
from zeep import Client
from zeep.transports import Transport

confluence = "on-prem-confluence.net" # Your company's Confluence URI
user = input("Confluence login name: ")
password = getpass()

print("Authorizing on " + confluence + "...")
session = Session()
session.auth = HTTPBasicAuth(user, password)
getSpacesClient = Client('https://' + confluence + '/rpc/soap-axis/confluenceservice-v2?WSDL', transport=Transport(session=session))
token = getSpacesClient.service.login(user, password)

print("Getting list of spaces to export...")
allSpaces = getSpacesClient.service.getSpaces(token)
spaces = list(filter(lambda s: s.key.startswith("YOUR_KEY_FILTER_HERE"), allSpaces))
print("Found {} spaces (filtered from {} total): {}".format(len(spaces), len(allSpaces), ", ".join([s.name for s in spaces])))
pdfExportClient = Client('https://' + confluence + '/rpc/soap-axis/pdfexport?WSDL', transport=Transport(session=session))

for space in spaces:
    print("Beginning export of '{}' from {}".format(space.name, space.url))
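    try:
        # exportSpace returns the URL of the generated PDF on the Confluence server
        url = pdfExportClient.service.exportSpace(token, space.key)
    except Exception:
        # Stops at the first failed export; change break to continue to skip the space instead
        logging.exception("ERROR EXPORTING " + space.name)
        break
    print("    Downloading exported PDF from {}".format(url))
    fileName = "{}UTC_{}.pdf".format(datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S"), space.key)
    # Stream the download so a large PDF is not held in memory
    response = session.get(url, stream=True)
    response.raise_for_status()  # Fail loudly rather than saving an error page as a PDF
    with open(fileName, 'wb') as f:
        shutil.copyfileobj(response.raw, f)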
    print("    Export complete: {}\n".format(fileName))
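
The single-prefix filter on YOUR_KEY_FILTER_HERE can be widened into a small search over space keys and names. The sketch below is an illustration only and was not part of the tested script: `select_spaces` and its parameters are hypothetical names, matching is made case-insensitive as an assumption, and `SimpleNamespace` objects stand in for the records returned by `getSpaces` (only the `key` and `name` attributes used by the script are assumed).

```python
from types import SimpleNamespace

def select_spaces(all_spaces, key_prefixes=(), name_contains=""):
    """Return spaces whose key starts with any prefix and whose name contains the text.

    Hypothetical helper: an empty prefix tuple or an empty text matches everything.
    Matching is case-insensitive.
    """
    prefixes = tuple(p.lower() for p in key_prefixes)
    needle = name_contains.lower()
    selected = []
    for space in all_spaces:
        key_ok = not prefixes or space.key.lower().startswith(prefixes)
        name_ok = needle in space.name.lower()
        if key_ok and name_ok:
            selected.append(space)
    return selected

# Stand-in data; real records would come from getSpacesClient.service.getSpaces(token)
demo_spaces = [
    SimpleNamespace(key="ENG", name="Engineering"),
    SimpleNamespace(key="ENGOPS", name="Engineering Operations"),
    SimpleNamespace(key="HR", name="Human Resources"),
]
print([s.key for s in select_spaces(demo_spaces, key_prefixes=("ENG",))])
```

In the script, the result of such a helper could replace the `spaces = list(filter(...))` line.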

Example output

Confluence login name: charlesburns
Password: 
Authorizing on on-prem-confluence.net...
Getting list of spaces to export...
Found 31 spaces (filtered from 4601 total): Some Space, Some Other Space, Yet Another Space
Beginning export of 'Some Space' from https://on-prem-confluence.net/display/MYKEY
    Downloading exported PDF from https://on-prem-confluence.net/download/temp/pdfexport-20230224/MYKEY.pdf
    Export complete: 20230225-000215UTC_MYKEY.pdf

Beginning export of 'Some Space' from https://on-prem-confluence.net/display/MYKEY
    Downloading exported PDF from https://on-prem-confluence.net/download/temp/pdfexport-20230224/MYKEY.pdf
    Export complete: 20230225-000215UTC_MYKEY.pdf

On successful export, PDF files will be in the same folder as the script.
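
If the PDFs should land somewhere other than the script's folder, the file name can be joined to a target directory before the file is opened. This is a minimal sketch, assuming a hypothetical helper named `export_path` and a hypothetical `exports` directory; it reuses the script's timestamp format.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def export_path(space_key, out_dir, now=None):
    """Build '<out_dir>/<YYYYmmdd-HHMMSS>UTC_<space_key>.pdf', creating out_dir if needed."""
    now = now or datetime.now(timezone.utc)
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    return directory / "{}UTC_{}.pdf".format(now.strftime("%Y%m%d-%H%M%S"), space_key)

# Fixed timestamp and a temporary directory keep the demo reproducible and tidy
stamp = datetime(2023, 2, 25, 0, 2, 15, tzinfo=timezone.utc)
with tempfile.TemporaryDirectory() as tmp:
    path = export_path("MYKEY", Path(tmp) / "exports", stamp)
    print(path.name)
```

The returned path can be passed to `open(..., 'wb')` in place of `fileName`.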

Errors encountered and possible causes

| Error | Note |
| --- | --- |
| ValueError: Invalid tag name 'Object[]' | SOAP API may be disabled, ask admins |
| requests.exceptions.HTTPError: 401 Client Error | Bad password or no access to export the space |
| requests.exceptions.ConnectTimeout | Confluence instance down or URL incorrect |
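
The notes above can also be attached to caught exceptions so the script prints a hint next to the traceback. This is a rough sketch, not part of the tested script: `explain_error` and `HINTS` are hypothetical names, and the lookup compares fully qualified class names rather than importing `requests`, purely so the example runs on its own.

```python
# Hints copied from the error table; each entry is
# (fully qualified exception class, text expected in the message, note).
HINTS = [
    ("builtins.ValueError", "Invalid tag name", "SOAP API may be disabled, ask admins"),
    ("requests.exceptions.HTTPError", "401", "Bad password or no access to export the space"),
    ("requests.exceptions.ConnectTimeout", "", "Confluence instance down or URL incorrect"),
]

def explain_error(exc):
    """Return the note for a listed exception, or None if it is not in HINTS."""
    # Collect the qualified names of the exception's class and all of its base classes
    names = {"{}.{}".format(cls.__module__, cls.__name__) for cls in type(exc).__mro__}
    message = str(exc)
    for class_name, fragment, hint in HINTS:
        if class_name in names and fragment in message:
            return hint
    return None

print(explain_error(ValueError("Invalid tag name 'Object[]'")))
```

Inside the script's `except` block, something like `print(explain_error(e) or "No known hint")` would surface the note; where `requests` is already imported, an `isinstance` check against the real exception classes would be the more conventional choice.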