Script to generate Markdown files with embedded PlantUML diagrams for GitLab's PlantUML renderer

Question

I am setting up a repository to store software documentation made up of several documents written in Markdown, and I want to be able to embed PlantUML diagrams in them. The repository is hosted on GitLab, which includes a PlantUML renderer but does not support preprocessing, so I cannot use the !include clause to reference diagrams stored in other files.

I would like to have a Bash or Python script that:

Finds all .md files and appends their contents, one after the other, to a new file "all-docs.md".
Searches "all-docs.md" for every !include [FILEPATH] clause and replaces it with the content between @startuml and @enduml in [FILEPATH].
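The first step (concatenating the documents) can be sketched in Python as below. This is only a minimal sketch under stated assumptions: the documents live under docs/ (subdirectories included), plain sorted path order is acceptable, and concat_markdown is a hypothetical helper name, not part of any library.

```python
from pathlib import Path


def concat_markdown(docs_dir: str, output_file: str) -> list[Path]:
    """Write the contents of every .md file under docs_dir into output_file.

    Files are taken in sorted path order; the list of files used is returned.
    """
    out_path = Path(output_file).resolve()
    # rglob also descends into subdirectories of docs_dir
    md_files = [p for p in sorted(Path(docs_dir).rglob("*.md"))
                if p.resolve() != out_path]  # never read the output file itself
    chunks = []
    for md in md_files:
        text = md.read_text(encoding="utf-8")
        # Ensure each document ends with a newline before the next one starts
        chunks.append(text if text.endswith("\n") else text + "\n")
    # The output is rewritten from scratch, so re-runs do not duplicate content
    out_path.write_text("".join(chunks), encoding="utf-8")
    return md_files
```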

For example:

In one place, "all-docs.md" contains:

Here is the Profile class diagram:

```plantuml
@startuml
!include ./data-models/profile.puml
Profile o-- UidObject
@enduml
```

And the content of profile.puml is:

@startuml
class Profile <UidObject> {
    + string name
    + string email
    + string phone
    + Date birthDate
}
@enduml

After running the script, "all-docs.md" should contain:

Here is the Profile class diagram:

```plantuml
@startuml
class Profile <UidObject> {
    + string name
    + string email
    + string phone
    + Date birthDate
}
Profile o-- UidObject
@enduml
```
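The transformation shown in this example can be sketched as a small line-based function. It is a minimal sketch, assuming each !include sits on its own line and the included file holds one @startuml ... @enduml block; puml_body, expand_includes and the read_file callback are hypothetical names, and read_file is passed in so the logic can be tried without touching the disk.

```python
from typing import Callable, Iterable, Iterator


def puml_body(text: str) -> list[str]:
    """Return the lines strictly between @startuml and @enduml."""
    body, inside = [], False
    for line in text.splitlines():
        marker = line.strip()
        if marker.startswith("@startuml"):  # also matches "@startuml name"
            inside = True
        elif marker == "@enduml":
            inside = False
        elif inside:
            body.append(line)
    return body


def expand_includes(lines: Iterable[str],
                    read_file: Callable[[str], str]) -> Iterator[str]:
    """Yield lines, replacing each '!include PATH' line with PATH's diagram body."""
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("!include "):
            path = stripped[len("!include "):].strip()
            yield from puml_body(read_file(path))
        else:
            yield line  # ordinary Markdown or PlantUML line: keep as is
```

In a real run, read_file could be something like lambda p: Path("uml", p).read_text(encoding="utf-8"), assuming the include paths are relative to uml/.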

The repo has the following structure.

/
├── assets/
├── docs/
├── uml/

The assets/ directory contains various assets such as images, icons, and other resources.
The docs/ directory contains the documents (Markdown files).
The uml/ directory contains the PlantUML source files used to generate the diagrams for the software documentation.
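Because the diagrams live in uml/, the path after !include has to be resolved against some base directory. A sketch, assuming include paths are written relative to uml/ (the accepted answer below makes the same assumption); resolve_include is a hypothetical helper, and the check that rejects paths escaping uml/ is an extra safeguard rather than a requirement from the question.

```python
from pathlib import Path


def resolve_include(include_path: str, uml_dir: str = "uml") -> Path:
    """Map the path written after !include to a file under uml_dir.

    Raises ValueError if the path escapes uml_dir (e.g. via "../").
    """
    base = Path(uml_dir).resolve()
    # Joining collapses "./" segments; resolve() also normalises ".."
    target = (base / include_path).resolve()
    # Refuse paths that point outside the uml/ directory
    if base != target and base not in target.parents:
        raise ValueError(f"{include_path!r} resolves outside {uml_dir}/")
    return target
```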

Jetchisel · Answer 1 · 2023-05-14T11:55:29.073

A solution with bash and find, based on your sample input/files, could look something like this:

#!/usr/bin/env bash

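#: Find all .md files in the docs/ directory and concatenate them into all-docs.md.
#: Using > (not >>) truncates the file first, so re-running does not duplicate content.
find docs/ -type f -name '*.md' -exec cat -- {} + > all-docs.md

#: Read all-docs.md line by line, replacing each !include line with the
#: included file minus its first (@startuml) and last (@enduml) line.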
while IFS= read -ru "$fd" line; do
  if [[ $line == "!include"* ]]; then
    temp=${line#!include *}
    mapfile -t plum < "$temp" &&
    unset -v 'plum[-1]' &&
    printf '%s\n' "${plum[@]:1}"
  else
    printf '%s\n' "$line"
  fi
done {fd}< all-docs.md

If you are satisfied with the output and want to make the changes to all-docs.md permanent, replace the last line of the code with:

done {fd}< all-docs.md > tempfile && mv tempfile all-docs.md
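The detour through tempfile is needed because redirecting straight to all-docs.md would truncate the file before the loop reads it. The same write-then-rename idea in Python might look like this; rewrite_in_place is a hypothetical name, and the temporary file is created in the target's own directory because os.replace cannot move a file across filesystems.

```python
import os
import tempfile
from typing import Callable


def rewrite_in_place(path: str, transform: Callable[[str], str]) -> None:
    """Replace the contents of path with transform(old_contents), via a temp file."""
    with open(path, encoding="utf-8") as f:
        new_text = transform(f.read())
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so os.replace() stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(new_text)
        os.replace(tmp_path, path)  # the equivalent of `mv tempfile all-docs.md`
    except BaseException:
        os.unlink(tmp_path)  # clean up if writing or renaming failed
        raise
```

One difference from the shell version: tempfile.mkstemp creates the file readable and writable only by its owner, so the rewritten all-docs.md keeps those permissions unless you call os.chmod on it.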

Alternatively, parse the output of find directly without creating all-docs.md, something like:

#!/usr/bin/env bash

##: Find all the files ending in .md in the docs/ directory and
##: concatenate their contents into a single stream.
find docs/ -type f -name '*.md' -exec cat -- {} + | {
  while IFS= read -r line; do
    if [[ $line == "!include"* ]]; then   ##: Line has the !include pattern, so parse it.
      temp=${line#!include *}             ##: Extract FILE_PATH into a variable named temp.
      if [[ -s "$temp" ]]; then           ##: If temp is an existing, non-empty file,
        mapfile -u3 -t plum 3< "$temp" && ##: read it into an array,
        unset -v 'plum[-1]' &&            ##: drop the last line (@enduml),
        printf '%s\n' "${plum[@]:1}"      ##: and print everything after @startuml.
      else
        printf '%s\n' "$line"             ##: Otherwise print the line as is.
      fi
    else
      printf '%s\n' "$line"               ##: Lines without !include are printed as is.
    fi
  done
}

If creating all-docs.md is a requirement, change the last line to:

} > all-docs.md

Both mapfile (aka readarray) and {fd} require bash v4+.
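For comparison, the second (streaming) variant can be written in Python with generators. This is a rough sketch, not a drop-in replacement: md_lines and inline_includes are hypothetical names, the files are sorted for a deterministic order (find makes no ordering guarantee), include paths are taken relative to the current directory as in the bash version, and, like the mapfile/unset/slice combination, it simply drops the first and last line of the included file. That last point assumes @startuml is the very first line and @enduml the very last; a trailing blank line would leave @enduml in the output.

```python
from pathlib import Path
from typing import Iterable, Iterator


def md_lines(docs_dir: str) -> Iterator[str]:
    """Yield every line of every .md file under docs_dir (like find ... -exec cat)."""
    for md in sorted(Path(docs_dir).rglob("*.md")):
        with md.open(encoding="utf-8") as f:
            for line in f:
                yield line.rstrip("\n")


def inline_includes(lines: Iterable[str]) -> Iterator[str]:
    """Replace '!include PATH' with PATH's lines minus its first and last line."""
    for line in lines:
        if line.startswith("!include"):
            target = Path(line[len("!include"):].strip())
            # Like [[ -s "$temp" ]]: only expand existing, non-empty files
            if target.is_file() and target.stat().st_size > 0:
                # Same as unset 'plum[-1]' plus "${plum[@]:1}"
                yield from target.read_text(encoding="utf-8").splitlines()[1:-1]
                continue
        yield line  # everything else passes through unchanged
```

To reproduce } > all-docs.md, write each line yielded by inline_includes(md_lines("docs")) plus a newline to the output file.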

pcba-dev · Accepted Answer · 2023-05-13T15:07:44.723

I was able to find a solution myself using Python.

Get a list of the Markdown files in the source "docs" directory and sort them by their numeric index prefix (the files are named "0.introduction.md", "1.general.md", etc.).
Define a function called replace_include_with_content. It takes a line as input and checks whether it matches the pattern of the !include clause. If it does, it extracts the include path, builds the path to the corresponding file in the uml directory, reads that file, and returns the lines between @startuml and @enduml. Otherwise, it returns the line unchanged.
Iterate through the sorted Markdown files and read their contents. Within this loop, iterate over the lines of each file and call replace_include_with_content on each line to check whether it needs to be replaced.
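Sorting on the integer prefix matters once there are ten or more documents, since plain string sorting puts "10.appendix.md" before "2.architecture.md". The key in the code below, int(x.split(".")[0]), raises ValueError for any .md file without a numeric prefix (a README.md, say). A slightly more tolerant key is sketched here; doc_sort_key is a hypothetical name and the file names are made up for illustration.

```python
import re


def doc_sort_key(filename: str) -> tuple[int, int, str]:
    """Sort "N.name.md" files by N; files without a numeric prefix go last."""
    match = re.match(r"(\d+)\.", filename)
    if match:
        return (0, int(match.group(1)), filename)
    return (1, 0, filename)  # unnumbered files: after numbered ones, alphabetically


names = ["10.appendix.md", "2.architecture.md", "README.md", "0.introduction.md"]
print(sorted(names, key=doc_sort_key))
```

The result would be passed as key=doc_sort_key to sorted() in place of the lambda.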

import os
import re

docs_dir = "docs"
uml_dir = "uml"
output_file = "all-docs.md"

# Get a sorted list of markdown files in the source directory
markdown_files = sorted(
    [file for file in os.listdir(docs_dir) if file.endswith(".md")],
    key=lambda x: int(x.split(".")[0])
)

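# Replace an "!include <path>" line with the diagram body of the referenced .puml file
def replace_include_with_content(line):
    # Check if the line matches the pattern for the !include clause
    match = re.match(r"!include\s+(.+)", line)
    if match:
        # strip() drops any trailing whitespace after the path
        include_path = match.group(1).strip()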
        uml_file_path = os.path.join(uml_dir, include_path)
        # Read the content of the referenced UML file
        with open(uml_file_path, "r") as uml_file:
            uml_content = uml_file.readlines()
            filtered_content = []
            include_next_line = False
            for uml_line in uml_content:
                if uml_line.strip() == "@startuml":
                    include_next_line = True
                elif uml_line.strip() == "@enduml":
                    include_next_line = False
                elif include_next_line:
                    filtered_content.append(uml_line)
            return "".join(filtered_content)
    return line

# Open the output file in write mode
with open(output_file, "w") as outfile:
    for file in markdown_files:
        # Open each markdown file in read mode
        with open(os.path.join(docs_dir, file), "r") as infile:
            content = infile.readlines()
            for line in content:
                # Replace the !include clause with the content from the referenced UML file
                replaced_line = replace_include_with_content(line)
                outfile.write(replaced_line)
        outfile.write("\n")
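The function above inlines only one level: if a .puml file itself contains !include (for shared styles, for example), that line is copied through unexpanded. A recursive variant could look like this sketch; expand_puml is a hypothetical name, nested paths are resolved against uml_dir just like the top-level ones (which may differ from how PlantUML itself resolves them), and a ValueError is raised on an include cycle.

```python
import os
import re

# Matches "!include PATH", capturing PATH without surrounding whitespace
INCLUDE_RE = re.compile(r"\s*!include\s+(\S.*?)\s*$")


def expand_puml(path: str, uml_dir: str = "uml", _stack: tuple = ()) -> list[str]:
    """Return the diagram body of path, recursively inlining nested !include lines.

    Paths are resolved against uml_dir; an include cycle raises ValueError.
    """
    full = os.path.normpath(os.path.join(uml_dir, path))
    if full in _stack:
        raise ValueError("include cycle: " + " -> ".join(_stack + (full,)))
    body, inside = [], False
    with open(full, encoding="utf-8") as f:
        for line in f.read().splitlines():
            marker = line.strip()
            if marker.startswith("@startuml"):
                inside = True
            elif marker == "@enduml":
                inside = False
            elif inside:
                m = INCLUDE_RE.match(line)
                if m:
                    # Recurse, remembering which files are currently being expanded
                    body.extend(expand_puml(m.group(1), uml_dir, _stack + (full,)))
                else:
                    body.append(line)
    return body
```

Inside replace_include_with_content, "".join(line + "\n" for line in expand_puml(include_path, uml_dir)) would then take the place of the single-level filtering loop.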
