Martin Stut's Blog

Random Notes about Information Technology

Tag: Web Technology

everything related to web technology

  • How to Include a Mermaid Diagram into a WordPress Blog Post

    Why Mermaid?

    You can describe a diagram in simple text form, which (to me) is much easier than drawing it in a drawing program. You enter the relationships and let the program do the layout. I think of Mermaid as being to diagrams roughly what Markdown is to text authoring.

    For example, when you enter this code into a Mermaid interpreter

    graph TD
    mermaidsource["Mermaid Source Code"] --> wpedit["Word Press Editor, MerPress Block"] --> website[Finished Website]

    you’ll get a diagram like the one shown below.

    You can find an introduction to Mermaid at https://mermaid.js.org/intro/

    Approach 1: WordPress Plugin

    A web search turns up multiple options. The classic seems to have been „WP Mermaid“, but its page has disappeared from wordpress.org/plugins.

    The plugin I chose is MerPress. Initially, I was a bit sceptical because it has only 100+ installations, which tends to indicate a scam or buggy software, but its history shows continuous development since June 2021, with updates for each new WordPress version.

    When editing, you’ll see your code at the top and the resulting diagram below it. This is as comfortable as it gets. If no diagram appears, the code contains a syntax error.

    Here is the example from the code shown above, as a MerPress block:

    graph TD
    mermaidsource["Mermaid Source Code"] --> wpedit["Word Press Editor, MerPress Block"] --> website[Finished Website]

    Approach 2: Online Editor + Image Export

    The Mermaid team has set up a Live Editor where you can enter your diagram’s code and then see it and download it as a PNG or SVG file. Using SVG in WordPress is not straightforward, for security and privacy reasons (SVG is an XML-based format and can contain all kinds of links, including links to malware), so you are better off with a PNG image.

    However, a PNG image is much larger than the Mermaid source code, so for your visitors the plugin option is probably better. On the other hand, the PNG option is a lot more portable, because it does not require JavaScript to run on the visitor’s computer.

  • Setting Up a WordPress Site at Hetzner

    When moving my website to a blogging-friendly CMS, I chose WordPress. My preferred hosting provider, Hetzner, offers to install WordPress directly from its administration interface, Konsole-H.
    Here are my notes on the process:

    Preparations

    • Create a directory for the new WordPress installation, e.g. using Web-FTP; for example „stut-de_wp2024“.
    • Set up TLS for the domain. Hetzner can set up wildcard certificates „at the push of a button“, which are then valid for, e.g., „*.stut.de“. This saves the effort of setting up a separate TLS certificate for each subdomain.
    • Create a mailbox „wordpress“, which is used as the sender address. Hetzner checks that this mailbox exists before you can continue with the installation.
    • Set the PHP version to a current one. Mine was still set to 7.3; I switched to 8.3 (the newest version currently offered).
    • Enable the PHP module imagick. If it is not active, WordPress complains about it missing in the „Site Health“ section. While at it, also enable the PHP module OpCache, which speeds up the PHP interpreter.

    Konsole-H > Products > Settings > Extras > WordPress

    1. Specify the target directory (see above).
    2. The automatic installer creates a new MySQL/MariaDB database by itself. It is a good idea to note down the name of this database. You can look up the credentials later under Databases > MySQL. There you can also add a description of the database, which is advisable so that in 10 years you still know what this database was once used for, or perhaps still is used for.
    3. Click „Set up WordPress“ („WordPress einrichten“):
      • Title: e.g. „Martin Stut’s Blog“
      • Notification email: the address to which, e.g., messages about required or completed updates and comments awaiting approval are sent.
      • Set username/password. This avoids the problem that brand-new WordPress installations (recognizable to attackers by a freshly issued TLS certificate) get hacked within seconds, before the real admin has set a password.
      • Save
    4. „WordPress is being set up“

    Log in to WordPress at …/wp-admin

    Use the credentials set above.

    Check „Site Health“. It may report missing PHP modules (-> enable them in Konsole-H) or required updates.

    The rest of the setup then takes place within WordPress and is no longer Hetzner-specific.

  • How to archive a CMS powered website to static HTML

    When switching the content management system of my website from ProcessWire to WordPress, I want to archive the previous website, because it contains some content that I want to keep accessible.

    In this post I’ll describe how to do it. Essentially, it is a single wget command with optional post-processing using sed.

    As a real-world example, my goal is to create a collection of locally browsable HTML files from https://processwire2015.stut.de, because I want to shut down that CMS but still publish the old website under a subdomain.

    I’m doing this on the Linux command line (Ubuntu 22.04; most Linux distributions have the tools mentioned, and on macOS, sed and find are built in while wget can be installed, e.g. via Homebrew). The go-to tool for ripping an entire website on Linux is wget. (wget is also available for Windows; see https://gnuwin32.sourceforge.net/packages/wget.htm and https://www.tomshardware.com/how-to/use-wget-download-files-command-line)

    I create an empty subdirectory ~/processwire2015-static and do all the work in it.

    1. Collect the Right wget Parameters

    wget is a very versatile tool with lots of options, so I went through the documentation and collected the parameters for my use case:

    • follow links recursively: -r
    • … but only within these domains: --domains=processwire2015.stut.de,www.stut.de,stut.de
    • write relative links, suitable for local viewing (as the original will go away): --convert-links
    • save files as .html, even if the URL ends with something else: --adjust-extension
    • also get page requisites like CSS: --page-requisites
    • don’t create subdirectories per host, to avoid the font being stored in a separate subdirectory: --no-host-directories

    so the complete command line is

    wget -r --domains=processwire2015.stut.de,www.stut.de,stut.de --no-host-directories --convert-links --adjust-extension --page-requisites https://processwire2015.stut.de
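    The same command can be wrapped in a small shell function, which makes it easy to rerun or to reuse for another site. This is a minimal sketch assuming a POSIX shell; the function name mirror_site and the WGET dry-run variable are my own illustrative choices, not wget features. The options are exactly the ones from the list above.

```shell
#!/bin/sh
# Sketch: the mirror command above, wrapped in a reusable function.
# mirror_site is a hypothetical helper name. The options are unchanged:
#   -r                     follow links recursively
#   --domains=...          restrict to these hosts
#   --no-host-directories  no per-host subdirectory
#   --convert-links        rewrite links for local viewing
#   --adjust-extension     append .html/.css where the URL lacks that suffix
#   --page-requisites      also fetch CSS and other page requisites
# Setting WGET=echo prints the command instead of running it (a dry run).
set -eu

mirror_site() {
  local url="$1" domains="$2"
  ${WGET:-wget} -r --domains="$domains" --no-host-directories \
    --convert-links --adjust-extension --page-requisites "$url"
}

# Example (run from inside the empty target directory):
#   mirror_site https://processwire2015.stut.de processwire2015.stut.de,www.stut.de,stut.de
```

    Calling it as WGET=echo mirror_site … prints the full command line without downloading anything, which is a cheap way to check the quoting before a long run.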

    This runs for a minute or so and generates a collection of files and subdirectories. The many separate index.html files are due to ProcessWire’s URL structure: for instance, the „english“ page has the URL /english, which is not a good file name for an HTML page. So wget creates a subdirectory of this name containing an index.html file with the actual content.

    The Linux tree command gives a nice overview:

    .
    ├── css?family=Lusitana:400,700|Quattrocento:400,700.css
    ├── deutsch
    │   ├── bueroservice-marion-stut
    │   │   └── index.html
    │   ├── dienstleistung
    │   │   └── index.html
    │   ├── erfahrungen
    │   │   └── index.html
    │   ├── index.html
    │   └── lebenslauf
    │       └── index.html
    ├── english
    │   ├── cv
    │   │   └── index.html
    │   ├── experience
    │   │   └── index.html
    │   ├── index.html
    │   └── services
    │       └── index.html
    ├── index.html
    ├── kontakt-impressum
    │   ├── datenschutzerklaerung
    │   │   └── index.html
    │   └── index.html
    ├── links
    │   └── index.html
    ├── site
    │   ├── assets
    │   │   └── files
    │   │       ├── 1025
    │   │       │   └── passbild-martin-web.jpg
    │   │       └── 1034
    │   │           └── passbild-martin-web.jpg
    │   └── templates
    │       └── styles
    │           └── main.css
    └── site-map
        └── index.html
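    After --convert-links, links to files that were not downloaded still point to absolute URLs on the old host, and some references (such as the search form action dealt with below) may not be rewritten at all. Since those break once the old site is gone, it seems worth listing the files that still mention the old host name. A minimal sketch, assuming GNU or BSD grep; list_absolute_refs is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: list archived files that still mention the old host name.
# list_absolute_refs is a hypothetical helper; DIR and HOST are parameters.
set -eu

list_absolute_refs() {
  local dir="$1" host="$2"
  # -r: recurse into DIR, -l: print file names only,
  # -F: treat HOST as a fixed string, so its dots are not regex wildcards.
  # sort keeps the output order stable; empty output means nothing was found.
  grep -rlF -- "$host" "$dir" | sort
}

# Example:
#   list_absolute_refs ~/processwire2015-static processwire2015.stut.de
```

    Plain-text mentions of the host name show up as well, so the output is a list of files to inspect rather than a list of errors.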
    

    2. Postprocessing – Cleanup with sed

    Inevitably, some manual cleanup will be needed.

    Remove the Admin Login Page Link Target

    All pages generated by the old CMS had an „Admin Login“ link at the bottom, pointing to the admin page (/adminpage). As the CMS will go away, a link to the no-longer-existing admin page is pointless. So let’s point the link to the start page.

    As all files need to be edited, this task cries out for automation. One of Linux’s go-to tools for mass replacement of text in files is sed. (awk can do this too, but sed is a lot simpler, and I wanted to take the opportunity to learn it.)

    Because I hadn’t used sed for quite a while, I searched the web for examples and found https://tecadmin.net/sed-command-in-linux-with-examples/. For a single file, the command line is:

    sed -i 's/adminpage//' index.html

    The -i option writes the result back to the file being edited (in place) instead of to standard output. Note that -i without an argument is GNU sed syntax; the BSD sed on macOS expects -i '' instead. The s/adminpage// command substitutes the first match of the regular expression adminpage on each line with the empty string (what stands between the last two slashes). The single quotes just protect the command from being interpreted by the shell.
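    One detail of the s command is easy to overlook: without the g (global) flag, sed replaces only the first match on each line. That matters for HTML, where one line (especially in minified output) can contain several links. A small self-contained demo with a made-up sample line:

```shell
#!/bin/sh
# Demo: sed's s command replaces only the first match on each line unless the
# g (global) flag is added. The sample line is made up for illustration.
set -eu

line='adminpage and adminpage again'
without_g=$(printf '%s\n' "$line" | sed 's/adminpage//')
with_g=$(printf '%s\n' "$line" | sed 's/adminpage//g')

printf 'without g: [%s]\n' "$without_g"   # [ and adminpage again]
printf 'with g:    [%s]\n' "$with_g"      # [ and  again]
```

    If the pages might contain several occurrences per line, appending g (as in s/adminpage//g) is the safer choice.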

    To apply this command to all files, I’m using the find command:

    find . -type f -exec sed -i 's/adminpage//' {} \;

    The . means „start at this directory“; -type f means „only look at regular files“, as opposed to directories, symbolic links, devices, …; -exec means „execute the following command for each file found“; the {} in the command is a placeholder for the name of the file being processed. The \; at the end is a plain semicolon marking the end of the -exec command; the backslash protects it from being interpreted by the shell.
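    Two refinements of this find command seem worth considering when, as here, the strings to replace only occur in the HTML pages: restricting it with -name '*.html', so images and stylesheets are left untouched, and ending -exec with {} + instead of \;, so find hands many files to one sed call rather than starting a sed process per file. A minimal sketch, assuming GNU sed; replace_in_html is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: run one sed substitution over the HTML files below a directory.
# replace_in_html is a hypothetical helper. Compared with the command above it
#   (a) only touches *.html files, leaving images and CSS alone, and
#   (b) uses "{} +", so find passes many files to a single sed call.
set -eu

replace_in_html() {
  local dir="$1" script="$2"
  # GNU sed syntax: on macOS/BSD sed, write -i '' instead of -i.
  find "$dir" -type f -name '*.html' -exec sed -i "$script" {} +
}

# Example:
#   replace_in_html ~/processwire2015-static 's/Admin Login//g'
```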

    Then of course remove the /adminpage directory.

    Remove the Admin Login Page Link Text

    The same sed logic does the trick here, replacing „Admin Login“ with nothing:

    find . -type f -exec sed -i 's/Admin Login//' {} \;

    Remove the Search Form Action

    The old CMS’s template also placed a search field (a miniature form) on each page. This search form won’t work in a static copy. Because removing the search form entirely would have been too much effort, I decided to just remove its action, i.e. replace https://processwire2015.stut.de/search/ with nothing:

    find . -type f -exec sed -i 's/https:\/\/processwire2015.stut.de\/search\///' {} \;

    This sed command suffers from „leaning toothpick syndrome“: each literal forward slash in the search text must be escaped with a backslash, so that sed does not interpret it as the end of the regular expression.
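    sed accepts almost any character as the delimiter of the s command, so choosing | instead of / avoids the toothpicks altogether. The sketch below also escapes the dots in the host name (an unescaped . matches any character) and combines all three cleanups into one sed call with several -e expressions. It assumes GNU sed, and cleanup_archive is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: the three substitutions from this post in one sed call per batch of
# files. cleanup_archive is a hypothetical helper name.
set -eu

cleanup_archive() {
  local dir="$1"
  # "|" as the s delimiter means the slashes in the URL need no escaping;
  # "\." makes each dot in the host name match a literal dot only.
  # -i without a suffix is GNU sed syntax (macOS/BSD sed: -i '').
  find "$dir" -type f -name '*.html' -exec sed -i \
    -e 's/adminpage//g' \
    -e 's/Admin Login//g' \
    -e 's|https://processwire2015\.stut\.de/search/||g' \
    {} +
}

# Example:
#   cleanup_archive ~/processwire2015-static
```

    Removing the adminpage directory afterwards is still a separate step.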

    Conclusion

    You now have an adaptable recipe for archiving an entire CMS-driven website to a static collection of HTML files using the wget command, and postprocessing it using the sed command.

    This example task demonstrates some of the power of the command line. Explaining the same process for a graphical user interface would be much more complex.