Martin Stut's Blog

Random Notes about Information Technology

Category: English

all English language posts

  • Using Local AI for Summarizing Long Texts

    In this post, I’ll describe how you can use GPT4All, an AI tool running locally on your computer, to create a reasonable first draft of meeting notes from a transcript. This tool is capable of processing text versions of meetings that have been created by tools like Whisper from audio recordings. By using GPT4All, the task of taking meeting notes becomes more manageable and efficient.

    You need to tweak the standard settings of GPT4All’s AI model in order to obtain a large enough context window so that your AI engine can analyze the entire transcript and not just the last portion of it.

    One-time Setup

    1. Install GPT4All, as outlined in an earlier blog post.
    2. Start GPT4All.
    3. Install the „Llama 3.1 8B Instruct 128k“ model: Models -> + Add Model -> search for „Llama 3.1 8B Instruct 128k“ -> Download
    4. Increase the context window:
      1. Chats -> load the „Llama 3.1 8B Instruct 128k“ model
      2. Settings -> Model -> Context Length: increase from the default 2048 to 16384. As a rule of thumb, use at least three times the number of words in your transcript; 8192 was not enough for a 3600-word transcript.
      3. Max Length: increase from 4096 to 8192 (this might not be needed if you want a short output).
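The rule of thumb from step 4.2 (at least three times the word count) can be sketched as a small shell helper. This is only an illustration: the function name and the file name are hypothetical, and rounding up to the next power of two is my assumption, chosen because it matches the values 2048, 8192 and 16384 mentioned above.

```shell
# Hypothetical helper: suggest a Context Length for a transcript file.
# Assumption: three tokens per word, rounded up by doubling from 2048.
suggest_context_length() {
  transcript=$1
  words=$(wc -w < "$transcript")   # number of words in the transcript
  needed=$((words * 3))            # rule of thumb: three times the word count
  context=2048                     # GPT4All default mentioned above
  while [ "$context" -lt "$needed" ]; do
    context=$((context * 2))
  done
  echo "$context"
}
```

Calling `suggest_context_length transcript.txt` prints a number that you can then enter into the Context Length field.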

    Steps per Meeting Transcript

    1. Start GPT4All, load the „Llama 3.1 8B Instruct 128k“ model and start a new chat.
    2. Copy & paste this prompt into the „Send a message“ field. Do not press Enter at the end; instead, press Shift+Enter twice to insert a blank line without sending the message:
      Create a summary of the following meeting transcript. Use bullet points, include all topics, even those that were mentioned only briefly.
    3. Paste your meeting transcript into the message box, after the prompt and the newline. Then hit Enter to send the request and get a cup of water/tea/coffee while you wait for the reply.
    4. If the AI output has too few notes about some topics, ask follow-up questions like „Please elaborate more on the conference in ...“.
    5. Copy the reply (output) to your word processor and review whether AI got the facts right.
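If you summarize transcripts regularly, steps 2 and 3 can be prepared in a file instead of typing Shift+Enter by hand. The following is a minimal sketch, assuming the transcript is a plain text file; the function name and the file names are hypothetical.

```shell
# The prompt from step 2, kept as a single line.
PROMPT='Create a summary of the following meeting transcript. Use bullet points, include all topics, even those that were mentioned only briefly.'

# Hypothetical helper: print the prompt, one blank line, then the transcript.
build_request() {
  printf '%s\n\n' "$PROMPT"
  cat "$1"
}
```

For example, `build_request transcript.txt > request.txt` writes the complete message to a file whose content you can paste into the „Send a message“ field in one go.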

    Important: Review AI Output

    Review whether AI got the facts right. AI tends to hallucinate and this tendency is amplified by audio transcription errors, interrupted sentences etc.

    • AI might have reversed who is going to do what. For instance, when talking about an app, AI might think you are developing it, while in reality you are researching its usefulness and someone else would be doing the development.
    • AI might mix up or group unrelated events, like a conference that is half a year ahead and a visit to relatives next weekend.
    • In many places you’ll need to change „the speaker“ into the appropriate real name.

    Conclusion

    • AI can give you a good first draft of the meeting notes.
    • You still need to take some notes during the meeting:
      • important topics – AI might miss or inappropriately reduce a topic
      • key facts and numbers – speech-to-text AI might miss or misunderstand key words
      • key decisions – AI might misunderstand decision words as small remarks
    • Do your review shortly (hours, at most a few days) after the meeting, so you can remember key facts that AI may have mixed up.

  • How to Include a Mermaid Diagram into a WordPress Blog Post

    Why Mermaid?

    You can describe a diagram in simple text form, which is much easier (for me) than drawing it in a drawing program. You enter the relationships and let the program do the layout. I think of Mermaid as doing for diagrams what Markdown does for text authoring.

    For example, when you enter this code into a Mermaid interpreter

    graph TD
    mermaidsource["Mermaid Source Code"] --> wpedit["Word Press Editor, MerPress Block"] --> website[Finished Website]

    you’ll get a diagram like the one shown below.

    You can find an introduction to Mermaid on https://mermaid.js.org/intro/

    Approach 1: WordPress Plugin

    A web search turns up multiple options. The classic seems to have been „WP Mermaid“, but its page has disappeared from wordpress.org/plugins.

    The plugin I chose is MerPress. Initially, I was a bit sceptical because it has only 100+ installations, which tends to indicate a scam or buggy software, but the development history shows continuous development since June 2021 and there are current updates for each WordPress version.

    When editing, you’ll see your code on the top and the resulting diagram below it. This is as comfortable as you can get. If you see no diagram, there is a syntax error in the code.

    Here is an example, from the code shown above:

    graph TD
    mermaidsource["Mermaid Source Code"] --> wpedit["Word Press Editor, MerPress Block"] --> website[Finished Website]

    Approach 2: Online-Editor + Image Export

    The Mermaid team has set up a Live Editor where you can input your diagram’s code and then see and download a PNG or SVG file. SVG is not easy to use in WordPress for security and privacy reasons (SVG is a variant of XML and can contain all kinds of links, including links to malware), so you are better off with a PNG image.

    However, a PNG image is much larger than a Mermaid source file, so for your visitors the plugin option is probably better. On the other hand, the PNG option is a lot more portable because it does not require JavaScript execution on their computer.

  • Improve Your English Communication with AI-Powered Grammar Correction – A Guide for Non-Native Speakers

    Install GPT4All and let it rewrite your text by choosing a suitable model and prompt, as demonstrated in this article. GPT4All runs locally on your computer, so you can use this method for confidential data. (Updated for GPT4All 3.4.2, original version)

    Issue

    You speak and write some English, but not at near-native quality. However, you need to publish English texts, even if just as a chat message to a team. Since you cannot run each chat message by a native speaker friend because they have lots of other things to do, you are looking for machine help.

    Solution

    Let AI do the job. AI is said to occasionally hallucinate about facts and conclusions, but it’s good about grammar and spelling. So you create the content, and AI can adjust the words.

    Someone said, “AI is like a parrot and a text mechanic.” That’s what I’m trying to use it for. Comparable to a pocket calculator in Mathematics. Not a supercomputer or Wolfram Alpha, but a pocket calculator.

    Tool

    Because your content may not be public, e.g. intended for an internal message within your organization, you cannot use online services like Grammarly. Therefore, you must use a locally running AI. A colleague pointed me to GPT4All.io. Their smaller models run on computers with as little as 8 GB of RAM, while the larger models require 16 GB. You don’t need a GPU; the standard CPU of your desktop/notebook computer is sufficient. If you have the opportunity, run it on a Mac with ARM CPU. I found a 2021 M1 Mac Mini to be about 5 times faster than a Windows or Linux desktop with an Intel CPU. GPT4All supports all major operating systems: Linux, Mac, Windows.

    Initial one-time Setup

    Download GPT4All app

    1. Point your web browser to gpt4all.io
    2. Download the installer for your operating system type (Windows, Mac, Ubuntu). Ubuntu should also work for Debian, Mint and other Debian-derived systems.
    3. Install the downloaded package on your computer.
    4. Select “no” to the opt-in questions (after all, you are doing this to keep your text confidential).

    Download Models

    Note: Do this when you have good, unmetered Internet connectivity. The models are approximately 3-8 gigabytes each to download.

    1. Start GPT4All
    2. In the left pane, click on the “Models” icon.
    3. In the lower left corner, click on “Downloads”.
    4. In the top right corner click the „+ Add Model“ button.
    5. Identify the line of the “Mistral Instruct” model.
    6. In the right corner of the model’s line, click on the “Download” button. Wait until about 4.1 gigabytes have downloaded.

    If you want to experiment with other models, repeat steps 5 and 6 for each of them. You can also get models not in the list by entering a search term in the long top line „Discover and download models by keyword search…“

    If you want to download only one model, choose “Mistral Instruct”.

    Configure the App to Speed Up Your Text Checking Workflow

    1. Start GPT4All
    2. In the left navigation pane, click on the „Settings“ gear icon, then on the „Model“ line. A „Settings“ window opens.
    3. In the semi-left pane, click on the „Model” line. The header of the main pane is now “Model Settings”.
    4. Click on the unnamed line between „Model Settings“ and select “Mistral Instruct” from the dropdown
    5. Click on “Clone”
    6. Edit the “Unique Name” field to something like “Mistral Instruct Grammar Checker”
    7. Edit the „Prompt Template“ field to read as shown below. Enter the prompt text as a single line; I divided it into several lines here only to improve readability, and entering it that way has resulted in poorer outcomes.
    [INST]Please correct any grammatical or spelling errors in this text
    while staying as close to the original content as possible.
    Do not consider the context when making your corrections
    and only provide the revised version.
    
    %1 [/INST]
    
    • Optionally: empty the „Suggested FollowUp Prompt“
    • Close the Settings and the GPT4All Chat app

    Daily Use

    1. Open the GPT4All App
    2. In the left navigation pane, click on the „Chats“ icon.
    3. In the top center „Choose a model“ drop down field, select your newly configured model, e.g. “Mistral Instruct Grammar Checker”
    4. In the lower “Send a message” field, enter your sentence and click on the arrow on the right side.
    5. Check the output. If you believe that the AI output can be improved, or if you simply wish to view a different version, click on the ‘Regenerate response’ button.
    6. After 5-10 questions, the output tends to become less useful, for example claiming spelling errors in the input while the output is identical to the input. When that happens, start a „+ New Chat“ at the top of the semi-left navigation pane.

    To check another sentence or paragraph, just repeat from step 4.

    Conclusion

    GPT4All can help you improve your English communication by providing grammar and spelling corrections for your written text, always respecting the privacy of your data. This tool uses artificial intelligence to identify errors in your writing and suggest corrected versions. You can use this app daily as a part of your writing routine.

  • How to archive a CMS powered website to static HTML

    While switching the content management system of my website from ProcessWire to WordPress, I want to archive the previous website, because it contains some content that I want to keep and that should stay accessible.

    In this post I’ll describe how to do it. Essentially, it is one single wget command with optional post-processing by sed.

    For a real-world example, my goal is to create a collection of locally browsable HTML files from https://processwire2015.stut.de . I want to shut down that CMS but still publish the old website under a subdomain.

    I’m doing this on a Linux (Ubuntu 22.04, but most Linuxes, as well as macOS, have the tools mentioned) command line. The go-to tool for ripping an entire website in Linux is wget . (wget is also available for Windows, see https://gnuwin32.sourceforge.net/packages/wget.htm and https://www.tomshardware.com/how-to/use-wget-download-files-command-line .)

    I’m creating an empty subdirectory ~/processwire2015-static. All work is being done in this subdirectory.

    1. Collect the Right wget Parameters

    wget is a very versatile tool with lots of options. So I went through the documentation and collected the parameters for my use case:

    • follow links : -r
    • … but only within this domain: --domains=processwire2015.stut.de,www.stut.de,stut.de
    • write relative links, suitable for local viewing (as the original will go away): --convert-links
    • create files as .html, even if the URL ends with something else: --adjust-extension
    • also get page requisites like CSS: --page-requisites
    • don’t create subdirectories per host, to avoid the font being stored in a separate subdirectory: --no-host-directories

    so the complete command line is

    wget -r --domains=processwire2015.stut.de,www.stut.de,stut.de --no-host-directories --convert-links --adjust-extension --page-requisites https://processwire2015.stut.de

    This runs for a minute or so and generates a collection of files and subdirectories. The large number of separate index.html files results from the URL scheme of ProcessWire: For instance, the „english“ page has the URL /english , which is not a good filename. So wget creates a subdirectory of this name and an index.html file with the real contents.
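To reuse the recipe for another site, the same options can be wrapped in a small shell function. This is a minimal sketch: the function name archive_site is hypothetical, and --recursive is simply the long form of -r. Everything else is taken from the command line above.

```shell
# Hypothetical wrapper around the wget call described above.
# $1: comma-separated list of allowed domains
# $2: start URL of the site to archive
archive_site() {
  domains=$1
  start_url=$2
  # --recursive             follow links (same as -r)
  # --domains               ... but only within these domains
  # --no-host-directories   no separate subdirectory per host
  # --convert-links         rewrite links for local viewing
  # --adjust-extension      save pages with an .html extension
  # --page-requisites       also fetch CSS, images etc.
  wget --recursive \
       --domains="$domains" \
       --no-host-directories \
       --convert-links \
       --adjust-extension \
       --page-requisites \
       "$start_url"
}
```

For the site in this post, the call would be `archive_site processwire2015.stut.de,www.stut.de,stut.de https://processwire2015.stut.de`, run inside the empty target directory.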

    The Linux tree command gives a nice overview:

    .
    ├── css?family=Lusitana:400,700|Quattrocento:400,700.css
    ├── deutsch
    │   ├── bueroservice-marion-stut
    │   │   └── index.html
    │   ├── dienstleistung
    │   │   └── index.html
    │   ├── erfahrungen
    │   │   └── index.html
    │   ├── index.html
    │   └── lebenslauf
    │       └── index.html
    ├── english
    │   ├── cv
    │   │   └── index.html
    │   ├── experience
    │   │   └── index.html
    │   ├── index.html
    │   └── services
    │       └── index.html
    ├── index.html
    ├── kontakt-impressum
    │   ├── datenschutzerklaerung
    │   │   └── index.html
    │   └── index.html
    ├── links
    │   └── index.html
    ├── site
    │   ├── assets
    │   │   └── files
    │   │       ├── 1025
    │   │       │   └── passbild-martin-web.jpg
    │   │       └── 1034
    │   │           └── passbild-martin-web.jpg
    │   └── templates
    │       └── styles
    │           └── main.css
    └── site-map
        └── index.html
    

    2. Postprocessing – Cleanup with sed

    Inevitably, some manual cleanup will be needed.

    Remove the Admin Login Page Link Target

    All pages generated by the old CMS had a link „Admin Login“ at the bottom, pointing to the /admin page. As the CMS will go away it is pointless to have a link to the no-longer-existing admin page. So let’s point the link to the start page.

    As all files need to be edited, this task cries out for automation. One of Linux’s go-to tools for mass replacement of text file content is sed. (awk is another tool capable of this, but sed is a lot simpler and I wanted to learn it at this opportunity.)

    Because I haven’t used sed for quite some time, I searched the web for examples and found https://tecadmin.net/sed-command-in-linux-with-examples/ . For a single file the command line is:

    sed -i 's/adminpage//' index.html

    The -i option edits the file in place, as opposed to writing the result to standard output. The s/adminpage// command substitutes the first match of the regular expression adminpage on each line with the empty string (between the last two slashes). The single quotes just serve to protect the command from being expanded by the shell.

    To apply this command to all files, I’m using the find command:

    find . -type f -exec sed -i 's/adminpage//' {} \;

    The . means „start at this directory“; -type f means „only look at files“ as opposed to directories, symbolic links, devices, …; -exec means „execute the following command for each file found“; the {} in the command is a placeholder to insert the filename to be processed. The \; at the end is a plain semicolon to signify the end of the -exec command, with the backslash protecting the semicolon from the outer shell.

    Then of course remove the /adminpage directory.
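Because sed -i changes files without asking, a dry run can be reassuring before the mass edit. The following is a minimal sketch with two hypothetical helper functions. It assumes the pattern contains no forward slash, and note that both grep and sed treat the pattern as a regular expression.

```shell
# Hypothetical helper: list the files below a directory (default: the
# current one) that contain the pattern, i.e. the files sed would change.
list_matching_files() {
  grep -rl -- "$1" "${2:-.}"
}

# Hypothetical helper: print the lines of one file as they would look
# after removing the pattern. Only changed lines are printed and the
# file itself is left untouched (no -i).
preview_removal() {
  sed -n "s/$1//p" "$2"
}
```

For example, `list_matching_files adminpage` shows which files are affected, and `preview_removal adminpage index.html` shows the resulting lines for one of them.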

    Remove the Admin Login Page Link Text

    The same sed logic does the trick here, replacing „Admin Login“ with nothing:

    find . -type f -exec sed -i 's/Admin Login//' {} \;

    Remove the Search Form Action

    The old CMS‘ template also placed a search field (miniature form) onto each page. This search form won’t work with a static copy. Because it would have been too much effort to remove the search form, I decided to just remove the action: replace https://processwire2015.stut.de/search/ with nothing:

    find . -type f -exec sed -i 's/https:\/\/processwire2015.stut.de\/search\///' {} \;

    The sed command is suffering from „leaning toothpick syndrome“, because the literal forward slashes in the search text need to be protected by preceding backslashes from being interpreted as „end of regular expression“ marks.
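One way to avoid the toothpicks is to pick a different delimiter: the s command of sed accepts other characters than the slash, so with | the slashes in the URL need no escaping. The sketch below is an alternative, not the command used in this post; the function name is hypothetical, and it additionally escapes the dots so that they match only literal dots.

```shell
# Hypothetical helper: remove the search form action from the text on
# standard input, using | instead of / as the delimiter of s.
strip_search_action() {
  sed 's|https://processwire2015\.stut\.de/search/||'
}
```

The same expression can be dropped into the find call from above in place of the slash-delimited one.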

    Conclusion

    You now have an adaptable recipe for archiving an entire CMS-driven website to a static collection of HTML files using the wget command, and postprocessing it using the sed command.

    This example task demonstrates some of the power of the command line. Explaining the same process for a graphical user interface would be much more complex.