I’ve been working in a GitHub documentation repo recently with many different Markdown files, and I kept noticing broken links. Usually they were my own fault: links within the repo that I didn’t catch after moving, renaming, or otherwise reorganizing the content. Sometimes, though, they were external links that were valid when added months ago but no longer are.

I wanted to find all the broken links, correct them, and help reduce them in the future; that would require some automation to be effective. A quick search led me to the Markdown link check 🔗✔️ GitHub action.

Using the action is very straightforward; this is my initial action setup using it.

# Validates markdown links to check for bad / invalid / broken links.
# Uses mlc_config.json in root to configure patterns to ignore etc.
#
# https://github.com/marketplace/actions/markdown-link-check
#
name: Check Markdown links

# Just running manually and weekly. Can take a few minutes potentially.
on:
  workflow_dispatch:
  schedule:
  # Every Monday at 1 PM (13:00) UTC https://crontab.guru/#0_13_*_*_1
  - cron: "0 13 * * 1"

jobs:
  markdown-link-check:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - uses: gaurav-nelson/github-action-markdown-link-check@v1
      with:
        config-file: 'mlc_config.json'

        # Quiet mode only shows errors in output not successful links too
        use-quiet-mode: 'yes'

        # Specify yes to show detailed HTTP status for checked links.
        use-verbose-mode: 'yes'

  • I used quiet mode as the number of successful links overwhelms the output and I’m mostly interested in the problems.
  • Verbose mode is used to get a more detailed error dump for links that can’t be resolved.
  • Initially I was running the action on every push but the action would take 2-10 minutes or so and the links didn’t need to be checked that aggressively.
  • The action has various other settings; for example, it can check only the Markdown files that were modified.
  • I found that a manual dispatch plus a weekly scheduled run was a good middle ground for triggers.
  • GitHub action schedule triggers are UTC so keep time zone conversion in mind for your local time.
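
As a quick sanity check on the UTC point above, here is a minimal Python sketch that converts the Monday 13:00 UTC run time from the cron expression into a local time. The fixed UTC-5 offset is only an assumption for illustration; substitute your own zone.

```python
from datetime import datetime, timedelta, timezone

# Any Monday works as a reference date; 2024-01-01 was a Monday.
# 13:00 matches the "0 13 * * 1" cron expression in the workflow.
run_utc = datetime(2024, 1, 1, 13, 0, tzinfo=timezone.utc)

# Assumption: a local zone at a fixed UTC-5 offset (no daylight saving).
local_zone = timezone(timedelta(hours=-5))
run_local = run_utc.astimezone(local_zone)

print(run_local.strftime("%A %H:%M"))
```

For a zone that observes daylight saving time, the standard library's zoneinfo module can supply the time zone instead of a fixed offset, and the local run time will shift by an hour for part of the year.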

Initially I ran the tool without a configuration file so there were no URL patterns to ignore and many ‘broken’ links. I say ‘broken’ as many of these links may be valid internal sites that GitHub.com can’t reach or public sites requiring authentication. Others may not require authentication but may return non-standard HTTP status codes when the URL is hit via a bot / automated process / outside of a user request in a browser.

By default the action looks for a config file named mlc_config.json in the repo root, but a different filename can be given. URL patterns to ignore can be put here along with other configuration options. I found the easiest method was copying unreachable URLs from the GitHub action output into a tool like regexr.com, testing a pattern there, then copying it to the config file. A partial sample follows.

{
    "ignorePatterns": [
        {
            "pattern": "(.*\\.)?company-domain\\.com.*"
        },
        {
            "pattern": "(.*\\.)?dev\\.azure\\.com"
        },
        {
            "pattern": "(.*\\.)?github.com/company-org/.*"
        },
        {
            "pattern": "https://github.com/orgs/company-org/.*"
        },
        {
            "pattern": "(.*\\.)?azurewebsites\\.net.*"
        },
        {
            "pattern": "10.0.4.(?:[4-9]|10)*."
        },
        {
            "pattern": "^(http|https)://localhost"
        },
        {
            "pattern": "^(http|https)://redis.io"
        },
        {
            "pattern": "^(http|https)://www.linkedin.com"
        },
        {
            "pattern": "^(http|https)://help.octopus.com"
        }
    ],
    "retryOn429": true,
    "aliveStatusCodes": [200, 206]
}
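
As an alternative to an online regex tester, a few lines of Python can check ignore patterns against sample URLs before they go into the config. This is only a sketch: it assumes the tool treats each pattern as an unanchored regular-expression search, and it relies on Python's `re` behaving like JavaScript's regex engine for simple patterns such as these. The sample URLs are made up for illustration.

```python
import re

# A few patterns from the ignorePatterns sample above, with the JSON
# escaping removed (\\. in JSON becomes \. in the actual regex).
IGNORE_PATTERNS = [
    r"(.*\.)?company-domain\.com.*",
    r"^(http|https)://localhost",
    r"https://github.com/orgs/company-org/.*",
]


def is_ignored(url, patterns=IGNORE_PATTERNS):
    """Return True if any ignore pattern matches somewhere in the URL."""
    return any(re.search(pattern, url) for pattern in patterns)


# Hypothetical URLs, for illustration only.
samples = [
    "https://wiki.company-domain.com/page",
    "http://localhost:8080/health",
    "https://example.com/docs",
]

results = {url: is_ignored(url) for url in samples}
for url, ignored in results.items():
    print(f"{'ignore' if ignored else 'check '}  {url}")
```

Any URL reported as "check" would still be requested by the link checker, so an internal URL showing up there means its pattern needs another look.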

When there are broken links, the workflow run fails and the action output lists each link that could not be resolved, with its HTTP status when verbose mode is on.

With good URL ignore patterns in the configuration, the number of broken links should be minimal. I was quickly able to catch and correct at least a dozen invalid links after configuring the tool.

When all configured links are checked successfully, the workflow run passes.

It’s also helpful to add a workflow status badge to the repo’s README so the link check status is more visible than drilling into Actions.
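
For reference, the badge is just a Markdown image link pointing at the workflow's badge.svg URL. Here is a small Python sketch that builds that line; the owner, repo, and workflow filename used here are hypothetical placeholders, not values from my repo.

```python
def workflow_badge_markdown(owner, repo, workflow_file,
                            label="Check Markdown links"):
    """Build the README Markdown for a GitHub Actions workflow status badge."""
    base = f"https://github.com/{owner}/{repo}/actions/workflows/{workflow_file}"
    # The image comes from <workflow URL>/badge.svg; the link target is the
    # workflow's run history page.
    return f"[![{label}]({base}/badge.svg)]({base})"


# Hypothetical owner, repo, and workflow file name.
badge = workflow_badge_markdown("company-org", "docs", "markdown-link-check.yml")
print(badge)
```

Pasting the printed line near the top of the README shows the latest run status for that workflow.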

The package is Treeware, which I think is cool. The authors ask that you buy the world a tree to thank them for their work.
