ruzickap/action-my-broken-link-checker

GitHub Actions: My Broken Link Checker ✔

This is a GitHub Action to check for broken links in your static files or web pages. It uses muffet for the URL checking task.

See the basic GitHub Action example to run periodic checks (weekly) against mkdocs.org:

on:
  schedule:
    - cron: '0 0 * * 0'

name: Check markdown links
jobs:
  my-broken-link-checker:
    name: Check broken links
    runs-on: ubuntu-latest
    steps:
      - name: Check for broken links
        uses: ruzickap/action-my-broken-link-checker@v2
        with:
          url: https://www.mkdocs.org
          cmd_params: "--one-page-only --max-connections=3 --color=always"  # Check just one page

Check out the real demo:

My Broken Link Checker demo

This action can be combined with Static Site Generators (Hugo, MkDocs, Gatsby, GitBook, mdBook, etc.). The following examples expect the web pages to be stored in the ./build directory. A Caddy web server is started during the tests, using the hostname from the URL parameter and serving the web pages (see details in entrypoint.sh).

- name: Check for broken links
  uses: ruzickap/action-my-broken-link-checker@v2
  with:
    url: https://www.example.com/test123
    pages_path: ./build/
    cmd_params: '--buffer-size=8192 --max-connections=10 --color=always --skip-tls-verification --header="User-Agent:curl/7.54.0" --timeout=20'  # muffet parameters
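
The local-pages mode relies on the hostname in url being served by the temporary web server rather than by the real site. As a rough illustration of the first step only, the sketch below extracts the hostname from a URL with plain Bash parameter expansion. The function name url_to_host is hypothetical, and this is not copied from entrypoint.sh, which may do it differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from entrypoint.sh): print the hostname part
# of a URL so it can be mapped to a locally started web server.
url_to_host() {
  local rest="${1#*://}"   # drop the scheme, e.g. "https://"
  rest="${rest%%/*}"       # drop the path, e.g. "/test123"
  rest="${rest%%:*}"       # drop an explicit port, e.g. ":443"
  printf '%s\n' "$rest"
}

url_to_host "https://www.example.com/test123"    # prints www.example.com
url_to_host "https://my-testing-domain.com:443"  # prints my-testing-domain.com
```

URLs with embedded credentials (user@host) or IPv6 literals are not handled by this sketch.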

Do you want to skip the Docker build step? OK, script mode is also available:

- name: Check for broken links
  env:
    INPUT_URL: https://www.example.com/test123
    INPUT_PAGES_PATH: ./build/
    INPUT_CMD_PARAMS: '--buffer-size=8192 --max-connections=10 --color=always --header="User-Agent:curl/7.54.0" --skip-tls-verification'  # --skip-tls-verification is a mandatory parameter when using https and "PAGES_PATH"
  run: wget -qO- https://raw.githubusercontent.com/ruzickap/action-my-broken-link-checker/v2/entrypoint.sh | bash

Parameters

Environment variables used by the ./entrypoint.sh script:

| Variable | Default | Description |
|----------|---------|-------------|
| INPUT_CMD_PARAMS | --buffer-size=8192 --max-connections=10 --color=always --verbose | Command-line parameters for the URL checker muffet |
| INPUT_DEBUG | false | Enable debug mode for the ./entrypoint.sh script (set -x) |
| INPUT_PAGES_PATH | | Relative path to the directory with local web pages |
| INPUT_URL | (Mandatory / Required) | URL that will be checked |
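
When wrapping the script in your own tooling, it can help to make these defaults explicit before calling it. The following is a minimal sketch, assuming a hypothetical wrapper function named resolve_inputs; it mirrors the defaults listed above and is not copied from entrypoint.sh.

```shell
#!/usr/bin/env bash
# Sketch only: applies the documented defaults; not copied from entrypoint.sh.
resolve_inputs() {
  # INPUT_URL is the only mandatory variable.
  if [ -z "${INPUT_URL:-}" ]; then
    echo "INPUT_URL is required" >&2
    return 1
  fi
  # ${VAR:-default} keeps a value that is already set and non-empty.
  INPUT_CMD_PARAMS="${INPUT_CMD_PARAMS:---buffer-size=8192 --max-connections=10 --color=always --verbose}"
  INPUT_DEBUG="${INPUT_DEBUG:-false}"
  INPUT_PAGES_PATH="${INPUT_PAGES_PATH:-}"
  export INPUT_URL INPUT_CMD_PARAMS INPUT_DEBUG INPUT_PAGES_PATH
}

INPUT_URL="https://www.example.com"
resolve_inputs && printf '%s\n' "$INPUT_CMD_PARAMS"
```

After resolve_inputs succeeds, ./entrypoint.sh can be started from the same shell and sees the exported values.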

Example of Periodic checks

Pipeline for periodic link checking:

name: periodic-broken-link-checks

on:
  workflow_dispatch:
  push:
    paths:
      - .github/workflows/periodic-broken-link-checks.yml
  schedule:
    - cron: '3 3 * * 3'

jobs:
  broken-link-checker:
    runs-on: ubuntu-latest
    steps:

      - name: Setup Pages
        id: pages
        uses: actions/configure-pages@v3

      - name: Check for broken links
        uses: ruzickap/action-my-broken-link-checker@v2
        with:
          url: ${{ steps.pages.outputs.base_url }}
          cmd_params: '--buffer-size=8192 --max-connections=10 --color=always --header="User-Agent:curl/7.54.0" --timeout=20'

Full example

GitHub Action example:

name: Checks

on:
  push:
    branches:
      - main

jobs:
  build-deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Create web page
        run: |
          mkdir -v public
          cat > public/index.html << EOF
          <!DOCTYPE html>
          <html>
            <head>
              My page, which will be stored on the my-testing-domain.com domain
            </head>
            <body>
              Links:
              <ul>
                <li><a href="https://my-testing-domain.com">https://my-testing-domain.com</a></li>
                <li><a href="https://my-testing-domain.com:443">https://my-testing-domain.com:443</a></li>
              </ul>
            </body>
          </html>
          EOF

      - name: Check links using script
        env:
          INPUT_URL: https://my-testing-domain.com
          INPUT_PAGES_PATH: ./public/
          INPUT_CMD_PARAMS: '--skip-tls-verification --verbose --color=always'
          INPUT_DEBUG: true
        run: wget -qO- https://raw.githubusercontent.com/ruzickap/action-my-broken-link-checker/v2/entrypoint.sh | bash

      - name: Check links using container
        uses: ruzickap/action-my-broken-link-checker@v2
        with:
          url: https://my-testing-domain.com
          pages_path: ./public/
          cmd_params: '--skip-tls-verification --verbose --color=always'
          debug: true

Best practices

Let's try to automate the creation of web pages as much as possible.

The ideal situation requires a repository naming convention where the name of the GitHub repository matches the URL where it will be hosted.
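
With that convention, the URL to check can be derived from the repository slug alone. The sketch below shows one way to do this in Bash for both hosting variants described in this section. The function name site_url and the mode names are hypothetical, and example-org is a placeholder owner; inside a workflow the slug is available as the GITHUB_REPOSITORY environment variable (owner/name).

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the site URL from an "owner/name" repository slug.
site_url() {
  local mode="$1" repository="$2"
  local owner="${repository%%/*}"   # text before the first "/"
  local name="${repository#*/}"     # text after the first "/"
  case "$mode" in
    custom-domain) printf 'https://%s\n' "$name" ;;
    github-io)     printf 'https://%s.github.io/%s\n' "$owner" "$name" ;;
    *)             echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}

site_url custom-domain "example-org/awsug.cz"   # prints https://awsug.cz
site_url github-io "ruzickap/k8s-harbor"        # prints https://ruzickap.github.io/k8s-harbor
```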

GitHub Pages with custom domain

The mandatory part is the repository name awsug.cz, which is the same as the domain:

The web pages will be stored as GitHub Pages on their own domain.

The GitHub Action file may look like:

name: hugo-build

on:
  pull_request:
    types: [opened, synchronize]
  push:

jobs:
  hugo-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Checkout submodules
        shell: bash
        run: |
          auth_header="$(git config --local --get http.https://github.com/.extraheader)"
          git submodule sync --recursive
          git -c "http.extraheader=$auth_header" -c protocol.version=2 submodule update --init --force --recursive --depth=1

      - name: Setup Hugo
        uses: peaceiris/actions-hugo@v2
        with:
          hugo-version: '0.62.0'

      - name: Build
        run: |
          hugo --gc
          cp LICENSE README.md public/
          echo "${{ github.event.repository.name }}" > public/CNAME

      - name: Check for broken links
        env:
          INPUT_URL: https://${{ github.event.repository.name }}
          INPUT_PAGES_PATH: public
          INPUT_CMD_PARAMS: '--verbose --buffer-size=8192 --max-connections=10 --color=always --skip-tls-verification --exclude="(mylabs.dev|linkedin.com)"'
        run: |
          wget -qO- https://raw.githubusercontent.com/ruzickap/action-my-broken-link-checker/v2/entrypoint.sh | bash

      - name: Check links using container
        uses: ruzickap/action-my-broken-link-checker@v2
        with:
          url: https://my-testing-domain.com
          pages_path: ./public/
          cmd_params: '--verbose --buffer-size=8192 --max-connections=10 --color=always --skip-tls-verification --header="User-Agent:curl/7.54.0" --exclude="(mylabs.dev|linkedin.com)"'
          debug: true

      - name: Deploy
        uses: peaceiris/actions-gh-pages@v3
        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
        with:
          deploy_key: ${{ secrets.ACTIONS_DEPLOY_KEY }}
          publish_branch: gh-pages
          publish_dir: public
          force_orphan: true

This example uses Hugo.

GitHub Pages with github.io domain

The mandatory part is the repository name k8s-harbor, which is the directory part at the end of ruzickap.github.io:

In this example, the web pages will use GitHub's domain github.io.

name: vuepress-build-check-deploy

on:
  pull_request:
    types: [opened, synchronize]
    paths:
      - .github/workflows/vuepress-build-check-deploy.yml
      - docs/**
      - package.json
      - package-lock.json
  push:
    paths:
      - .github/workflows/vuepress-build-check-deploy.yml
      - docs/**
      - package.json
      - package-lock.json

jobs:
  vuepress-build-check-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Install Node.js 12
        uses: actions/setup-node@v1
        with:
          node-version: 12.x

      - name: Install VuePress and build the document
        run: |
          npm install
          npm run build
          cp LICENSE docs/.vuepress/dist
          sed -e "s@(part-@(https://github.com/${GITHUB_REPOSITORY}/tree/main/docs/part-@" -e 's@.\/.vuepress\/public\/@./@' docs/README.md > docs/.vuepress/dist/README.md
          ln -s docs/.vuepress/dist ${{ github.event.repository.name }}

      - name: Check for broken links
        uses: ruzickap/action-my-broken-link-checker@v2
        with:
          url: https://${{ github.repository_owner }}.github.io/${{ github.event.repository.name }}
          pages_path: .
          cmd_params: '--exclude=mylabs.dev --max-connections-per-host=5 --rate-limit=5 --timeout=20 --header="User-Agent:curl/7.54.0" --skip-tls-verification'

      - name: Deploy
        uses: peaceiris/actions-gh-pages@v3
        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
        with:
          deploy_key: ${{ secrets.ACTIONS_DEPLOY_KEY }}
          publish_branch: gh-pages
          publish_dir: ./docs/.vuepress/dist
          force_orphan: true

In this case I'm using VuePress to create my page.
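
The build step in this workflow symlinks the VuePress output to a directory named after the repository, and the check then uses pages_path: . together with a URL that ends in the repository name. One plausible reading is that the symlink makes the generated files available under the same path prefix they will have on github.io; this is an interpretation of the workflow, not something the action documents. The sketch below reproduces only the symlink part in a throwaway directory, with k8s-harbor standing in for the repository name.

```shell
#!/usr/bin/env bash
# Sketch with a throwaway directory; "k8s-harbor" stands in for the repository name.
workdir="$(mktemp -d)"
mkdir -p "$workdir/docs/.vuepress/dist"
echo '<html><body>ok</body></html>' > "$workdir/docs/.vuepress/dist/index.html"

repo_name="k8s-harbor"
# Relative symlink created from the repository root, as in the workflow step.
( cd "$workdir" && ln -s docs/.vuepress/dist "$repo_name" )

# The build output is now also reachable under ./<repository name>/
ls "$workdir/$repo_name/index.html"
```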

Both examples can be used as a generic template, and you do not need to change them for your projects.

Running locally

It's possible to use the checking script locally. It will install Caddy and Muffet binaries if they are not already installed on your system.

export INPUT_URL="https://www.mkdocs.org"
export INPUT_CMD_PARAMS="--buffer-size=8192 --ignore-fragments --one-page-only --max-connections=10 --color=always --verbose"
./entrypoint.sh

Output:

*** INFO: [2024-01-26 05:12:20] Start checking: "https://www.mkdocs.org"
https://www.mkdocs.org/
    200 https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.8.0/highlight.min.js
    200 https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.8.0/languages/django.min.js
    200 https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.8.0/languages/yaml.min.js
    200 https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.8.0/styles/github.min.css
    200 https://github.com/mkdocs/catalog#-theming
    200 https://github.com/mkdocs/mkdocs/blob/master/docs/index.md
    200 https://github.com/mkdocs/mkdocs/wiki/MkDocs-Themes
    200 https://twitter.com/starletdreaming
    200 https://www.googletagmanager.com/gtag/js?id=G-274394082
    200 https://www.mkdocs.org/
    200 https://www.mkdocs.org/#mkdocs
    200 https://www.mkdocs.org/about/contributing/
    200 https://www.mkdocs.org/about/license/
    200 https://www.mkdocs.org/about/release-notes/
    200 https://www.mkdocs.org/about/release-notes/#maintenance-team
    200 https://www.mkdocs.org/assets/_mkdocstrings.css
    200 https://www.mkdocs.org/css/base.css
    200 https://www.mkdocs.org/css/bootstrap.min.css
    200 https://www.mkdocs.org/css/extra.css
    200 https://www.mkdocs.org/css/font-awesome.min.css
    200 https://www.mkdocs.org/dev-guide/
    200 https://www.mkdocs.org/dev-guide/api/
    200 https://www.mkdocs.org/dev-guide/plugins/
    200 https://www.mkdocs.org/dev-guide/themes/
    200 https://www.mkdocs.org/dev-guide/translations/
    200 https://www.mkdocs.org/getting-started/
    200 https://www.mkdocs.org/img/favicon.ico
    200 https://www.mkdocs.org/js/base.js
    200 https://www.mkdocs.org/js/bootstrap.min.js
    200 https://www.mkdocs.org/js/jquery-3.6.0.min.js
    200 https://www.mkdocs.org/search/main.js
    200 https://www.mkdocs.org/user-guide/
    200 https://www.mkdocs.org/user-guide/choosing-your-theme
    200 https://www.mkdocs.org/user-guide/choosing-your-theme/
    200 https://www.mkdocs.org/user-guide/choosing-your-theme/#mkdocs
    200 https://www.mkdocs.org/user-guide/choosing-your-theme/#readthedocs
    200 https://www.mkdocs.org/user-guide/cli/
    200 https://www.mkdocs.org/user-guide/configuration/
    200 https://www.mkdocs.org/user-guide/configuration/#markdown_extensions
    200 https://www.mkdocs.org/user-guide/configuration/#plugins
    200 https://www.mkdocs.org/user-guide/customizing-your-theme/
    200 https://www.mkdocs.org/user-guide/deploying-your-docs/
    200 https://www.mkdocs.org/user-guide/installation/
    200 https://www.mkdocs.org/user-guide/localizing-your-theme/
    200 https://www.mkdocs.org/user-guide/writing-your-docs/
*** INFO: [2024-01-26 05:12:21] Checks completed...
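
With --verbose every checked link is listed, so the interesting lines are easy to miss in long output. A minimal sketch for picking them out of saved output follows. It assumes the layout shown above (indented lines whose first field is the status) and output without color escape sequences; the function name non_ok_lines is hypothetical, and the 404 line in the sample input is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read checker output on stdin and print only the
# indented result lines whose first field is not 200.
non_ok_lines() {
  awk '/^[ \t]+[^ \t]/ && $1 != "200"'
}

# Sample input: the 404 entry is invented for this example.
printf '%s\n' \
  'https://www.mkdocs.org/' \
  '    200 https://www.mkdocs.org/css/base.css' \
  '    404 https://www.mkdocs.org/missing-page/' \
  | non_ok_lines
```

Result lines that start with an error message instead of a status code would also be printed, since their first field is not 200.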

Another example is checking a web page stored locally on your disk. In this case, I'm using the web page created in the ./tests/ directory from this Git repository:

export INPUT_URL="https://my-testing-domain.com"
export INPUT_PAGES_PATH="${PWD}/tests/"
export INPUT_CMD_PARAMS="--skip-tls-verification --verbose --color=always"
./entrypoint.sh

Output:

*** INFO: Using path "/home/pruzicka/git/action-my-broken-link-checker/tests/" as domain "my-testing-domain.com" with URI "https://my-testing-domain.com"
*** INFO: [2019-12-30 14:54:22] Start checking: "https://my-testing-domain.com"
https://my-testing-domain.com/
        200     https://my-testing-domain.com
        200     https://my-testing-domain.com/run_tests.sh
        200     https://my-testing-domain.com:443
        200     https://my-testing-domain.com:443/run_tests.sh
https://my-testing-domain.com:443/
        200     https://my-testing-domain.com
        200     https://my-testing-domain.com/run_tests.sh
        200     https://my-testing-domain.com:443
        200     https://my-testing-domain.com:443/run_tests.sh
*** INFO: [2019-12-30 14:54:22] Checks completed...

Examples

Some other examples of building and checking web pages using Static Site Generators and GitHub Actions can be found here: https://github.com/peaceiris/actions-gh-pages/.

The following links contain real examples of My Broken Link Checker: