Track Broken Links and 404 Events in Jekyll Sites

The Importance of Monitoring 404 Errors

Broken links and 404 errors silently hurt your user experience and SEO. On dynamic platforms, server logs show which requests failed, but on static Jekyll sites hosted on GitHub Pages, you have no access to server logs. You need a different approach to capture when users hit a 404 page or follow a broken link.

Why Track 404 Events?

  • Identify missing pages after restructuring or deleting content
  • Detect bad backlinks from other sites
  • Fix internal link issues before they affect SEO
  • Understand user behavior through failure points

Client-Side Tracking with Google Analytics

The most accessible way to track 404 errors on a Jekyll site is through JavaScript-based analytics like Google Analytics. You can fire a custom event when the 404 page is loaded.

Example Using Google Analytics 4

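<script>
  // Requires the Google Tag Manager container snippet on this page.
  window.dataLayer = window.dataLayer || [];
  window.dataLayer.push({
    // Starts with a letter so it can double as the GA4 event name,
    // which may not begin with a digit.
    event: 'error_404',
    error_url: window.location.href,
    error_referrer: document.referrer
  });
</script>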

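Place this code in your 404.html file. In Google Tag Manager, create a Custom Event trigger that matches the event name and a GA4 Event tag that fires on it, and map error_url and error_referrer to event parameters through Data Layer variables.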
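If the site loads gtag.js directly rather than Google Tag Manager, the same event can be sent without a tag and trigger. The following is a minimal sketch under that assumption: build404Params and report404 are hypothetical helper names, and page_not_found is an arbitrary event name chosen for illustration.

```javascript
// Build the event parameters from the current page state.
// The helper names are illustrative, not part of any Google API.
function build404Params(loc, referrer) {
  return {
    error_url: loc.href,
    error_referrer: referrer || "",
  };
}

// Send the event through gtag() if it is available; otherwise do nothing.
function report404(gtagFn, loc, referrer) {
  if (typeof gtagFn !== "function") return false;
  gtagFn("event", "page_not_found", build404Params(loc, referrer));
  return true;
}

// In 404.html, after the gtag.js snippet has defined window.gtag:
if (typeof window !== "undefined") {
  report404(window.gtag, window.location, document.referrer);
}
```

Custom parameters such as error_url generally need to be registered as custom dimensions in GA4 before they show up in standard reports.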

Using Plausible Analytics for Lightweight Tracking

If you prefer privacy-first analytics, Plausible offers a clean and simple way to track 404 visits without loading large scripts.

Setup Example

In your 404.html, add this snippet:

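<script defer data-domain="yourdomain.com" src="https://plausible.io/js/script.js"></script>
<script>
  // The Plausible script is deferred, so queue any call made before it loads.
  window.plausible = window.plausible || function () {
    (window.plausible.q = window.plausible.q || []).push(arguments);
  };
  plausible("404", {
    props: {
      path: window.location.pathname,
      referrer: document.referrer
    }
  });
</script>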

Once you add a custom event goal named 404 in your Plausible site settings, each 404 hit appears in your dashboard, including which URL failed and what page the visitor came from.

Logging 404s with Netlify Functions

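GitHub Pages doesn’t support server-side functions, but you can deploy a mirror of your site to Netlify and have your 404 page call a function hosted there. A Netlify serverless function can then log each 404 event to a database or forward it to a webhook. If the 404 page is served from a different origin than the function, the function must respond with CORS headers, or the browser will block the request.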

Basic Logging Workflow

  • User hits 404 page
  • JavaScript sends fetch request to a Netlify function endpoint
  • Function writes data to Google Sheets or Airtable

Example Client-Side Call

<script>
fetch("https://yoursite.netlify.app/.netlify/functions/log404", {
  method: "POST",
  headers: {
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    url: window.location.href,
    referrer: document.referrer,
    time: new Date().toISOString()
  })
}).catch(function () {
  // Logging is best-effort; a failed request should never disturb the 404 page.
});
</script>
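The receiving end of this call could look roughly like the sketch below. It assumes the function lives at netlify/functions/log404.js and uses the Lambda-style handler signature; parseReport is a hypothetical helper, the wildcard CORS origin is a placeholder you would narrow to your own domain, and console.log stands in for the Google Sheets or Airtable write, which is not shown.

```javascript
// netlify/functions/log404.js (assumed location; the file name sets the endpoint path)

// Assumption: "*" is a placeholder. Restrict it to your site's origin in practice.
const CORS_HEADERS = {
  "Access-Control-Allow-Origin": "*",
  "Access-Control-Allow-Methods": "POST, OPTIONS",
  "Access-Control-Allow-Headers": "Content-Type",
};

// Hypothetical helper: validate and normalize the JSON body sent by the 404 page.
// Returns null when the body is not JSON or has no usable url field.
function parseReport(rawBody) {
  let data;
  try {
    data = JSON.parse(rawBody || "");
  } catch {
    return null;
  }
  if (!data || typeof data.url !== "string" || data.url === "") return null;
  return {
    url: data.url.slice(0, 2048),
    referrer: typeof data.referrer === "string" ? data.referrer.slice(0, 2048) : "",
    time: typeof data.time === "string" ? data.time : new Date().toISOString(),
  };
}

async function handler(event) {
  // A JSON POST from another origin triggers a CORS preflight request first.
  if (event.httpMethod === "OPTIONS") {
    return { statusCode: 204, headers: CORS_HEADERS, body: "" };
  }
  if (event.httpMethod !== "POST") {
    return { statusCode: 405, headers: CORS_HEADERS, body: "Method Not Allowed" };
  }
  const report = parseReport(event.body);
  if (!report) {
    return { statusCode: 400, headers: CORS_HEADERS, body: "Invalid payload" };
  }
  // Placeholder sink: write to the function log. Swap in your storage call here.
  console.log(JSON.stringify(report));
  return { statusCode: 204, headers: CORS_HEADERS, body: "" };
}

exports.handler = handler;
```

Validating and truncating the payload matters because the endpoint is public: anyone can post to it, so treat the logged values as untrusted input.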

Detecting Broken Links During Development

Prevention is the best fix. Before pushing your Jekyll site live, check for broken links using automated tools.

Recommended Tools

  • HTMLProofer: A Ruby gem that scans your generated site
  • Broken Link Checker browser extension
  • Screaming Frog SEO Spider for deep site crawling
  • Ahrefs or Semrush for external link audit

Running HTMLProofer

bundle exec jekyll build
bundle exec htmlproofer ./_site

The first command builds the site into _site; the second scans the generated HTML for broken links and reports each one with its file path and line number.

Visualizing 404 Data for Trends

Once you start tracking 404s, you can aggregate and visualize them to uncover patterns. Use tools like:

  • Google Looker Studio (formerly Data Studio) for GA4 events
  • Plausible dashboards for top broken paths
  • Airtable or Google Sheets for manual review

Key Metrics to Watch

  • Most frequent 404 URLs
  • Top referrers to broken pages
  • Time spent on 404 pages (dwell time)
  • Navigation patterns after hitting 404
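The first two metrics can be computed from whatever rows your logging produces. As a rough sketch, assuming each row is an object with url and referrer fields (the shape sent by the client-side call earlier), a small counting helper is enough; topCounts and the sample rows are hypothetical.

```javascript
// Count how often each value of `key` appears and return the top `limit` entries,
// most frequent first (ties broken alphabetically).
function topCounts(records, key, limit = 10) {
  const counts = new Map();
  for (const record of records) {
    const value = record[key] || "(none)";
    counts.set(value, (counts.get(value) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit)
    .map(([value, count]) => ({ value, count }));
}

// Hypothetical sample rows, for illustration only.
const sample = [
  { url: "/old-post", referrer: "https://example.org/links" },
  { url: "/old-post", referrer: "" },
  { url: "/feed.xml", referrer: "https://example.org/links" },
];

console.log(topCounts(sample, "url"));      // most frequent 404 URLs
console.log(topCounts(sample, "referrer")); // top referrers to broken pages
```

Rows with an empty referrer are grouped under "(none)" so that direct visits and stripped referrers remain visible in the ranking.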

Case Study: Cleaning Up Legacy URLs

A blog migrated from WordPress to Jekyll experienced a spike in 404 errors from external backlinks. Using Plausible to monitor 404 traffic, the site owner identified over 50 legacy URLs and redirected them to their new equivalents with the jekyll-redirect-from plugin, which reads old paths from each page’s redirect_from front matter. This not only improved SEO metrics but also recovered lost link equity and boosted user retention.

Tips for Ongoing Maintenance

  • Set up regular crawls with HTMLProofer or Screaming Frog
  • Monitor your analytics platform’s 404 events monthly
  • Check Google Search Console’s page indexing report for URLs flagged as not found (404)
  • Keep a changelog of content removals or slug changes

Conclusion

Even without server-side access, Jekyll site owners can effectively monitor and track 404 errors using modern tools and clever client-side scripts. Doing so protects your SEO, improves user trust, and keeps your content ecosystem clean and navigable. Whether you use Google Analytics, Plausible, or serverless logs, the key is consistency and quick action on what the data reveals.