Visual regression tests for personal blogs

Original link: https://marending.dev/notes/visual-testing/

## Visual regression testing for a static site

The author builds a static website with Astro and MDX and used to manually spot-check older notes after CSS changes to make sure there were no unintended side effects. To gain confidence when refactoring, they implemented visual regression tests with Playwright.

Playwright automates a browser, takes screenshots of specified pages, and compares them against "golden" snapshots. A deviation beyond a configured threshold fails the test, prompting either a fix or an update of the snapshot. Checking the snapshots into the repository also yields a historical record of the site's appearance alongside each commit.

The implementation consists of initializing a Playwright project and writing a test suite that iterates over a list of note pages, scrolls each page to the bottom to trigger lazy-loaded images, and then captures a full-page screenshot for comparison.

The author favors a simple workflow: the tests are run manually after potentially impactful changes rather than automated in CI/CD. The setup took only a few hours but adds significant value by preventing visual regressions and increasing confidence in site modifications.

A Hacker News user shared this visual regression testing setup for their personal blog (marending.dev), built on Playwright. The author argues it helps catch subtle, easily overlooked site breakage. Another user commented that they currently use a combination of Visualping and custom scripts, but would like screenshot support integrated directly. Their challenge is automatically identifying *relevant* changes in screenshot diffs, not merely detecting *any* change, so that the results can feed into scripts. The original author replied that they rely on Playwright's built-in diffing, which provides pixel diffs and a visual overview via traces, but that they do not know how to filter out irrelevant changes to avoid false positives. Playwright's trace viewer ([https://playwright.dev/docs/trace-viewer-intro](https://playwright.dev/docs/trace-viewer-intro)) was highlighted as a key feature.

Original article

06 Jan 2026

Gaining confidence in refactorings

This website is built using Astro to generate static pages. I author the notes themselves in MDX, a nice extension of Markdown that allows inline HTML and other components. The static HTML produced by building the site is then styled with some rather convoluted CSS. All of this is to say that if I change, for example, the margin of a list item only when it precedes an image element, that change may have unintended consequences on an older note I don't look at often.

Whenever I make such changes, I find myself sampling older notes to see if something is broken. Lately, I've had the idea to use Playwright for visual regression testing. For the uninitiated, this type of testing simply takes automated screenshots of pages (typically using a headless browser) and compares them against an earlier snapshot that is considered golden. Should the image deviate by more than some configurable threshold, the test is considered a failure. Then you either fix your application or, in the case of a legitimate change, simply update the golden snapshot for that particular test.
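Playwright exposes that threshold as project-wide configuration. As a sketch (the values are illustrative, assuming a standard `playwright.config.ts`):

```typescript
// playwright.config.ts (fragment) -- illustrative values, not the author's config
import { defineConfig } from "@playwright/test";

export default defineConfig({
  expect: {
    toHaveScreenshot: {
      // Fail a test if more than 2% of pixels differ from the golden snapshot.
      maxDiffPixelRatio: 0.02,
    },
  },
});
```

The same options can also be passed per call as the second argument to `toHaveScreenshot`.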

These tests come with the obvious upside of increasing confidence that changes don't have unintended side effects, especially for a static website, where the visual appearance really is all there is to it. And because I check the screenshots into the git repo, I also get an automatic history of what the site looked like at the time of each commit.

Technical implementation

Playwright makes this quite easy out of the box. After initializing a new test project using npm init playwright@latest in the same repo as the website itself, I add this single test file:

import { test, expect } from "@playwright/test";

const notes = [
  "/",
  "/projects/",
  "/about/",
  "/notes/launchd/",
  "/notes/jour/",
  "/notes/reflective/",
  "/notes/monitoring/",
  "/notes/otel/",
  "/notes/llm/",
  "/notes/clickhouse/",
  "/notes/server-setup/",
  "/notes/jpeg-raw/",
  "/notes/go-rest-quest/",
  "/notes/responsive-plots/",
  "/notes/co2-loft/",
  "/notes/sqlite-vs-duckdb/",
  "/notes/unstructured-data/",
  "/notes/rest-quest/",
  "/notes/fieldnotes/",
  "/notes/rust-spa/",
  "/notes/16-hour-projects/",
  "/notes/wasm-benchmark/",
  "/notes/vps-benchmarks/",
  "/notes/sqlite-benchmarks/",
  "/notes/league-rating/",
  "/notes/league-data/",
  "/notes/co2-bedroom/",
  "/notes/esp-protocol/",
  "/notes/esp-power/",
  "/notes/performance/",
  "/notes/website/",
  "/feedback/",
];

test.describe("Visual regression", () => {
  const baseUrl = "https://marending.dev";

  for (const note of notes) {
    test(`capture page: ${note}`, async ({ page }) => {
      const url = `${baseUrl}${note}`;

      await page.goto(url);
      await page.waitForTimeout(200);

      // Lazy-loaded images only load once they scroll into view, so scroll
      // through the whole page in 200px steps before taking the screenshot.
      const pageHeight = await page.evaluate(() => document.body.scrollHeight);

      for (let scrolled = 0; scrolled < pageHeight; scrolled += 200) {
        await page.mouse.wheel(0, 200);
        await page.waitForTimeout(200);
      }

      // Wait until the image requests triggered by scrolling have finished.
      await page.waitForLoadState("networkidle");

      // Derive the snapshot file name from the path,
      // e.g. "/notes/llm/" -> "notes-llm", "/" -> "index".
      const screenshotName =
        note
          .replace(/^\//, "")
          .replace(/\/$/, "")
          .replace(/[^a-z0-9]/gi, "-")
          .toLowerCase() || "index";

      await expect(page).toHaveScreenshot(`${screenshotName}.png`, {
        fullPage: true,
      });
    });
  }
});

There are a couple of things to note here. The magic happens on the await expect(page).toHaveScreenshot line. This makes Playwright take a screenshot and compare it against a stored one. If no screenshot with this name exists, the test fails, and you first have to generate the golden snapshot by running the suite with --update-snapshots.
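The snapshot name used in that comparison is a slug derived from the URL path. Pulled out as a standalone function (a hypothetical name; the body mirrors the expression in the test above):

```typescript
// Standalone version of the snapshot-naming expression from the test above.
// It slugifies a URL path: "/notes/llm/" -> "notes-llm", "/" -> "index".
function screenshotName(note: string): string {
  return (
    note
      .replace(/^\//, "")          // drop the leading slash
      .replace(/\/$/, "")          // drop the trailing slash
      .replace(/[^a-z0-9]/gi, "-") // non-alphanumerics become dashes
      .toLowerCase() || "index"    // the root path "/" slugifies to ""
  );
}
```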

Second, there is some complication involved in taking full-page screenshots. My website lazy-loads images, which means some images aren't loaded when the page is sufficiently long. I wouldn't mind so much if it were at least deterministic which images load; instead, I noticed that sometimes particular images were loaded and sometimes not, which defeats the purpose of looking for pixel differences between snapshots. That is what the scrolling logic in the test is for: I scroll down the whole page, 200 pixels at a time, to ensure all images are loaded in.

Lastly, the list of pages I want to test is statically listed in the notes variable. At first, I actually generated it dynamically by programmatically visiting the index page and extracting all linked targets. In the current design of the site, this yields exactly all subpages. Another way would be to expose an "endpoint" on the site that lists all the pages in the notes collection. Both approaches have the benefit of not requiring me to update the list manually when I publish a new note, but come with the downside that everything has to run inside a single Playwright test.

You see, the test-inside-a-for-loop pattern above only works as expected when the array to iterate over is statically known, which it isn't in the dynamic approaches. Then you have to deal with the whole test failing as soon as the first screenshot doesn't match, instead of getting a nice per-page summary when each page is its own clean test.
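The dynamic variant would scrape hrefs from the index page (e.g. with `page.$$eval("a", ...)`) and then reduce them to a unique list of internal paths. That second, filtering step is a plain function; a sketch with a hypothetical helper name:

```typescript
// Hypothetical helper: reduce a list of hrefs (as scraped from the index
// page) to unique, site-internal paths suitable for the notes array.
function internalPaths(hrefs: string[]): string[] {
  const unique = new Set<string>();
  for (const href of hrefs) {
    // Keep only root-relative links; skip external URLs, fragments,
    // and protocol-relative ("//host/...") links.
    if (href.startsWith("/") && !href.startsWith("//")) {
      unique.add(href);
    }
  }
  return [...unique];
}
```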

Workflow

So how do I use this? It would be easy to over-engineer it and run this in CI periodically or build it into my deployment script. Instead, I decided to keep it simple and stupid. I have this setup with the images checked into the same repo as the website itself and I run the tests whenever I feel like I’ve made changes that could affect some other part of the site. There is no point in burning energy by running them on every commit or constantly failing my deployment just to confirm that changing a typo on a page does in fact cause visual changes.

With such simplicity in mind, it’s easy to add real value to my workflow with maybe 2 hours of effort. I need to do more things like it.
