You Don't Need Anubis

Original link: https://fxgn.dev/blog/anubis/

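Companies that train large language models (LLMs), particularly the developers of ClaudeBot, are scraping websites aggressively, ignoring standard robots.txt rules and resorting to tactics such as spoofing and request floods. Many sites have turned to Anubis, a proof-of-work anti-bot protection, to fight back. The author argues, however, that Anubis does little against a determined LLM scraper, because scaling up compute to get past it is cheap. The main reason Anubis *appears* to work is that most LLM bots do not execute JavaScript. A simple Caddyfile configuration that sets a cookie via JavaScript offers comparable protection *without* the significant performance cost that Anubis imposes on legitimate users. Anubis can be used for DDoS protection, but it is often misused by sites whose only target is ClaudeBot. The author ultimately recommends considering Cloudflare (or a similar service) for reliable anti-bot protection, acknowledging concerns about its market dominance while stressing its effectiveness. If Cloudflare is not an option, a JavaScript-based solution is usually a better fit than Anubis for most scraping problems.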

Original text

In recent years, scrapers operated by LLM training companies have become more relentless. They no longer respect robots.txt, they spoof their User Agents and IP addresses, and they even DDoS small sites with aggressive requests.

This has led to more and more websites using Anubis, a proof-of-work based bot protection solution that requires all visitors to solve a small cryptographic problem on their device before proceeding.
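To make "a small cryptographic problem" concrete, here is a minimal hashcash-style sketch in Python. It is not Anubis's actual challenge format, which this post does not describe. The function names, the challenge string and the "leading zero hex digits" difficulty rule are all assumptions made for illustration.

```python
import hashlib
from itertools import count


def solve(challenge: str, difficulty: int) -> int:
    """Find a nonce whose SHA-256 digest starts with `difficulty` zero hex digits."""
    target = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Checking a solution costs one hash, however long the search took."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


if __name__ == "__main__":
    # "example-challenge" is a hypothetical value; a real server would issue its own.
    nonce = solve("example-challenge", 4)
    print(nonce, verify("example-challenge", nonce, 4))
```

The asymmetry is the point of any such scheme: the visitor pays for the search, while the server pays for a single hash to check the answer.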

But here’s the thing: Anubis doesn’t work. Well, alright, it does - it can be a good DDoS protection solution, especially for people who don’t want to use Cloudflare. But it seems like most users of Anubis don’t need DDoS protection - only protection against aggressive LLM scrapers. And if that’s your only use case, you probably don’t need Anubis.

People often claim that Anubis stops bots by making it too computationally expensive to access your website. Unfortunately, the price LLM companies would have to pay to scrape every single Anubis deployment out there is approximately $0.00.
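A rough back-of-the-envelope version of that claim can be sketched as follows. Every input here is a placeholder assumption, not a measurement: the difficulty, the hash rate and the price of a CPU-hour are made-up illustrative values, and the "leading zero hex digits" rule is a generic proof-of-work model rather than Anubis's documented scheme.

```python
def solve_cost_usd(difficulty: int, hashes_per_second: float, usd_per_cpu_hour: float) -> float:
    """Expected cost of solving one challenge needing `difficulty` leading zero hex digits."""
    # Each hex digit is zero with probability 1/16, so the expected
    # number of attempts before a hit is 16 ** difficulty.
    expected_hashes = 16 ** difficulty
    seconds = expected_hashes / hashes_per_second
    return seconds / 3600 * usd_per_cpu_hour


# Placeholder inputs, not measurements: swap in your own numbers.
cost = solve_cost_usd(difficulty=4, hashes_per_second=1_000_000, usd_per_cpu_hour=0.05)
print(f"${cost:.8f} per solved challenge")
```

With these placeholder inputs the result is a tiny fraction of a cent per challenge. Each extra hex digit of difficulty makes the cost 16 times higher, and it slows down every legitimate visitor by the same factor.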

But it still works, right? People use Anubis because it actually stops LLM bots from scraping their site, so it must work, right?

Yeah, but only because the LLM bots simply don’t run JavaScript.

I recently self-hosted Redlib, and despite not sharing my instance with anyone, it got rate-limited by Reddit due to all the scraper bots trying to get that sweet Reddit content. Here is my solution to the issue, a 12-line Caddyfile:

domain.com {
    # Match all requests without a "verified" cookie
    @unverified not header Cookie *verified*

    # Serve them a JS page that sets the cookie
    handle @unverified {
        header Content-Type text/html
        respond <<EOF
            <script>
            document.cookie = 'verified=1; Path=/;';
            window.location.reload();
            </script>
        EOF 418
    }

    # Proxy all other requests normally
    reverse_proxy localhost:3001
}

Yes, it works, and does so as effectively as Anubis, while not bothering your visitors with a 10-second page load time.
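The logic of that Caddyfile can be modelled in a few lines of Python to show why a client that ignores scripts never gets past it. This is only a simulation of the decision the config makes, not a replacement for it: `gate`, `fetch` and the "proxied content" string are hypothetical names introduced for this sketch.

```python
from typing import Tuple

# The page the Caddyfile serves to unverified clients.
CHALLENGE_PAGE = (
    "<script>"
    "document.cookie = 'verified=1; Path=/;';"
    "window.location.reload();"
    "</script>"
)


def gate(cookie_header: str) -> Tuple[int, str]:
    """Mirror the Caddyfile: no 'verified' in the Cookie header means the JS page."""
    if "verified" not in cookie_header:
        return 418, CHALLENGE_PAGE
    return 200, "proxied content"  # stands in for reverse_proxy


def fetch(runs_javascript: bool) -> Tuple[int, str]:
    """Simulate one client requesting a page, following the reload if it can."""
    cookies = ""
    status, body = gate(cookies)
    if status == 418 and runs_javascript:
        cookies = "verified=1"        # what the inline script would set
        status, body = gate(cookies)  # the reload sends the cookie back
    return status, body


print(fetch(runs_javascript=False)[0])  # scraper that ignores scripts: 418
print(fetch(runs_javascript=True)[0])   # ordinary browser: 200
```

The browser pays for one extra round trip and then keeps the cookie, while a scraper that never executes the script keeps receiving the 418 page.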

Sure, the bots may start running JS code one day, and then this will no longer work. But this also applies to Anubis - even people who use it often say that it’s just “a temporary stopgap until the bots learn how to bypass it”. So if you use a temporary solution anyway, why not use one that is practically invisible for your users?

Unfortunately, Cloudflare is pretty much the only reliable way to protect against bots. While the higher protection modes are still very annoying, especially to users on a VPN, there are some situations like actual DDoS attacks where you don’t have any other options. Even Anubis’ own README says:

In most cases, you should not need this and can probably get by using Cloudflare to protect a given origin. However, for circumstances where you can’t or won’t use Cloudflare, Anubis is there for you.

I get that many people are strongly against Cloudflare’s internet monopoly, and I don’t blame them for using solutions like Anubis to protect their sites against real attacks. I’m also obviously not trying to throw shade on the Anubis project or its developers. I think it has a use as a DDoS protection solution - it’s just extremely overused by people who don’t need it.

So if your only concern is ClaudeBot, which seems to be the case for most of the websites that use Anubis, please, go and replace your annoying stopgap solution with a non-annoying one.


you don't need anubis by flexagoon is marked with CC0 1.0
