(comments)

Original link: https://news.ycombinator.com/item?id=44072137

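A Hacker News thread discusses a new Internet Archive feature that lets viewers watch documents being preserved in real time. Most of the discussion is about archiving sites protected by Cloudflare CAPTCHAs, which have proven hard to archive. One user who helps archive websites said they had assumed Cloudflare had a deal with archive.org to solve this. Another user pointed out that Cloudflare's Always Online feature still exists, but the site owner has to enable it manually. The conversation then turned to Cloudflare's use of Private Access Tokens on Apple devices as a possible way to scrape sites and get past CAPTCHAs. Users weighed how useful this approach could be for archiving and where it falls short. Some speculated about rate limits and noted that archivers probably do not solve challenges often enough to trigger them. Overall, the thread highlights the ongoing difficulty of archiving modern websites that deploy anti-bot measures, along with some possible workarounds.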

Related articles
  • (comments) 2025-03-24
  • 2025-05-15
  • (comments) 2025-05-15
  • (comments) 2025-05-25
  • 2025-05-20

  • Original
    Now you can watch the Internet Archive preserve documents in real time (theverge.com)
    162 points by LorenDB 2 days ago | 10 comments

    I am part of an informal group involved in actively archiving websites, and the ones behind Cloudflare CAPTCHAs are barely archivable. I presumed Cloudflare had a deal with Archive.org, but I guess it went nowhere? https://blog.cloudflare.com/cloudflares-always-online-and-th...


    It's still a setting in their dashboard, but the site owner has to manually enable Always Online.


    Plenty of other archives around the world; one would hope any impediments to them doing their job due to Cloudflare would have a more general solution than a single partner.


    Are you using iOS or macOS, so that you have access to Private Access Tokens?

    https://blog.cloudflare.com/eliminating-captchas-on-iphones-...


    This looks like a useful solution for scraping. It doesn't prove you're a human, simply that you can afford to buy an iPhone. So buy the cheapest iPhone that supports this on eBay and then use that for scraping and archiving from now on.


    Given that these tokens are intentionally designed to distinguish human from bot traffic, I'd be surprised if they were (easily) available to archival tooling.


    The URLSession API supports private access tokens (it's handled for you automatically) while your app is foregrounded.

    https://developer.apple.com/documentation/foundation/urlsess...


    Oh, interesting! But I'd still expect these to be heavily rate limited etc. – otherwise, the people captcha-protected sites are hoping to keep out could just use these, right?


    At what rate are archivers solving Cloudflare challenges though? Probably not enough to hit any kind of rate limit. This is only used for the initial challenge and not for every request.