Wayback Machine and Cloudflare team up to archive more of the Web
The Internet Archive and Cloudflare have teamed up to archive the content of websites that use Cloudflare's Always Online service, increasing the odds that users will be able to view a recent version of a website during outages. The partnership will increase the number of webpages scanned by the Internet Archive, making the organization's Wayback Machine more useful to Internet users in general.
"Websites that enable Cloudflare's Always Online service will now have their content automatically archived, and if by chance the original host is not available to Cloudflare, then the Internet Archive will step in to make sure the pages get through to users," said an announcement by Mark Graham, director of the Internet Archive's Wayback Machine.
Cloudflare says its Always Online feature saves "a limited copy of your cached website to keep it online for your visitors" when the origin server is unavailable, ensuring that a website's "most popular pages are represented." Using the Wayback Machine will improve the Always Online service, Cloudflare CEO Matthew Prince said.
"The Internet Archive's Wayback Machine has an impressive infrastructure that can archive the Web at scale," Prince said.
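The fallback behavior described above (serve the origin when it responds, otherwise serve an archived copy) can be sketched in a few lines. This is only an illustration of the general pattern, not Cloudflare's implementation: the function name `serve_with_fallback` and the two fetcher callables are hypothetical, and a real edge service would also handle timeouts, cache freshness, and HTTP status codes.

```python
from typing import Callable, Optional, Tuple


def serve_with_fallback(
    url: str,
    fetch_origin: Callable[[str], str],
    fetch_archive: Callable[[str], Optional[str]],
) -> Tuple[str, str]:
    """Return (source, body), preferring the live origin over the archive.

    Both fetchers are hypothetical stand-ins: fetch_origin raises
    ConnectionError when the origin is unreachable, and fetch_archive
    returns None when no archived copy exists.
    """
    try:
        return ("origin", fetch_origin(url))
    except ConnectionError:
        archived = fetch_archive(url)
        if archived is None:
            # No archived copy to fall back on, so surface the outage.
            raise
        return ("archive", archived)
```

Passing the fetchers in as arguments keeps the sketch independent of any particular HTTP client or archive backend.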
The partnership will in turn improve the Wayback Machine's ability to archive the Web. The nonprofit Internet Archive's system doesn't crawl the entire Web but has made more than 468 billion archived webpages available and is adding over 1 billion new archived URLs a day, Graham wrote. It does this "via a variety of different methods, such as 'crawling' from lists of millions of sites, as submitted by users via the Wayback Machine's 'Save Page Now' feature, [websites] added to Wikipedia articles, referenced in Tweets, and based on a number of other 'signals' and sources, such [as] multiple feeds of 'news' stories," Graham explained.
Cloudflare's Always Online service is now one additional avenue for the Wayback Machine to find and archive websites. "As new URLs are added to sites that use that service they are submitted for archiving to the Wayback Machine," Graham wrote. "In some cases this will be the first time a URL will be seen by our system and result in a 'First Archive' event." In all cases, these newly archived URLs "will be available to anyone who uses the Wayback Machine."
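For readers curious whether a given URL already has a capture, the Internet Archive exposes a public availability endpoint at `https://archive.org/wayback/available`. The sketch below builds a query for that endpoint and reads the closest snapshot out of its JSON response. It is an assumption that this check approximates the "First Archive" notion in Graham's description; the article does not say how Cloudflare or the Wayback Machine detect first-time URLs, and the helper names here are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url: str) -> str:
    """Build the availability lookup URL for a page."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": url})


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the closest snapshot URL from a decoded JSON response,
    or None if the response reports no available capture."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def would_be_first_archive(payload: dict) -> bool:
    """Hypothetical helper: True when no capture is reported yet."""
    return closest_snapshot(payload) is None
```

Fetching the query URL with any HTTP client and decoding the JSON body yields the `payload` dictionary these helpers expect.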
Graham predicts that the partnership will let the Internet Archive do a "better job of backing up more of the public Web, and in so doing help make the Web more useful and reliable."
Users will get static webpages
Users who reach an archived version of a website when a server is offline will see only static pages. "Visitors who interact with dynamic parts of a website, such as a shopping cart or comment box, will s…