1

How does the internet archive work? How can it emulate websites like that?

Comments
  • 4
    It's not emulating anything, it just shows you a snapshot of the page.
  • 4
    Wget <website.com>
    Save for later in a db with a timestamp.

    Render to user on request.
  • 0
    There's a library that archives everything based on references in the DOM.

    It should be OSS, last time I checked anyway.
  • 0
    I found a couple of crawler projects under the Internet Archive's GitHub account.

    https://github.com/internetarchive/...
    https://github.com/internetarchive/...

    Brozzler renders the page using a Chromium browser. From a first look, I think it saves all the requests and responses to a database for replaying. There might be some processing before the web page ends up in the Wayback Machine, like changing hostnames.
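
The fetch / save-with-timestamp / render-on-request idea from the comments can be sketched roughly as below. This is only a toy illustration, not how the Internet Archive's own pipeline works: `SnapshotStore`, `fetch_over_http`, the SQLite table layout, and the 14-digit timestamp string are all assumptions made up for the example. The fetch function is injectable so the sketch can be tried without touching the network.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable, Optional
from urllib.request import urlopen


def fetch_over_http(url: str) -> bytes:
    """Download the raw response body (the "wget" step)."""
    with urlopen(url, timeout=30) as response:
        return response.read()


class SnapshotStore:
    """Hypothetical toy archive: one row per (url, capture time)."""

    def __init__(
        self,
        db_path: str = ":memory:",
        fetch: Callable[[str], bytes] = fetch_over_http,
    ) -> None:
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS snapshots "
            "(url TEXT, captured_at TEXT, body BLOB)"
        )
        self.fetch = fetch

    def capture(self, url: str, when: Optional[datetime] = None) -> str:
        """Fetch the page and store it under a sortable timestamp."""
        when = when or datetime.now(timezone.utc)
        # Fixed-width YYYYMMDDhhmmss strings sort chronologically.
        stamp = when.strftime("%Y%m%d%H%M%S")
        self.db.execute(
            "INSERT INTO snapshots VALUES (?, ?, ?)",
            (url, stamp, self.fetch(url)),
        )
        self.db.commit()
        return stamp

    def replay(self, url: str, at: str) -> Optional[bytes]:
        """Return the newest snapshot captured at or before `at`."""
        row = self.db.execute(
            "SELECT body FROM snapshots "
            "WHERE url = ? AND captured_at <= ? "
            "ORDER BY captured_at DESC LIMIT 1",
            (url, at),
        ).fetchone()
        return row[0] if row else None
```

Asking `replay` for a given date returns the stored bytes as they were at that time, which is why the result looks like a frozen copy of the site rather than an emulation. A real archive would also need to capture the images, scripts, and stylesheets the page refers to, and rewrite links so they point back into the archive.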