Website Archive Software

  

Compare the best free open source Windows Archiving Software at SourceForge. Free, secure and fast Windows Archiving Software downloads from the largest.

The Web Archiving Lifecycle Model is an attempt to incorporate the technological and programmatic arms of web archiving into a framework that will be relevant to any organization seeking to archive content from the web. Archive-It, the web archiving service from the Internet Archive, developed the model.

HTTrack is a free (GPL, libre/free software) and easy-to-use offline browser utility. It allows you to download a World Wide Web site from the Internet to a local directory, recursively building all directories and getting HTML, images, and other files from the server to your computer.

The LDS Web Archive captures, preserves, and makes accessible LDS Church-produced information published on the web. The web archive includes videos, tweets, and websites dating from 1996 to the present. The Internet Archive's Archive-It software is used to capture selected content. Some types of web content are difficult to capture and archive.


We have burned static, archived copies of our ASP.NET websites for customers many times. We have used WebZip until now, but we have had endless problems with crashes, downloaded pages not being re-linked correctly, etc.

We basically need an application that crawls and downloads static copies of everything on our ASP.NET website (pages, images, documents, CSS, etc.) and then processes the downloaded pages so that they can be browsed locally without an internet connection (getting rid of absolute URLs in links, etc.). The more idiot-proof the better. This seems like a pretty common and (relatively) simple process, but I have tried a few other applications and have been really unimpressed.
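To make the re-linking step concrete, here is a minimal sketch in Python of turning absolute links into relative ones. It is only an illustration, not how WebZip or any other tool does it. The names `relativize_links`, `site_host` and `page_path` are made up for this example, and it assumes the saved files mirror the URL paths, with directory URLs stored as `index.html`. Links with query strings are left untouched because how those pages get named on disk depends on the tool.

```python
import posixpath
import re
from urllib.parse import urlsplit

# Matches href="..." or src="..." with single or double quotes.
ATTR_RE = re.compile(
    r'''(?P<attr>\b(?:href|src))=(?P<q>["'])(?P<url>.*?)(?P=q)''',
    re.IGNORECASE,
)


def relativize_links(html, site_host, page_path):
    """Rewrite links into site_host so they are relative to page_path.

    page_path is the URL path of the page being processed,
    e.g. "/products/list.html" (hypothetical example value).
    """
    page_dir = posixpath.dirname(page_path) or "/"

    def fix(match):
        url = match.group("url")
        parts = urlsplit(url)
        if parts.query:
            # Assumption: query-string pages are handled elsewhere.
            return match.group(0)
        if parts.netloc.lower() == site_host.lower():
            target = parts.path or "/"      # absolute URL on our own site
        elif not parts.scheme and not parts.netloc and url.startswith("/"):
            target = parts.path             # root-relative URL
        else:
            return match.group(0)           # external or already relative
        if target.endswith("/"):
            target += "index.html"          # assumed name for directory pages
        rel = posixpath.relpath(target, start=page_dir)
        if parts.fragment:
            rel += "#" + parts.fragment
        quote = match.group("q")
        return f'{match.group("attr")}={quote}{rel}{quote}'

    return ATTR_RE.sub(fix, html)
```

A regular expression is enough for a sketch like this, but a real HTML parser would be more robust against unusual markup.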

Does anyone have archive software they would recommend? Does anyone have a really simple process they would share?

Vadim Kotov
jskunkle

9 Answers

On Windows, take a look at HTTrack. It's very configurable and lets you set the speed of the downloads, but you can also just point it at a website and run it with no configuration at all.

In my experience it's been a really good tool and works well. Some of the things I like about HTTrack are:

  • Open Source license
  • Resumes stopped downloads
  • Can update an existing archive
  • You can configure it to be non-aggressive when it downloads so it doesn't waste your bandwidth and the bandwidth of the site.
Jesse Dearing
FelixSFD
chuckg

The Wayback Machine Downloader by hartator is simple and fast.

Install via Ruby, then run with the desired domain and optional timestamp from the Internet Archive.

jtheletter
Syntax

wget -r -k

... and investigate the rest of the options. I hope you've followed these guidelines: http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html so that all your resources are safe to fetch with GET requests.

Joel Hoffman
Aram Verstegen

If your customers are archiving for compliance reasons, you want to ensure that the content can be authenticated. The options listed are fine for simple viewing, but they aren't legally admissible. For that, you need timestamps and digital signatures, which is much more complicated if you're doing it yourself. I'd suggest a service such as PageFreezer.
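To give a rough idea of the "doing it yourself" part, here is a minimal Python sketch that records a SHA-256 digest for every file in an archived copy, together with a UTC timestamp. The function name `build_manifest` and the manifest layout are assumptions made for this example. A manifest like this only lets you detect later changes to the files. On its own it is not a digital signature or a trusted timestamp, so it does not make an archive legally admissible.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def build_manifest(archive_dir):
    """Return a dict with a UTC creation time and a SHA-256 digest per file."""
    root = Path(archive_dir)
    files = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Keys are paths relative to the archive root, with "/" separators.
            key = path.relative_to(root).as_posix()
            files[key] = hashlib.sha256(path.read_bytes()).hexdigest()
    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "files": files,
    }
```

The result can be serialized with `json.dumps(manifest, indent=2)` and stored next to the archive. Signing that file and getting it timestamped by a trusted third party would be separate steps.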

jtheletter
Dieghito

For OS X users, I've found that the SiteSucker application works well without configuring anything except how deep it follows links.

user1011743


I've been using HTTrack for several years now. It handles all of the inter-page linking, etc. just fine. My only complaint is that I haven't found a good way to keep it limited to a sub-site. For instance, if there is a site www.foo.com/steve that I want to archive, it will likely follow links to www.foo.com/rowe and archive that too. Otherwise it's great: highly configurable and reliable.
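The kind of sub-site restriction I mean can be sketched in a few lines of Python. This is not HTTrack's own configuration, just an illustration of the check a crawler would apply to each link before following it. The function name `in_scope` is made up for this example.

```python
from urllib.parse import urlsplit


def in_scope(url, root_url):
    """Return True if url is on root_url's host and at or below its path."""
    u, r = urlsplit(url), urlsplit(root_url)
    if u.netloc.lower() != r.netloc.lower():
        return False
    base = r.path.rstrip("/")
    path = u.path or "/"
    # Accept the root itself, or anything under "<base>/".
    # Comparing against "<base>/" avoids matching siblings like "/steveX".
    return path == base or path.startswith(base + "/")
```

With `http://www.foo.com/steve` as the root, a link to `http://www.foo.com/steve/blog/1.html` is in scope, while `http://www.foo.com/rowe/` is not.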

Steve Rowe
