Website Owner FAQ

Do website owners have to change or alter websites to be included in the crawls?

No, website owners do not have to change the content, structure, or appearance of their websites to be included in the crawls. However, website design can affect the completeness and accuracy of archival captures; following these Guidelines for Preservable Websites helps ensure successful web archiving.

Will your crawling interfere with access to our website?

We crawl websites at a polite rate so that crawling does not interfere with access to your website. For actively updated websites, crawls are generally run quarterly or semi-annually and last a few days. Once a crawl is complete, the crawler no longer interacts with your server.
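For site owners curious what a "polite rate" means in practice, the idea can be sketched as a pause between successive requests. This is an illustration only, not the crawler's actual code: the function name `polite_fetch_all`, the stand-in fetcher, and the two-second delay are all assumptions made for the example.

```python
import time


def polite_fetch_all(urls, fetch, delay_seconds=2.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing between requests.

    `fetch` is any callable that takes a URL. `delay_seconds` is an
    illustrative value, not the rate an actual archiving crawl uses.
    """
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # wait before every request after the first
        results.append(fetch(url))
    return results


# Demo with a stand-in fetcher; pauses are recorded instead of slept.
pauses = []
pages = polite_fetch_all(
    ["https://example.org/", "https://example.org/about"],
    fetch=lambda url: f"<html>{url}</html>",
    sleep=pauses.append,
)
```

Because requests arrive one at a time with a gap between them, the load on the server stays small compared with ordinary visitor traffic.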

If you encounter any issues or have any additional questions, please contact us at webarchiving@library.columbia.edu.

Are you able to capture media, audio, and video files?

Yes, downloadable media, audio, and video files can usually be captured, although videos hosted on third-party services such as YouTube or Vimeo can be challenging. Our crawler follows links to discover and capture content, so content must be linked from a page on the website in order to be included in the archive. We can’t capture files that are not linked and must instead be retrieved from a database via a user query (for example, a publications database that requires a search to be run before publications can be accessed). Streaming audio and video can’t be captured at all by the current generation of web crawlers.
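The following rough sketch shows why unlinked content is missed. It is not the crawler's real implementation: the `discover` function, the `LinkExtractor` class, and the `example.org` pages are hypothetical, and the "website" is a small in-memory dictionary so the example runs offline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags in one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def discover(pages, seed):
    """Follow links breadth-first over `pages`, a dict of URL -> HTML.

    Returns the set of same-host URLs reachable from `seed`.
    """
    host = urlparse(seed).netloc
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:
            continue  # not an HTML page we hold (e.g. a PDF), nothing to parse
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


SITE = {
    "https://example.org/": '<a href="/reports/">Reports</a>',
    "https://example.org/reports/": '<a href="2020.pdf">2020 report</a>',
    # Reachable only by submitting a search form; no page links to it.
    "https://example.org/search?q=2019": '<a href="/reports/2019.pdf">2019</a>',
}

found = discover(SITE, "https://example.org/")
```

In this sketch the 2020 report is discovered because a page links to it, while the 2019 report is never found: the only path to it runs through a search result that no link points to.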

How can I view the websites that have been archived? Will access always be free?

Archived websites will remain freely accessible to the public. Websites can be viewed by date of capture via our Internet Archive partner page. Program staff will explore additional means of viewing archived websites.

Why do the archived versions of some websites appear to be incomplete?

There are several reasons why an archived website may appear to be incomplete. Some types of content are challenging or impossible to capture or reproduce, including JavaScript-driven navigation menus, streaming audio and video, and dynamic form- and database-driven content. We can’t capture files that are not linked and must instead be retrieved from a database via a user query (for example, a publications database that requires a search to be run before publications can be accessed). Also, portions of a website may be restricted or password-protected. We collect only public content, so password-protected material will not be crawled. Site owners looking to optimize their site design to allow full archiving should review the Guidelines for Preservable Websites.

I would like my organization's website to be removed from the web resources collection. Who do I contact?

We will honor requests to remove archived content. Please contact webarchiving@library.columbia.edu.