Policies

Selection of Websites for Archiving

Subject Specialists at Columbia University Libraries work with the program's Head of Web Collecting to develop collection themes and identify specific websites for archiving. We also invite researchers and website owners to nominate websites for inclusion in thematic collections. Several criteria drive our selection process, including the relevance of subject matter to current research, teaching, and advocacy; the perceived risk to a website's longevity; and how well a website complements existing print collections held at Columbia University Libraries. Websites affiliated with Columbia University, and those of organizations whose print archives are held at Columbia, are also high priorities for archiving.

Website Owners

The Web Resources Collection Program follows principles and techniques of non-intrusive crawling. We will honor requests from website owners to remove access to content archived from their sites.

Website Owner FAQ

Website Harvesting

Websites selected for our collections are harvested using the Internet Archive's Archive-It service, which incorporates versions of the open-source crawlers Heritrix and Brozzler. Depending on collection guidelines and the nature of each website, sites may be recaptured at regularly scheduled intervals, such as semiannually or quarterly.

Description & Access

Archived websites will remain freely available to the public via the Columbia University Libraries Archive-It partner page, where website-level metadata is added to allow browsing and full-text search. Additionally, archived websites in the Human Rights and New York City Places and Spaces collections have individual catalog records in CLIO (the online library catalog for Columbia University Libraries) and in OCLC's WorldCat database, with links to both the live websites and the archived content.