Subject Specialists at Columbia University Libraries work with the program's Web Resources Collection Coordinator to develop collection themes and identify specific websites for archiving. For thematic collections, we also invite website nominations from researchers and website owners. A variety of criteria drives our selection process, including the relevance of subject matter to current research, teaching, and advocacy; the perceived risk to website longevity; and the complementarity of websites with existing print collections held at Columbia University Libraries. Websites affiliated with Columbia University, and those of organizations whose print archives are held at Columbia, are also high priorities for archiving.


The Web Resources Collection Program follows principles and techniques of non-intrusive harvesting. We attempt to notify all organizations and/or individuals whose websites are selected for archiving. We refrain from archiving websites whose owners do not wish to be included in this project, and we will remove harvested content from the archive upon request by the website owner(s). More information for website owners is available on our FAQ page.


Websites selected for our collections are harvested using the Archive-It service from the Internet Archive, which incorporates versions of the open-source crawling software programs Heritrix and Brozzler. Depending on collection guidelines and the nature of individual websites, websites may be recaptured at regularly scheduled intervals, such as semi-annually or quarterly.


Archived websites will remain freely available to the public via CUL's Archive-It partner page, where website-level metadata is added to allow browsing and full-text search. Additionally, for some collections, archived websites receive individual catalog records in CLIO (the online library catalog for Columbia University Libraries) and in OCLC's WorldCat database, with links to both the live websites and the archived content.