Web Resources Collection Program

Web Archiving at Columbia University

Columbia University Libraries and Information Services (CUL) has expanded the scope of its collection development activities to include curated archival collections of freely available Internet resources. The development of this program, self-funded by CUL as of 2013, was made possible by generous support from the Andrew W. Mellon Foundation.

Columbia University Libraries' commitment to integrating web archiving into ongoing collection development and preservation best practices is informed by collaboration with other research libraries and the broader web archiving community. In 2012, CUL became a member of the International Internet Preservation Consortium (IIPC) and hosted a summit meeting for practitioners on Web Archiving Policies and Practices in the US. Columbia has also received further support from the Andrew W. Mellon Foundation with the explicit goal of fostering web archiving collaboration.

Supporting Grant Projects

The Andrew W. Mellon Foundation has provided CUL funding for three grant projects in the area of web archiving:

Collection Building for Web Resources (2008-2009)

Objective: A joint project with the University of Maryland Libraries to develop and test coherent, holistic models for incorporating web content into research library collections.

Web Resources Collection Program Development (2009-2012)

Objective: To establish production procedures for selecting, acquiring, describing, preserving, and providing access to freely available web content, starting with the subject area of human rights and expanding into other thematic and Columbia-related content.

Web Resources Archiving Collaboration (2013-2015)

Objective: To extend the effectiveness of Columbia's web resource collecting program, and of collective web archiving work within the US, by developing and testing models of collaboration with other research libraries, with scholars, with web content producers, and with other web archiving programs.

Collections

The Web Resources Collection Program archives selected websites in thematic areas corresponding to existing CUL collection strengths, websites produced by affiliates of Columbia University, and websites from organizations or individuals whose papers or records are held in CUL's physical archives.

Specific collections of archived websites include:

Selection of Websites for Archiving

Subject Specialists at Columbia University Libraries work with the program's Web Resources Collection Coordinator to identify websites for archiving. For thematic collections, we also invite website nominations from researchers and website owners. A variety of criteria drive our selection process, including relevance of subject matter to current research, teaching, and advocacy; perceived risk to a website's longevity; and complementarity with existing print collections held at Columbia University Libraries. Websites affiliated with Columbia University, and those of organizations whose print archives are held at Columbia, are also high priorities for archiving.

Permissions

The Web Resources Collection Program follows principles and techniques of non-intrusive harvesting. We attempt to notify all organizations and/or individuals whose websites are selected for archiving. We refrain from archiving websites whose owners do not wish them to be included in this project, and we will remove harvested content from the archive upon request by the website owner(s). More information for website owners is available on our FAQ page.

Website Harvesting

Websites selected for our collections are harvested using the Archive-It service from the Internet Archive, which incorporates a version of the open-source crawling software Heritrix. Depending on collection guidelines and the nature of each site, websites may be recaptured at regularly scheduled intervals, such as semi-annually or quarterly.
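As a rough illustration of what recapture at "regularly scheduled intervals" means in practice, the sketch computes the planned capture dates for a single site. It is a hypothetical helper, not part of Archive-It or Heritrix: the frequency labels, the function names, and the choice to clamp the day of the month to 28 (so every month is valid) are all assumptions made for the example.

```python
from datetime import date

# Hypothetical mapping from a collection's crawl frequency to months between captures.
INTERVAL_MONTHS = {"quarterly": 3, "semi-annual": 6}


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole months, clamping the day to 28 so it always exists."""
    total = d.month - 1 + months
    return date(d.year + total // 12, total % 12 + 1, min(d.day, 28))


def recapture_dates(first_crawl: date, frequency: str, count: int) -> list[date]:
    """Return `count` planned capture dates, starting with the initial crawl."""
    step = INTERVAL_MONTHS[frequency]
    return [add_months(first_crawl, step * i) for i in range(count)]


# Example: four quarterly captures starting in mid-January.
print(recapture_dates(date(2013, 1, 15), "quarterly", 4))
```

In a real collection the schedule is set in the harvesting service itself; the sketch only makes the arithmetic of a quarterly versus semi-annual cycle concrete.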

Description and Access

Archived websites will remain freely available to the public via CUL's Archive-It partner page, where website-level metadata is added to allow browsing and full-text search. Additionally, for some collections, archived websites receive individual catalog records in CLIO (the online library catalog for Columbia University Libraries) and in OCLC's WorldCat database, with links to both the live websites and the archived content.

CUL has also developed an experimental local access portal for the Human Rights collection, the Human Rights Web Archive. This portal allows enhanced browsing and full-text search for archived human rights websites, and will be further developed to allow some searching of other human rights resources at Columbia and other archived human rights websites.

Contact Information

General program information and copyright inquiries: culhrweb@libraries.cul.columbia.edu

Technical information: culhrweb-dev@libraries.cul.columbia.edu


Program Team

Stephen P. Davis, Director, Libraries Digital Program Division

Pamela Graham, Director, Center for Human Rights Documentation & Research

Kate Harcourt, Director, Original and Special Materials Cataloging

Anna Perricci, Web Archiving Project Librarian

Alex Thurman, Web Resources Collection Coordinator

Robert Wolven, Associate University Librarian for Bibliographic Services and Collection Development




Internet connection graphic courtesy Chris Harrison, "Internet Map."