Web Archives

Columbia University website, 2009

Columbia domain websites, that is, sites with a columbia.edu address, have been crawled and preserved since 1996 and are available in our web archive collection via the Archive-It service. In addition, select sites without a columbia.edu address, for example those for publications and student groups, have been identified and are also captured in the web crawls. In 2015 we transitioned from a once-a-year crawl of the entire web domain in December to a semi-annual crawl in June and December. Certain pages, especially those with information about course offerings, are crawled quarterly to make sure all course offerings for the fall, spring and summer sessions are captured.

University Archives staff have also identified areas of the Columbia University website that would be of particular interest or use to researchers. Select webpages have been described and categorized within our Archive-It site in order to make it easier for a user to find sites related to the following groups: administration, alumni magazines, athletics, centers and institutes, course bulletins, honors and awards, libraries, schools and departments, student organizations, and student publications.

The most frequently requested information from the Columbia web archive is course descriptions. As schools moved course descriptions from annual printed catalogues to online-only distribution, our web archiving efforts have often become the only publicly available access point to older online course descriptions. Although not every course description for every school has been captured, one can usually find descriptions for courses offered by Columbia College, School of Engineering and Applied Science, School of General Studies, Graduate School of Arts and Sciences (GSAS), Graduate School of Architecture, Planning and Preservation (GSAPP), and the School of International and Public Affairs (SIPA). If a school posted online course descriptions only in the CourseWorks system, that content could not be collected because CourseWorks is a secure site.

To search for course descriptions, start with the course bulletins grouping. For some of the older online course information, we recommend navigating from the archived Columbia University website to the appropriate school or department pages to look for online course descriptions.

If you encounter any dead links on our archived pages or have any further suggestions as to what should be captured or highlighted, please email uarchives@columbia.edu.

RBML's Instagram account (@columbia_rbml), March 2021

Archiving social media has always been imperfect: how do you capture content that is constantly changing, and how do you navigate platforms that require users to log in to access any content? So far, the best approach is to manually archive the desired social media feeds directly as a logged-in user of the respective platforms. This way the look and feel of the platform is fully preserved. There are two good options for this approach.

Conifer is a free tool that used to be called Webrecorder and is still based at Rhizome (New Museum). With Conifer, you set up a free account, log in, and use the web interface to load the specific web content that you want to archive in your browser. As you click around, the tool archives each page or file that you open. You can stop recording at any time and immediately replay the content you have just archived. The data resides in your Conifer cloud account, but you can download it and save it locally as well. Importantly, archives created in Conifer can be downloaded in .warc format, which can be uploaded into our University Archives web collection.

The Chrome extension ArchiveWeb.page is an even lighter-weight version of this manual approach to web archiving, created by the same developer, Ilya Kreymer. This tool is very easy to use: the user just has to add the ArchiveWeb.page extension to their Chrome browser. Once logged into Twitter, Facebook, Instagram, etc., go to the feed you wish to archive, and use the extension to archive exactly what you want by clicking on each relevant page or link in your browser. These captures are stored as collections on the device running the browser and can also be downloaded as .wacz or, preferably, .warc files, which can be uploaded into our University Archives web collection.
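If you have already downloaded a capture as a .wacz file, the .warc data can usually be recovered from it, since a .wacz file is a ZIP package that bundles the WARC data with indexes and page lists. The following is a minimal Python sketch, not an official tool. It assumes the WARC files sit under an archive/ folder inside the package, and the function name extract_warcs and the example paths are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_warcs(wacz_path, out_dir):
    """Copy every WARC file found inside a WACZ package into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(wacz_path) as package:
        for name in package.namelist():
            # Assumption: WARC data is stored under "archive/" and ends
            # in .warc or .warc.gz; everything else is indexes or metadata.
            if name.startswith("archive/") and name.endswith((".warc", ".warc.gz")):
                target = out_dir / Path(name).name
                target.write_bytes(package.read(name))
                extracted.append(target)
    return extracted


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for warc in extract_warcs("my-collection.wacz", "warc-output"):
        print(warc)
```

The resulting files can then be transferred in the same way as any other .warc download. If the sketch finds nothing, the package may be laid out differently, and downloading the capture as .warc directly from the tool is the safer route.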

For more information, please see our guidelines on how to transfer digital records.
