In an effort to further support the wide variety of research and experimentation happening around open library data, the Columbia University Libraries has made our catalog data open and available for download. This data represents the bibliographic and holdings data stored in our Integrated Library System and accessible through CLIO, the Columbia University Libraries Online Catalog. The public discovery interface for these records can be found at http://clio.columbia.edu/catalog.
These catalog records were created, modified and managed over time, according to different cataloging standards and rules at a variety of developmental stages. Because of these changes in practice and in record sources, their quality and completeness vary. With this in mind, we offer this data in its current state, without remediation or review. Please read the information below for further details on the formats, currency and structure of this data. Note that this dataset does not include records from the Columbia University Law Library Catalog or ReCAP partner libraries (New York Public Library and Princeton University); in addition, some sets of vendor-supplied records have been removed from this open dataset at the request of those vendors.
This data includes bibliographic records for a wide range of materials (including but not limited to books, serials, music, videos, images, cartographic materials, manuscripts and archival collections) in print, microform, and electronic formats.
These datasets will be updated monthly. With each update, the structure, format and availability of records are subject to change. Please review the technical details page for further information.
Links to Downloads
Use the links below to download either version of the Columbia University Libraries Open Catalog Data.
The Columbia University Libraries makes this Columbia University Libraries Open Catalog Data (hereafter, ‘data’) available for public use under the CC0 1.0 Public Domain Dedication. The Columbia University Libraries does request, where possible, that users of this data attribute the Columbia University Libraries as the source.
The records included in this dataset are available in the MARC 21 format (.mrc) or as MARCXML (.marcxml).
MARC stands for MAchine-Readable Cataloging, and it is a cataloging standard managed by the Library of Congress. MARC is a widely used international standard for sharing bibliographic data between libraries, archives, museums and more.
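For readers new to the binary .mrc format, the following is a minimal sketch of how a single MARC 21 record is laid out: a 24-character leader, a directory of 12-character entries (tag, field length, starting position), and then the field data. It uses only the Python standard library. The function names `build_marc_record` and `parse_marc_record` and the sample values are hypothetical and exist only for illustration, and the sketch assumes UTF-8 encoded records. It is not a complete parser; for real work, a dedicated library such as those listed below is likely a better choice.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = b"\x1f"


def build_marc_record(fields):
    """Build a tiny binary MARC 21 record from (tag, body_bytes) pairs.

    Hypothetical helper, used here only to create sample data.
    """
    directory = b""
    data = b""
    for tag, body in fields:
        body += FIELD_TERMINATOR
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    directory += FIELD_TERMINATOR
    base_address = 24 + len(directory)
    total_length = base_address + len(data) + 1  # +1 for record terminator
    leader = f"{total_length:05d}nam a22{base_address:05d} a 4500".encode("ascii")
    return leader + directory + data + RECORD_TERMINATOR


def parse_marc_record(raw):
    """Parse one binary MARC 21 record into {tag: [values]}.

    Assumes UTF-8 encoded data (an assumption for this sketch).
    """
    leader = raw[:24]
    base_address = int(leader[12:17].decode("ascii"))
    # The directory sits between the leader and the field data,
    # and ends with one field terminator that we skip.
    directory = raw[24:base_address - 1]
    fields = {}
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        begin = base_address + start
        data = raw[begin:begin + length - 1]  # drop the field terminator
        if tag < "010":
            # Control fields (001-009) hold plain data, no subfields.
            value = data.decode("utf-8")
        else:
            # Data fields: two indicator characters, then subfields.
            chunks = data[2:].split(SUBFIELD_DELIMITER)[1:]
            value = {
                "indicators": data[:2].decode("utf-8"),
                "subfields": [(c[:1].decode("utf-8"), c[1:].decode("utf-8"))
                              for c in chunks],
            }
        fields.setdefault(tag, []).append(value)
    return fields


# Round-trip a made-up record to show the structure.
sample = build_marc_record([
    ("001", b"12345"),
    ("245", b"10" + SUBFIELD_DELIMITER + b"aOpen catalog data"),
])
parsed = parse_marc_record(sample)
print(parsed["001"])
print(parsed["245"][0]["subfields"])
```

A downloaded .mrc file holds many such records back to back, so a reader would typically use the first five leader characters (the record length) to split the file before parsing each record.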
MARC Documentation & Tools
Below are some documentation sources and tools that can help inform and guide working with this MARC and MARCXML data. If you have your own resources or project that you would like to add to this page, please get in touch with us using the feedback form.
- Understanding MARC Bibliographic Machine-Readable Cataloging, 8th Edition, written by Betty Furrie and edited by the Library of Congress Network Development and MARC Standards Office: http://www.loc.gov/marc/umb/
- MARC Standards, Library of Congress, Network Development and MARC Standards Office: http://www.loc.gov/marc/
- A list of MARC specialized tools is being maintained by the Library of Congress:
- MARC/Perl: A Perl 5 Library for working with MARC records: http://marcpm.sourceforge.net/
- PyMarc: A Python Library for working with MARC records: https://github.com/edsu/pymarc/
Notes on Data Elements
The datasets contain various local elements. These unique MARC elements describe Columbia University Libraries’ holdings and include data such as record identifiers, library shelving locations, call numbers, and summary volume and chronology of serials and sets. There are also local elements representing various record enrichments, such as vendor-supplied contents and links to tables of contents. Item-specific data may appear in either bibliographic or holdings data, and it is labeled as such.
Electronic resources may contain URLs to the resource itself or related resources. These URLs may reside in an 856 or 920 MARC field. If both of these elements are present, then the 856 contains Columbia’s resolver URL and the 920 contains the original source URL.
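The 856/920 rule above can be sketched against the MARCXML version of the data using only the Python standard library. This is an illustrative sketch, not a description of how the records are produced. The sample record, its URLs, and the function names `extract_urls` and `classify_urls` are hypothetical. It also assumes the URL is carried in subfield u of both fields; that is the standard subfield for 856, but it is only an assumption for the local 920 field.

```python
import xml.etree.ElementTree as ET

# The standard MARCXML namespace.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# A made-up record for illustration; real records will differ.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">12345</controlfield>
    <datafield tag="856" ind1="4" ind2="0">
      <subfield code="u">https://resolver.example.edu/12345</subfield>
    </datafield>
    <datafield tag="920" ind1=" " ind2=" ">
      <subfield code="u">https://publisher.example.com/book</subfield>
    </datafield>
  </record>
</collection>"""


def extract_urls(record):
    """Collect subfield u values from the 856 and 920 fields of one record.

    Assumption: 920 stores its URL in subfield u, like 856.
    """
    urls = {"856": [], "920": []}
    for field in record.findall("marc:datafield", MARC_NS):
        tag = field.get("tag")
        if tag in urls:
            for sub in field.findall("marc:subfield[@code='u']", MARC_NS):
                if sub.text:
                    urls[tag].append(sub.text.strip())
    return urls


def classify_urls(urls):
    """Apply the rule: when both fields are present, 856 is the
    resolver URL and 920 is the original source URL."""
    if urls["856"] and urls["920"]:
        return {"resolver": urls["856"], "source": urls["920"]}
    # With only one field present, the text does not say which kind it is.
    return {"unclassified": urls["856"] + urls["920"]}


root = ET.fromstring(SAMPLE)
results = [classify_urls(extract_urls(rec))
           for rec in root.findall("marc:record", MARC_NS)]
print(results)
```

For the full download, which is likely far larger than this sample, a streaming approach such as `xml.etree.ElementTree.iterparse` may be preferable to loading the whole file into memory.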
Controlled name forms and subject headings are managed by Library Technologies, Inc. and may change over time.