Music and Metadata: The Musiconis Database and FAB-Musiconis Project

Date: October 29, 2019 - 1PM

Room: 203

Speaker: Susan Boynton, Department of Music (Columbia University)

Abstract: The Musiconis database presents and analyzes medieval visual representations of musical performances (featuring instrumental musicians, singers, and dancers) in artworks from the 8th to the 16th century. A metabase, Musiconis imports records from existing databases, such as Gothic Ivories, and adds the music-related metadata that is frequently either absent or inaccurate. Musiconis is structured around the ontology of a musical scene, so searches can be based on actions (how an instrument is being played) in addition to the name of the instrument. The database came out of a long-term project that brought together Paris-based music historians and art historians in the study of medieval images of music. In 2016, Columbia joined the Paris team for FAB-Musiconis, a three-year transatlantic exchange involving graduate students and faculty. Graduate student participants learned how to create and edit records in the Musiconis metabase, while an engineer improved functionality in real time during working sessions. The Columbia University Libraries supported the project through seminars on metadata and the digital humanities, guidance on learning outcomes, and access to spaces and collections.
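The action-based search the abstract describes can be sketched with a minimal record schema. The field names and sample records below are illustrative assumptions, not the actual Musiconis data model:

```python
from dataclasses import dataclass

@dataclass
class MusicalScene:
    """One iconographic record: a performer, an instrument, and the action shown."""
    source: str       # imported database, e.g. "Gothic Ivories" (sample value)
    century: int
    performer: str    # e.g. "instrumentalist", "singer", "dancer"
    instrument: str
    action: str       # how the instrument is being played

# Hypothetical records for illustration only.
records = [
    MusicalScene("Gothic Ivories", 14, "instrumentalist", "harp", "plucking"),
    MusicalScene("Gothic Ivories", 13, "instrumentalist", "vielle", "bowing"),
    MusicalScene("Musiconis", 12, "instrumentalist", "psaltery", "plucking"),
]

def search_by_action(scenes, action):
    """Search on the action performed, not just the instrument's name."""
    return [s for s in scenes if s.action == action]

plucked = search_by_action(records, "plucking")
print([s.instrument for s in plucked])  # ['harp', 'psaltery']
```

Modeling the action as a first-class field is what lets a query such as "plucking" retrieve scenes with different instruments, which a name-only index could not.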

Bio: Susan Boynton, Professor of Music, Historical Musicology

Linked Data Ecologies

Date: September 24, 2019 - 2PM

Room: 203 Butler Library

Speaker: Cristina Pattuelli

Abstract: Semantic technologies such as linked data have become increasingly popular in the library, archive and museum (LAM) community as a powerful means to enhance the visibility, discoverability and use of digital resources beyond the boundaries of institutional repositories and across heterogeneous domains. While there is a sense of urgency among cultural heritage institutions to be involved in linked data development, obstacles remain to full participation, including the need for specific technological know-how and the lack of intuitive tools for linked data production. This talk will discuss strategies and potential solutions based on a series of projects under development at the Semantic Lab at Pratt Institute. The Lab has pioneered a number of linked data methods and applications for different cultural contexts and materials, from the oral histories of Linked Jazz to the diaries of Mary Berenson and the working notes of Robert Rauschenberg. This talk will discuss recent developments, including DADAlytics — a data service aimed at lowering the barriers to the generation and publication of quality linked data.
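The core mechanism behind linking resources "beyond the boundaries of institutional repositories" can be sketched as a tiny triple store in plain Python. The resource URIs below are invented for illustration; the predicate URIs come from the standard FOAF and OWL vocabularies:

```python
# Minimal triple store illustrating linked data across institutional boundaries.
# Predicates are real FOAF/OWL terms; the resource URIs are hypothetical.

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"

triples = set()

def add(s, p, o):
    triples.add((s, p, o))

# A record held by one (fictional) institution...
add("http://archive-a.example/person/42", FOAF_NAME, "Mary Berenson")
# ...linked via owl:sameAs to the same entity in another repository.
add("http://archive-a.example/person/42", OWL_SAMEAS,
    "http://museum-b.example/agent/berenson-mary")
add("http://museum-b.example/agent/berenson-mary", FOAF_KNOWS,
    "http://museum-b.example/agent/rauschenberg-robert")

def objects(s, p):
    """Return all objects of triples matching a subject and predicate."""
    return [o for (s2, p2, o) in triples if s2 == s and p2 == p]

# Follow the sameAs link to discover data held in the other repository.
for alias in objects("http://archive-a.example/person/42", OWL_SAMEAS):
    print(objects(alias, FOAF_KNOWS))
```

The `owl:sameAs` hop is the whole point: once two institutions assert their records describe the same entity, a query against one repository can surface statements held by the other.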

Bio: Prof. Dr. Cristina Pattuelli

Data Science for Smart Culture: Telling Stories with Humans & Machines at Scale

Date: June 18, 2019 - 1PM

Room: 523 Butler Library

Speaker: Dr. Lora Aroyo

Abstract: There are still various types of human knowledge that cannot yet be captured by machines, especially when dealing with wide ranges of real-world tasks and contexts. There is no single notion of truth, but rather a spectrum that has to account for context, opinions, perspectives, and shades of grey. This is critical when we curate and describe online media collections, which will be consumed by a variety of people with different intentions in numerous contexts. We need to focus on solutions that collect, harness, and represent this multitude of perspectives in the descriptions of online audio-visual collections, ultimately harnessing the full spectrum of truth in a way that is scalable and adequate to real-world needs. Human computation has begun to scientifically study how human intelligence at scale can be used to methodologically improve machine-based knowledge and data management.

My research focuses on understanding human computation to improve how machine-based systems can acquire, capture, and harness human knowledge and thus become even more intelligent. I will present the CrowdTruth crowdsourcing framework, which facilitates data collection, processing, and analytics of human computation knowledge. CrowdTruth is a widely used crowdsourcing methodology adopted by industrial partners and public organizations (e.g., Google, IBM, The New York Times, Crowdynews, and The Netherlands Institute for Sound and Vision) in a multitude of domains, e.g., AI, news, medicine, social media, cultural heritage, and the social sciences. The central characteristic of CrowdTruth is harnessing the diversity in human interpretation to capture the wide range of opinions and perspectives, and thus provide more reliable and realistic real-world annotated data for training and evaluating machine learning components. Unlike other methods, we do not discard dissenting votes, but incorporate them into a richer and more continuous representation of truth. Creating this more complex notion of truth contributes directly to the larger discussion on how to distinguish facts from opinions and perspectives, and ultimately to making the Web more reliable, diverse, and inclusive.
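The idea of keeping dissenting votes as a continuous representation of truth, rather than discarding them in a majority vote, can be sketched as follows. This is a deliberate simplification, not the published CrowdTruth metrics; the labels and votes are invented:

```python
# Disagreement-aware aggregation in the spirit of CrowdTruth: each candidate
# annotation gets a continuous score (the fraction of workers who chose it)
# instead of a binary majority-vote outcome, so dissent is retained as signal.

from collections import Counter

def annotation_scores(worker_annotations):
    """worker_annotations: one set of chosen labels per worker."""
    counts = Counter(label for labels in worker_annotations for label in labels)
    n = len(worker_annotations)
    return {label: c / n for label, c in counts.items()}

# Five workers annotate the same media fragment; interpretations legitimately differ.
votes = [{"protest"}, {"protest"}, {"protest", "celebration"},
         {"celebration"}, {"protest"}]
print(annotation_scores(votes))  # {'protest': 0.8, 'celebration': 0.4}
```

A majority vote would report only "protest"; the graded scores instead record that "celebration" is a minority but legitimate reading, which is exactly the kind of training signal the abstract argues for.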

In this talk I will present use cases related to smart culture, e.g., the enrichment of cultural heritage collections of artworks, videos, newspapers, etc., for serendipitous discovery, creative thinking, and human computation in the context of narrative building. This is illustrated with examples from smart culture, such as DIVE+, where humanities scholars explore and discover stories with cultural heritage objects from online media collections. DIVE+ is the result of an interdisciplinary collaboration between computer scientists, humanities scholars, and cultural heritage professionals, currently integrated in the Dutch national CLARIAH (Common Lab Research Infrastructure for the Arts and Humanities) research infrastructure.

Bio: Prof. Dr. Aroyo

Slides: Data Science for Smart Cultural Heritage

Towards Knowledge-Graph-Based Representation, Augmentation and Exploration of Scholarly Communication

Date: May 7, 2019

Room: 203 Butler Library

Speaker: Sören Auer

Abstract: Despite improved digital access to scientific publications in recent decades, the fundamental principles of scholarly communication remain unchanged and continue to be largely document-based. The document-oriented workflows in science have reached the limits of adequacy, as highlighted by recent discussions on the increasing proliferation of scientific literature, the deficiencies of peer review, and the reproducibility crisis. We need to represent, analyse, augment, and exploit scholarly communication in a knowledge-based way by expressing and linking scientific contributions and related artefacts through semantically rich, interlinked knowledge graphs. This should be based on a deep semantic representation of scientific contributions; their manual, crowd-sourced, and automatic augmentation; and finally intuitive exploration and interaction employing question answering on the resulting scientific knowledge base. We need to synergistically combine automated extraction and augmentation techniques with large-scale collaboration to reach an unprecedented level of knowledge graph breadth and depth. As a result, knowledge-based information flows can facilitate completely new ways of search and exploration. The efficiency and effectiveness of scholarly communication will significantly increase, since ambiguities are reduced, reproducibility is facilitated, redundancy is avoided, provenance and contributions can be better traced, and the interconnections of research contributions are made more explicit and transparent. In this talk we will present first steps in this direction in the context of our Open Research Knowledge Graph initiative and the ScienceGRAPH project.
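The shift from document-based to knowledge-based scholarly communication can be sketched as decomposing a paper into typed, queryable statements. The property names and values below are illustrative assumptions, not the actual Open Research Knowledge Graph schema:

```python
# Hypothetical sketch: a scientific contribution represented as graph triples
# (subject, property, object) instead of an opaque document, so that questions
# like "which papers address problem X?" become simple structured queries.

paper = {
    "title": "Example Paper",                      # invented example values
    "addresses": "reproducibility crisis",
    "uses_method": "crowd-sourced curation",
    "reports_result": "improved traceability of contributions",
}

graph = [("paper:1", prop, value) for prop, value in paper.items()]

def query(prop):
    """Return (paper, value) pairs for a given property across the graph."""
    return [(s, o) for (s, p, o) in graph if p == prop]

print(query("addresses"))  # [('paper:1', 'reproducibility crisis')]
```

With many papers in the same graph, the same query surfaces every contribution addressing a given problem, which is the kind of redundancy detection and explicit interlinking the abstract envisions.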

Bio: Prof. Dr. Auer