Symbolic Corpora Building

Computational musicology is a fast-growing area of music scholarship. To take empirical approaches to their research, music historians and theorists need access to digital representations of musical scores and performances. Unfortunately, symbolic musical data is currently scarce, and what exists is biased toward Baroque and Classical music and consists largely of vocal and piano repertoire. In addition, because there is no single standard for representing symbolic data, existing datasets appear in many different formats and with varying levels of accuracy and completeness. Work in this area seeks to address these issues while also providing tools for analyzing symbolic data.



  • Condit-Schultz, N., Ju, Y., & Fujinaga, I. (2018). “A Flexible Approach to Automated Harmonic Analysis: Multiple Annotations of Chorales by Bach and Prætorius,” in Proceedings of the International Society of Music Information Retrieval (Paris, France).
  • Léveillé Gauvin, H., Condit-Schultz, N., & Arthur, C. (2017). “Supplementing Melody, Lyrics, and Acoustic Information to the McGill Billboard Database,” in DH2017: premiere annual conference of the international Alliance of Digital Humanities Organizations (Montreal, Canada).
  • Condit-Schultz, N. (2016). “The Musical Corpus of Flow: A Digital Corpus of Rap Transcriptions,” Empirical Musicology Review, 11(2): 124–146.
  • Devaney, J., Arthur, C., Condit-Schultz, N., & Nisula, K. (2015). “Theme And Variation Encodings with Roman Numerals (TAVERN): a New Data Set for Symbolic Music Analysis,” in Proceedings of the International Society of Music Information Retrieval (Málaga, Spain): 728–734.

