Spring 2020 Seminars
The Georgia Tech Center for Music Technology Spring Seminar Series features both invited speakers and second-year student project presentations. The seminars are on Mondays from 1:55-2:45 p.m. in the West Village Dining Commons, Room 175, on Georgia Tech's campus and are open to the public. Below is the schedule for invited speakers and student presentations for Spring 2020:
January 6 - Richard Savery, PhD student
January 13 - Luke Herman, GTCMT Alum
January 20 - MLK Day (No Seminar)
January 27 - Mason Bretan, GTCMT Alum (Skype)
February 3 - Ashis Pati, PhD student
Deep generative models have emerged as the tool of choice for automatic music generation and have been applied to several music generation tasks. However, a limitation of many of these models is that they typically work as black-boxes, i.e., the intended end-user (e.g., a music composer) has little to no control over the generation process. Additionally, they do not allow any interaction where the user can selectively modify the generated music or some of its parts based on desired musical characteristics or compositional goals. This talk will focus on some of my research on conditional music generation by designing deep generative models that provide control over certain musical attributes of interest.
The focus of the talk will be on latent representation-based deep generative models such as Variational Auto-Encoders (VAEs), which are known to encode certain hidden attributes of data. However, traditionally they don't provide explicit control over semantically meaningful attributes. We will discuss two methods that allow us to leverage the structure of the latent spaces to either control musical attributes such as note density, rhythmic complexity, etc., or perform conditional generation given a certain musical context. Together, these methods will demonstrate the potential of using interpretable latent spaces for the design of intuitive interfaces for interactive music creation, which could be used by end-users (artists, musicians, hobbyists, music and video creators) to augment and possibly enhance their creative workflows.
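To make the idea of "controlling a musical attribute through the latent space" concrete, here is a minimal, hypothetical sketch (not the speaker's actual method): given latent codes from a trained VAE encoder and a measured attribute such as note density, one can fit a linear attribute direction in the latent space and shift a code along it before decoding. All names and the toy data below are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch: steering a VAE latent code along an attribute direction.
# Assumes we already have latent codes z (from a trained encoder) paired with
# a measured musical attribute (e.g., note density). Toy data stands in for both.

rng = np.random.default_rng(0)
latent_dim = 8
z = rng.normal(size=(200, latent_dim))            # latent codes for 200 clips
true_dir = np.zeros(latent_dim)
true_dir[0] = 1.0
note_density = z @ true_dir + 0.1 * rng.normal(size=200)  # toy attribute values

# Fit a linear "attribute direction" in latent space via least squares.
w, *_ = np.linalg.lstsq(z, note_density, rcond=None)
direction = w / np.linalg.norm(w)

def increase_attribute(z_vec, step=1.0):
    """Move a latent code along the attribute direction; decoding the
    shifted code would (ideally) yield music with a higher attribute value."""
    return z_vec + step * direction

z0 = z[0]
z_shifted = increase_attribute(z0, step=2.0)
print(float(z0 @ w), float(z_shifted @ w))  # predicted attribute rises after the shift
```

In an interactive interface, such a shift could sit behind a single slider (e.g., "denser / sparser"), hiding the latent space from the end-user entirely.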
Ashis Pati is a Ph.D. candidate in the School of Music working with Professor Alexander Lerch. Ashis’s research interests lie at the intersection of audio signal processing, music information retrieval, and AI/ML. Specifically, he is interested in designing deep generative models capable of understanding and creating music. Born and raised in India, he received his undergraduate degree in Electrical Engineering (B.Tech, EE) from the Indian Institute of Technology, Kanpur in 2011, where his focus was on digital signal and image processing. Before joining Georgia Tech, he worked as an assistant project manager for ITC Limited handling large-scale energy and environmental projects. At GTCMT, Ashis has worked on designing algorithms and software for music education, music performance assessment, and musical scene analysis.
February 10 - Sidd Gururani, PhD student
Advancements in the field of Music Information Retrieval (MIR) have relied primarily on the success of supervised machine learning. One of the current challenges in MIR is the lack of large, fully-labeled datasets, which are typically required to utilize the full potential of supervised deep neural networks. My research focuses on weakly supervised learning applied to a task in MIR known as musical instrument classification (MIC). MIC is the task of recognizing the presence or absence of one or more musical instruments in audio recordings of popular music. In this talk, we will discuss methods that tackle two challenges in MIC: (i) weakly labeled data (WLD), i.e., instrument labels are associated with entire 10-second clips as opposed to fine-grained annotations of instrumentation, and (ii) missing labels in the data, i.e., instrument labels may be unknown.
Both of these problems cause traditional supervised learning approaches to perform poorly. We discuss methods for weak supervision to address these challenges. First, we shall look at a multi-instance learning framework to handle WLD and discuss an attention mechanism to adaptively aggregate fine-grained predictions into clip-level predictions. Second, we discuss semi-supervised learning to leverage missing labels and improve the performance of MIC models over models trained solely with labeled data. This involves jointly modeling the data and labels using generative models. Finally, we will look into self-supervised or unsupervised representation learning methods with large-scale unlabeled datasets such as the Million Song Dataset to study the effectiveness of these methods for MIC. Aside from furthering the state of the art in MIC, these methods will pave the way for researchers to utilize music data without the need for extensive and usually expensive annotations.
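The attention-based aggregation step described above can be sketched as follows. This is a simplified, hypothetical illustration (not the speaker's exact model): frame-level (instance) presence scores are pooled into a single clip-level prediction using attention weights that, in a real system, would come from a learned layer. All array shapes and names are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of attention-based aggregation for weakly labeled
# instrument classification: per-frame predictions are pooled into one
# clip-level prediction per instrument, matching the clip-level (weak) labels.

rng = np.random.default_rng(1)
n_frames, n_instruments = 20, 4
frame_probs = rng.uniform(size=(n_frames, n_instruments))  # per-frame presence scores
attn_logits = rng.normal(size=(n_frames, n_instruments))   # from a learned attention layer

def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

attn = softmax(attn_logits, axis=0)            # weights sum to 1 over frames
clip_probs = (attn * frame_probs).sum(axis=0)  # attention-weighted average per instrument

print(clip_probs.shape)  # (4,): one clip-level probability per instrument
```

Because the attention weights are learned, the model can focus on the frames where an instrument is actually audible instead of averaging uniformly over the whole clip.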
Siddharth Gururani is a PhD candidate in the Georgia Tech Center for Music Technology (GTCMT) supervised by Prof. Alexander Lerch. Sidd's research lies at the intersection of audio signal processing, machine learning and music information retrieval (MIR). His PhD thesis work focuses on weakly supervised learning algorithms for musical instrument classification. He has also worked on music sample detection, music performance assessment, speech recognition and text-to-speech. Prior to joining GT, he obtained a dual degree in Computer Science and Engineering (B.Tech and M.Tech, CSE) from the Indian Institute of Technology, Kharagpur in 2015, where he worked on hardware security and MIR.