Deep perceptual embeddings for unlabelled animal sound events

Stowell, Dan
Abstract
Evaluating sound similarity is a fundamental building block in acoustic perception and computational analysis. Traditional data-driven analyses of perceptual similarity are based on heuristics or simplified linear models, and are thus limited. Deep learning embeddings, often trained with triplet networks, have proven useful in many fields. However, such networks are usually trained on large class-labelled datasets, and such labels are not always feasible to acquire. We explore data-driven neural embeddings for sound event representation when class labels are absent, instead utilising proxies of perceptual similarity judgements. Ultimately, our target is to create a perceptual embedding space that reflects animals' perception of sound. We create deep perceptual embeddings for bird sounds using triplet models. To address the challenge of triplet loss training in the absence of class-labelled data, we utilise multidimensional scaling (MDS) pretraining, attention pooling, and a triplet mining scheme. We also evaluate the advantage of triplet learning over a neural embedding trained on MDS alone. Using computational proxies of similarity judgements, we demonstrate the feasibility of the method to develop perceptual models for a wide range of data based on behavioural judgements, helping us understand how animals perceive sounds.
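The core technique named in the abstract is the triplet margin loss: an anchor embedding is pulled toward a "positive" (here, a sound judged similar by a perceptual proxy rather than a class label) and pushed away from a "negative". The following is a minimal illustrative sketch of that standard loss in plain NumPy, not the paper's actual model or training code:

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss with Euclidean distances.

    Returns 0 when the positive is already closer to the anchor than
    the negative by at least `margin`; otherwise returns the violation.
    """
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-D embeddings (hypothetical, for illustration only):
a = np.array([0.0, 0.0])   # anchor sound event
p = np.array([0.1, 0.0])   # perceptually similar event
n = np.array([2.0, 0.0])   # perceptually dissimilar event

loss = triplet_margin_loss(a, p, n)  # 0.1 - 2.0 + 1.0 < 0, clamped to 0.0
```

In a labelled setting the positive/negative roles come from class membership; the paper's contribution is to supply them from proxies of perceptual similarity judgements instead, combined with MDS pretraining and triplet mining to stabilise training.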
Description
Funding Information: This research was supported by BBSRC research Grant. No. BB/R008736/1 “Machine Learning for Bird Song Learning.” Publisher Copyright: © 2021 Author(s).
Date
2021-07-01
Citation
Stowell, D 2021, 'Deep perceptual embeddings for unlabelled animal sound events', Journal of the Acoustical Society of America, vol. 150, no. 1, pp. 2-11. https://doi.org/10.1121/10.0005475
License
info:eu-repo/semantics/openAccess