Developing machine learning tools for automatic detection and classification of wildlife sounds


Carolina Marques


Tiago A. Marques
(Centro de Estatística e Aplicações, Departamento de Biologia, Faculdade de Ciências, Universidade de Lisboa, and Centre for Research into Ecological and Environmental Modelling, University of St Andrews);

Emmanuel Dufourq
(Stellenbosch University, Stellenbosch, South Africa, and African Institute for Mathematical Sciences, Muizenberg, South Africa);

Carl Donovan
(Centre for Research into Ecological and Environmental Modelling, School of Mathematics and Statistics, University of St Andrews, St Andrews, Scotland, UK)

Scholarship type

PhD scholarship

Project status:

Ongoing


The increasing pressure that humans place on the environment means that efficient ways to monitor wildlife are required for effective management and conservation. In the last couple of decades there has been an explosion of new techniques that generate large quantities of data, and methods that allow researchers to process and analyze those data are in high demand. New data collection methods include eDNA, drones, satellites, animal-borne tags, camera traps and acoustic sensors, all of which generate large amounts of data, often in real time. While dealing with most of these data initially involved a lot of human effort, researchers are increasingly looking for ways to reduce the amount of human labor involved (e.g. Webber et al. 2022), and machine learning techniques, in particular deep learning approaches, are becoming widespread in ecology (see e.g. Christin et al. 2019 and Borowiec et al. 2022).
Passive acoustic monitoring has developed into a fully fledged approach to monitoring wildlife, where sensors can be left in the field recording continuously for long periods of time. Sound recordings can be made either in air (microphones) or in water (hydrophones), and a single project can easily generate thousands of hours of recordings, amounting to terabytes of data. The days when humans could process the data manually by listening to the recordings are essentially over. Many types of automatic detectors have been developed and tested. Recent interest in convolutional neural networks (CNN) for automatic detection and classification of features in images opened the door to processing sounds as images. Sounds are traditionally visualized as images (spectrograms are two-dimensional representations of intensity as a function of time and frequency), and hence it is not surprising that methods optimized for image analysis are being considered for sound processing.
Recent examples include the work of Dufourq et al. (2021) for gibbons, Sanchez et al. (2021) for birds and Ziegenhorn et al. (2022) for cetaceans.
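The spectrogram representation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's pipeline: it uses a synthetic 1 kHz tone sampled at 8 kHz, a 64-sample Hann window with 50% overlap, and a naive discrete Fourier transform written with the standard library only. The function names `hann` and `spectrogram` are hypothetical, and a real workflow would normally rely on an FFT library.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n (tapers each frame to reduce leakage)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(signal, frame_len=64, hop=32):
    """Return a list of time frames, each a list of magnitudes for
    frequency bins 0..frame_len // 2.

    Uses a naive O(n^2) DFT per frame for clarity, not for speed.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        magnitudes = []
        for k in range(n_bins):
            coeff = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            magnitudes.append(abs(coeff))
        frames.append(magnitudes)
    return frames


# Assumed toy input: 0.25 s of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone_hz = 1000
signal = [
    math.sin(2 * math.pi * tone_hz * t / sample_rate)
    for t in range(sample_rate // 4)
]

spec = spectrogram(signal)
# Frequency bin with the most energy in the first time frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

Each inner list is one time frame and each entry one frequency bin, so the nested list can be treated as a single-channel image and handed to an image classifier such as a CNN. In this toy case the energy peaks in bin 8, which corresponds to 8 × 8000 / 64 = 1000 Hz.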
The recent review of deep learning for computational bioacoustics by Stowell (2021) makes it clear that this will be a hot topic in the next few years at the interface of ecology, statistics and informatics. Our initial focus is likely to be the estimation of animal abundance, a field where the supervising team has ample expertise, but we anticipate that other applications might arise during the development of the planned research program.
This PhD plan is strategic in that it will allow CEAUL and DEIO to be at the forefront of a blooming field, not only via the work of the student, but also by fostering collaboration between CEAUL and the research centers of the two non-Portuguese co-supervisors, namely the Centre for Research into Ecological and Environmental Modelling and the Data Science for Eco-Systems Research Group.


  • Develop new methods for sound detection and classification by applying existing methods to different cases and data, and identify opportunities to develop new tools to deal with specific challenges related to bioacoustic data.
  • Promote the use of automated methods for sound detection and classification by translating the methods developed into easy-to-use tools for practitioners.

Summary of the Work Plan

The first year of the PhD is envisaged to be a training phase, when the student is expected to get up to speed in sound processing and machine learning techniques.
During the first year the student will also be exposed to a number of potential research problems motivated by applications of bioacoustics, allowing the identification of relevant challenges. During this first year we envisage that the student will visit both supervisors not based in Lisbon, to learn from them on site the aspects of machine learning that might be useful for the PhD.
From these challenges, a number of case studies will be selected to be worked on in the remaining years of the program. These will come from a variety of datasets that the supervisors have access to through their own research, and from screening a wide international network of collaborators available to the supervising team: researchers who have the problem (large amounts of ecological acoustic data) but not the expertise required to deal with it.
At the outset the student will leverage the datasets already available, namely those from animal-borne tags collected during the ACCURATE projects (led by supervisor TAM). We have access to data from a variety of marine mammal species, in particular narwhal sound data, as well as bird sound data.
Narwhals: we have sound files from animal-borne tags (Acousonde) and passive acoustic monitoring recordings from soundtraps. We plan to start with the Acousonde data, which should provide a reasonably fast way to identify the sounds of interest, annotate them to build a training dataset, and then use that dataset to train a detector for sounds on the tags. Assuming that is successful, we will run the resulting detector over the more challenging soundtrap data. The soundtraps were deployed in an area of high narwhal density, but their data have not yet been processed for sounds.
Birds: we will additionally use bird data collected by researcher Ana Isabel Leal (AIL), from CE3C. AIL has a large database of bird sound recordings collected over the last two years in a woodland area (PORBIOTA project). These data will be used to create different training datasets of individual syllables/phrases (starting with species that have clear and easy-to-identify sounds). Detectors trained on these datasets will be tested using the database recordings.
It is well known that automatic detectors are generally sensitive to the training datasets used, and minor differences between training and testing datasets can lead to poor performance. We hope to explore how to develop neural networks that can be used in an omnibus way, beyond the applications they were developed for, via transfer learning: the ability to reuse a network architecture trained for a given general task, which then receives a different front end depending on the task at hand.
Besides transfer learning, model accuracy can be improved using data augmentation techniques. The idea is to artificially increase the amount of data using shift-based and mixing-based augmentation. We expect that larger datasets will lead to more robust models.
Another complementary approach is to leverage the large amount of unlabeled data. Pre-trained models may be used to 1) classify data and 2) assess the confidence level of each classification. Based on this confidence level, a relevance-feedback approach can be used, taking advantage of human experts in an optimized way. Multiple species might sound similar (and appear to have similar signatures on the spectrogram) but actually be different species, and the only way to tell them apart might be via their location.
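The shift-based and mixing-based augmentation mentioned above can be illustrated with a small sketch. It assumes clips are equal-length lists of samples with one-hot labels, uses a circular time shift as the shift-based step and a mixup-style weighted blend as the mixing-based step, and draws the mixing weight from [0.5, 1.0]. All of these choices, and the names `time_shift`, `mix` and `augment`, are illustrative assumptions rather than the methods the project will adopt.

```python
import random


def time_shift(clip, shift):
    """Circularly shift a clip by `shift` samples (positive = later in time)."""
    shift %= len(clip)
    return clip[-shift:] + clip[:-shift] if shift else list(clip)


def mix(clip_a, label_a, clip_b, label_b, weight):
    """Blend two equal-length clips and their one-hot labels (mixup-style)."""
    if len(clip_a) != len(clip_b):
        raise ValueError("clips must have the same length")
    mixed = [weight * a + (1 - weight) * b for a, b in zip(clip_a, clip_b)]
    label = [weight * la + (1 - weight) * lb for la, lb in zip(label_a, label_b)]
    return mixed, label


def augment(dataset, n_new, rng):
    """Create n_new synthetic (clip, label) pairs from labelled clips."""
    out = []
    for _ in range(n_new):
        clip_a, label_a = rng.choice(dataset)
        clip_b, label_b = rng.choice(dataset)
        # Shift the first clip by a random offset, then blend it with the second.
        shifted = time_shift(clip_a, rng.randrange(len(clip_a)))
        out.append(mix(shifted, label_a, clip_b, label_b, rng.uniform(0.5, 1.0)))
    return out


# Toy dataset: two 4-sample "clips" with one-hot labels for two classes.
rng = random.Random(0)
dataset = [
    ([0.0, 1.0, 0.0, -1.0], [1, 0]),
    ([0.5, 0.5, -0.5, -0.5], [0, 1]),
]
augmented = augment(dataset, n_new=6, rng=rng)
```

The same two operations apply to spectrograms by shifting along the time axis and blending pixel values. Because the blended labels are soft (they sum to one but are no longer one-hot), a classifier trained on such data needs a loss that accepts probabilistic targets.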
Some marine mammal species might only produce sounds at depth. A novel investigation will explore the ability to include context metadata in the neural network, such as geographical location (in three dimensions if available) or time between successive sounds, enabling the model to learn the relationships between vocalization events and location or frequency of sound production, and thereby further improve classification accuracy.
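One simple way to picture the inclusion of context metadata is late fusion by concatenation: context values are rescaled and appended to the acoustic feature vector before the final classification layers. The sketch below is only an assumption about how this could look. The function names, the 1000 m depth scale and the 60 s time constant are hypothetical, and the proposal does not commit to any particular fusion scheme.

```python
import math


def context_features(depth_m, seconds_since_last_call, max_depth_m=1000.0):
    """Scale context values to [0, 1] so they are comparable to acoustic features.

    max_depth_m and the 60 s time constant are illustrative assumptions.
    """
    depth = min(max(depth_m / max_depth_m, 0.0), 1.0)
    # Saturating transform of the inter-call interval: 0 s maps to 0.0,
    # and long gaps approach 1.0.
    gap = 1.0 - math.exp(-seconds_since_last_call / 60.0)
    return [depth, gap]


def fuse(acoustic_embedding, context):
    """Late fusion by concatenation: later layers see both sound and context."""
    return list(acoustic_embedding) + list(context)


# Hypothetical 3-value embedding produced by a CNN from one spectrogram.
embedding = [0.2, 0.7, 0.1]
fused = fuse(embedding, context_features(depth_m=500.0, seconds_since_last_call=0.0))
```

With this layout, two calls that have near-identical spectrogram embeddings but were recorded at different depths or with different inter-call intervals yield different fused vectors, which gives the classifier a chance to separate them.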

Expected Results

We expect the student to be able to develop tools that will allow the detection of sounds produced by a range of animals, initially including marine mammals and birds, from both animal-borne tags and passive acoustic recordings. We also hope to create tools that are user-friendly, promoting their use by other researchers in the field. We anticipate the publication of at least five research papers, including two about marine mammal species, one about birds, one regarding transfer learning and one about including context in the detection and classification process.


Borowiec, M. L.; Dikow, R. B. et al. 2022. Deep learning as a tool for ecology and evolution. Methods in Ecology and Evolution. DOI: 10.1111/2041-210x.13901
Christin, S.; Hervet, É. & Lecomte, N. 2019. Applications for deep learning in ecology. Methods in Ecology and Evolution 10: 1632-1644
Dufourq, E.; Durbach, I. et al. 2021. Automated detection of Hainan gibbon calls for passive acoustic monitoring. Remote Sensing in Ecology and Conservation 7: 475-487
Marques, T. A.; Thomas, L. et al. 2013. Estimating animal population density using passive acoustics. Biological Reviews 88: 287-309
Sanchez, F. J. B.; Hossain, M. R.; English, N. B. & Moore, S. T. 2021. Bioacoustic classification of avian calls from raw sound waveforms with an open-source deep learning architecture. Scientific Reports 11: 15733
Stowell, D. 2021. Computational bioacoustics with deep learning: a review and roadmap. arXiv:2112.06725
Webber, T.; Gillespie, D.; Lewis, T.; Gordon, J.; Ruchirabha, T. & Thompson, K. 2022. Streamlining analysis methods for large acoustic surveys using automatic detectors with operator validation. Methods in Ecology and Evolution. DOI: 10.1111/2041-
Ziegenhorn, M. A.; Frasier, K. E.; Hildebrand, J. A.; Oleson, E. M.; Baird, R. W.; Wiggins, S. M. & Baumann-Pickering, S. 2022. Discriminating and classifying odontocete echolocation clicks in the Hawaiian Islands using machine learning methods. PLOS ONE 17: e0266424