Air Quality Data Analysis with Symbolic Principal Components

Catarina Padrela Loureiro

Departamento de Matemática, Instituto Superior Técnico


7 February 2024 (Wednesday), 14:30


Air pollution is a global challenge with deep implications for public health and the environment. We examine air quality data from a monitoring station in Entrecampos, Lisbon, using Symbolic Data Analysis. The dataset consists of hourly concentrations of nine pollutants over three years. These are log-transformed and aggregated into daily intervals bounded by each day's minimum and maximum values. The symbolic mean and variance of each variable are estimated by the method of moments, and pairwise dependencies are captured with a bivariate copula. Symbolic principal component scores are obtained from the estimated covariance matrix and used to fit generalized extreme value (GEV) distributions. Control charts built from the quantiles of these distributions are then used to identify outlying observations. The results are compared with outlier-detection methods based on daily averages. They highlight the value of Symbolic Data Analysis for revealing new insights into air quality.
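For readers unfamiliar with interval-valued data, the pipeline in the abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the speaker's method: the talk estimates moments by the method of moments and models dependence with a bivariate copula, whereas this sketch swaps in the simpler textbook formulas that assume values are spread uniformly within each daily interval (the interval covariance of Billard). The scores project interval midpoints only, and the data are synthetic. Function names, the pollutant columns and the significance level `alpha` are hypothetical. GEV fitting uses SciPy's `genextreme`.

```python
import numpy as np
import pandas as pd
from scipy.stats import genextreme


def daily_intervals(hourly: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Log-transform hourly concentrations and summarise each day as [min, max]."""
    daily = np.log(hourly).resample("D")
    return daily.min(), daily.max()


def symbolic_mean(lo: np.ndarray, hi: np.ndarray) -> np.ndarray:
    # Assumption: uniform spread within each interval, so the mean is the average midpoint.
    return ((lo + hi) / 2).mean(axis=0)


def symbolic_covariance(lo: np.ndarray, hi: np.ndarray) -> np.ndarray:
    """Billard's interval covariance under the uniform assumption.

    Swapped in for illustration; the talk uses the method of moments and a copula.
    The diagonal reduces to the usual symbolic variance of interval data.
    """
    n = lo.shape[0]
    m = symbolic_mean(lo, hi)
    a, b = lo - m, hi - m  # centred lower and upper bounds
    return (2 * a.T @ a + a.T @ b + b.T @ a + 2 * b.T @ b) / (6 * n)


def pc_scores(lo: np.ndarray, hi: np.ndarray, cov: np.ndarray) -> np.ndarray:
    # eigh returns eigenvalues in ascending order; reorder to put the largest first.
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]
    # Simplification: project centred interval midpoints rather than whole intervals.
    centres = (lo + hi) / 2 - symbolic_mean(lo, hi)
    return centres @ eigvecs


def gev_control_limits(scores: np.ndarray, alpha: float = 0.01) -> tuple[float, float]:
    # Fit a GEV distribution and use its alpha/2 and 1 - alpha/2 quantiles as limits.
    c, loc, scale = genextreme.fit(scores)
    lcl = genextreme.ppf(alpha / 2, c, loc=loc, scale=scale)
    ucl = genextreme.ppf(1 - alpha / 2, c, loc=loc, scale=scale)
    return float(lcl), float(ucl)


# Synthetic hourly data for one year and three hypothetical pollutants (not real measurements).
rng = np.random.default_rng(0)
idx = pd.date_range("2021-01-01", periods=24 * 365, freq=pd.offsets.Hour())
hourly = pd.DataFrame(
    rng.lognormal(mean=3.0, sigma=0.5, size=(len(idx), 3)),
    index=idx,
    columns=["NO2", "O3", "PM10"],
)

lo_df, hi_df = daily_intervals(hourly)
lo, hi = lo_df.to_numpy(), hi_df.to_numpy()
cov = symbolic_covariance(lo, hi)
scores = pc_scores(lo, hi, cov)[:, 0]  # first principal component only
lcl, ucl = gev_control_limits(scores)
flagged = lo_df.index[(scores < lcl) | (scores > ucl)]
print(f"{len(flagged)} of {len(scores)} days outside [{lcl:.3f}, {ucl:.3f}]")
```

As a quick check on the variance formula, the two intervals [0, 2] and [2, 4] together cover [0, 4] uniformly, and `symbolic_covariance` returns 4/3 for them, which is the variance of a uniform distribution on [0, 4]. Note that eigenvector signs are arbitrary, so the orientation of the score axis may flip between runs or libraries.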

Short bio:

Catarina Loureiro has been a PhD student in the Doctoral Program in Statistics and Stochastic Processes at Instituto Superior Técnico since October 2022, and holds a research fellowship from the Fundação para a Ciência e a Tecnologia (FCT). From 2020 to 2022 she worked as a Data Analyst at Fidelidade Companhia de Seguros. She received her Master's in Mathematics and Applications in 2019 and her Bachelor's in Applied Mathematics and Computation in 2017, both from Instituto Superior Técnico.