Joint analysis of length of stay, mortality, and hospitalization severity of patients using the Portuguese hospital morbidity database

Felipe Barletta

Giovani Silva (CEAUL-IST);
Nuno Sepúlveda (CEAUL-Warsaw University of Technology).

Scholarship type
PhD scholarship

Project status:
Ongoing


This research aims to study and apply spatio-temporal models to explain the geographic variation in the risk of death from disease, and its evolution over the years, through a joint analysis of the length of stay, mortality, and severity of patients’ hospitalizations using a Portuguese hospital morbidity database. In Portugal, a good source of information about hospitalizations is the hospital morbidity database (BDMH), available through a protocol between the Administração Central do Sistema de Saúde (ACSS) and the Faculty of Sciences of the University of Lisbon (FCUL). A relevant field of statistics is joint modeling (see, for example, Rizopoulos, 2012), which has gained great importance in biostatistics in recent years. Some joint modeling studies address cancer progression (Lange et al., 2015), and others include a multi-state Markov submodel (Ferrer et al., 2016). In the latter case, for example, the motivation is to understand how risk factors influence the evolution of a disease over time, combining information on how these risk factors act on the different disease transitions. An advantage of joint modeling is that it allows several types of response variables to be included. Such variables are generally analyzed separately; a simultaneous analysis is more comprehensive and can yield more parsimonious results.


This research plan, on the joint analysis of hospitalization length of stay and mortality of patients using the Portuguese hospital morbidity database, has three aims: 1. Jointly analyze the discharge reason, length of stay, and severity of hospitalized patients; 2. Produce maps of the relative risks of diseases in each study area, based on predictions of the probability of death, length of hospital stay, and severity of disease; 3. Develop a national outbreak detection system to help reduce hospitalizations in Portugal more quickly.
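As a rough illustration of the quantity behind a relative risk map, the sketch below computes a crude observed-to-expected ratio per area. It is not the model-based prediction the project plans to use: the area names, counts, and the function `crude_relative_risk` are hypothetical, and the expected counts assume a single overall rate applied to each area's population.

```python
def crude_relative_risk(observed, population):
    """Return observed/expected ratios per area.

    Expected counts assume every area shares the overall rate
    (an illustrative simplification, not the project's model).
    """
    total_obs = sum(observed.values())
    total_pop = sum(population.values())
    ratios = {}
    for area, obs in observed.items():
        expected = population[area] * total_obs / total_pop
        ratios[area] = obs / expected
    return ratios


# Hypothetical counts of deaths and populations for three areas.
observed = {"A": 30, "B": 10, "C": 20}
population = {"A": 1000, "B": 1000, "C": 2000}

ratios = crude_relative_risk(observed, population)
print({area: round(value, 2) for area, value in ratios.items()})
# prints {'A': 2.0, 'B': 0.67, 'C': 0.67}
```

A ratio above 1 marks an area with more events than the overall rate would predict. Crude ratios of this kind are unstable for small areas, which is one reason smoothed, model-based estimates are usually preferred for mapping.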

Summary of the Work Plan

Statistical methods for spatio-temporal data have developed considerably in the last two decades. This is mainly due to advances in Bayesian computational methods, wider access to Geographic Information Systems (GIS), and new disease mapping methods (Silva and Dean, 2006). The spatio-temporal analysis considered here can be seen, for example, as an exploratory analysis that visually describes the spatial distribution of averages, rates, or proportions across a given region over time. In many statistical studies, whether experimental or observational, one of the objectives is to study the relationship between variables or, more specifically, to analyze the influence that one or more explanatory variables, measured on individuals, have on a variable of interest, called the response variable. In general, the statistician approaches this problem through a regression model that relates the response variable to the explanatory variables (Amaral-Turkman and Silva, 2000). When there are several correlated response variables, multivariate analysis is used to model them together. For example, Bonat and Jørgensen (2016) proposed multivariate covariance generalized linear models, designed to handle multivariate response variables. This is a marginal approach, in which the model directly specifies the mean vector and covariance matrix of the response variables, and it provides a unified treatment of a wide variety of response types and covariance structures. Alternatively, the correlation between several outcomes can be captured by introducing random effects, e.g. spatial effects. Such models are known as (mixed) joint models, and they have some advantages in both estimation and interpretation over separate or marginal models, as argued by Henderson et al. (2000).
For example, in the study of earthquakes, it is important to develop a joint distribution of counts (number of earthquakes) and severities (earthquake magnitude level) in space. Some early references on joint models are the following. Dunson (2003) provided a landmark publication demonstrating the joint analysis of different types of responses (count, categorical, and continuous outcomes) applied to sociological and psychological contexts. Herring and Yang (2007) proposed similar approaches for dealing with a terminating event and a longitudinal marker in a study of vaginal bleeding in pregnancy. However, joint models are most widely used to combine survival analysis and longitudinal analysis, e.g. repeated PSA measurements analyzed together with the survival of prostate cancer patients (Barletta, 2018). Based on these models, it is possible to jointly analyze the discharge reason, length of stay, and severity of hospitalized patients, with the dependence between them introduced through spatial random effects. The analysis can be carried out using Laplace approximations or Markov chain Monte Carlo (MCMC) methods, implemented respectively in INLA (Rue et al., 2009) and in OpenBUGS (Lunn et al., 2000) or JAGS (Plummer, 2021). For more details, see the summary of these methods in Paulino et al. (2018).
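The way a shared random effect induces dependence between outcomes can be sketched with a small simulation. This is an illustrative toy, not the project's model: the function names, the parameter values, and the choice of a logistic submodel for death and a log-normal submodel for length of stay are all assumptions, and the area effects are drawn independently rather than with spatial structure.

```python
import math
import random


def simulate_joint(n_areas=200, n_patients=100, gamma=0.8, seed=1):
    """Simulate area-level death rates and mean log length of stay.

    Both outcomes depend on the same area-level random effect b;
    gamma scales its influence on length of stay (hypothetical values).
    """
    rng = random.Random(seed)
    death_rates, mean_log_los = [], []
    for _ in range(n_areas):
        b = rng.gauss(0.0, 0.7)  # shared random effect for this area
        p_death = 1.0 / (1.0 + math.exp(-(-2.0 + b)))  # logistic submodel
        deaths = 0
        log_los_sum = 0.0
        for _ in range(n_patients):
            if rng.random() < p_death:
                deaths += 1
            # log-normal submodel for length of stay
            log_los_sum += rng.gauss(1.5 + gamma * b, 0.5)
        death_rates.append(deaths / n_patients)
        mean_log_los.append(log_los_sum / n_patients)
    return death_rates, mean_log_los


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


linked = pearson(*simulate_joint(gamma=0.8))
unlinked = pearson(*simulate_joint(gamma=0.0))
print(f"correlation with shared effect: {linked:.2f}")
print(f"correlation without it:         {unlinked:.2f}")
```

With `gamma` set to zero the two outcomes no longer share information, so their area-level correlation should be close to zero. Fitting such a model to real data, as opposed to simulating from it, is what tools like INLA or JAGS would be used for.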

Expected Results

The project is expected to identify the areas and hospitals with a higher risk of death and longer lengths of stay, and to identify the associated risk factors. Consequently, it is expected to produce a national outbreak detection system to help accelerate the reduction of hospitalizations in Portugal. In addition, a computational routine will be implemented for joint modeling when there is no longitudinal response variable.


Amaral-Turkman, M.A., Silva, G.L. Modelos Lineares Generalizados – da teoria à prática. Edições SPE, Sociedade Portuguesa de Estatística, Peniche, 2000;

Barletta, F. Modelo conjunto para dados longitudinais e multiestado: Uma aplicação para câncer de próstata. Dissertação de Mestrado em Bioestatística, Universidade Estadual de Maringá, 2018;

Bonat, W. H., Jørgensen, B. Multivariate covariance generalized linear models. Journal of the Royal Statistical Society Series C: Applied Statistics, 65(5), 649-675, 2016;

Chen, Z., Dunson, D.B. Random effects selection in linear mixed models. Biometrics, 59(4), 762-769, 2003;

Cox, D.R. Regression models and life-tables (with discussion). Journal of the Royal Statistical Society B 34, 187-220, 1972;

Dunson, D.B. Dynamic latent trait models for multidimensional longitudinal data. Journal of the American Statistical Association, 98(463), 555-563, 2003;

Ferrer, L., Rondeau, V., Dignam, J., Pickles, T., Jacqmin-Gadda, H., Proust-Lima, C. Joint modelling of longitudinal and multi-state processes: application to clinical progressions in prostate cancer. Statistics in Medicine, 35(22), 3933–3948, 2016;

Henderson, R., Diggle, P., Dobson, A. Joint modelling of longitudinal measurements and event time data. Biostatistics, 1(4), 465-480, 2000;

Herring, A.H., Yang, J. Bayesian modeling of multiple episode occurrence and severity with a terminating event. Biometrics, 63(2), 381-388, 2007;

Krainski, E., et al. Advanced spatial modeling with stochastic partial differential equations using R and INLA. Chapman and Hall/CRC, 2018;

Martins, R., Silva, G.L., Andreozzi, V. Bayesian joint modeling of longitudinal and spatial survival AIDS data. Statistics in Medicine, 35, 3368-3384, 2016;

Paulino, C.D., Amaral-Turkman, M.A., Murteira, B., Silva, G.L. Estatística Bayesiana, segunda edição. Fundação Calouste Gulbenkian, Lisboa, 2018;

Plummer, M. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. Version 3.4.0, 2021;

R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria. Available at: <>, 2021;

Rizopoulos, D. Joint models for longitudinal and time-to-event data: With applications in R. CRC Press, 2012;

Rue, H., Martino, S., Chopin, N. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B 71, 319–392, 2009;

Silva, G.L., Dean, C.B., Niyonsenga, T., Vanasse, A. Hierarchical Bayesian spatiotemporal analysis of revascularization odds using smoothing splines. Statistics in Medicine 27, 2381-2401, 2008;

Silva, G.L., Dean, C.B. Uma Introdução à Análise de Modelos Espaço temporais para Taxas, Proporções e Processos de Multi-estados. Edições ABE, Associação Brasileira de Estatística, Caxambú, 2006;

Wang, Q., Dinse, G.E. Linear regression analysis of survival data with missing censoring indicators. Lifetime Data Analysis 17, 256-279, 2011.