Half-day Workshop on Biostatistics (1) Bayesian nonparametric ROC regression modeling / (2) Extending induced ROC methodology to the functional context / (3) Detection of copy number variations using whole genome sequence data

Miguel de Carvalho – École Polytechnique Fédérale de Lausanne, Switzerland (1) / Vanda Inácio – CEAUL and Departamento de Estatística e Investigação Operacional, FCUL (2) / Nuno Sepúlveda – CEAUL and London School of Hygiene and Tropical Medicine, UK (3)
FCUL (DEIO) – Campo Grande – Building C6, 4th floor, Room 6.4.30
Friday, 6 January 2012
Project reference: PEst-OE/MAT/UI0006/2011

CEAUL

Half-day Workshop on Biostatistics

6 January 2012

Room 6.4.30, Faculdade de Ciências, Building C6, 4th floor

(Free entrance, but registration is required by 29 December 2011)

14.30-15.15

Bayesian nonparametric ROC regression modeling

Miguel de Carvalho (miguel.carvalho@epfl.ch)

EPFL – École Polytechnique Fédérale de Lausanne, Switzerland

(Joint work with Vanda Inácio, Alejandro Jara and Timothy Hanson)

The receiver operating characteristic (ROC) curve is the most widely used measure for evaluating the discriminatory performance of a continuous marker. It is well known that, in certain circumstances, the marker's discriminatory capacity is affected by other factors, and several ROC regression methodologies have been proposed to incorporate covariates in the ROC framework. Most of these methodologies assume that covariate effects have parametric forms, which can lead to misleading conclusions if the model is misspecified. To overcome this drawback, we propose a Bayesian nonparametric estimator of the conditional ROC curve based on dependent Dirichlet processes. A simulation study is performed to assess the performance of the proposed estimator. The methods are applied to real data on the diagnosis of diabetes.
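
For readers unfamiliar with the ROC curve itself, the following is a minimal sketch of the plain empirical ROC curve and its area (AUC) for a continuous marker, with no covariates. It is not the Bayesian nonparametric estimator presented in the talk; the function names and the convention that larger marker values indicate disease are assumptions made for illustration only.

```python
def empirical_roc(diseased, healthy):
    """Return (FPR, TPR) points of the empirical ROC curve.

    Assumption: a subject is classified as positive when marker >= threshold.
    """
    # Scan thresholds from the largest observed value down to the smallest.
    thresholds = sorted(set(diseased) | set(healthy), reverse=True)
    points = [(0.0, 0.0)]
    for c in thresholds:
        tpr = sum(y >= c for y in diseased) / len(diseased)
        fpr = sum(y >= c for y in healthy) / len(healthy)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the ROC points by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

A call such as auc(empirical_roc(diseased, healthy)) returns a value between 0 and 1, where 1 corresponds to perfect separation of the two groups. A covariate-specific (conditional) ROC curve, as studied in the talk, would replace the two marginal samples by distributions that depend on the covariate.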

15.15-16.00

Extending induced ROC methodology to the functional context

Vanda Inácio (vanda.kinets@gmail.com)

CEAUL and Departamento de Estatística e Investigação Operacional, FCUL

(Joint work with Wenceslao Gonzalez, Manuel Febrero, Francisco Gude, Todd Alonzo and Carmen Cadarso)

The receiver operating characteristic (ROC) curve is the most widely used measure for evaluating the discriminatory performance of a continuous marker. It is well known that, in certain circumstances, the marker's discriminatory capacity is affected by other factors, and several ROC regression methodologies have been proposed to incorporate covariates in the ROC framework. Until now, these methodologies have been developed only for the case where the covariate is univariate or multivariate. We extend ROC regression methodology to the case where the covariate is functional. To this end, semiparametric and nonparametric induced ROC regression estimators are proposed. A simulation study is performed to assess the performance of the proposed estimators. The methods are motivated by, and applied to, a metabolic syndrome study in Galicia (NW Spain).

16.00-16.30 Coffee-break

16.30-17.15

Detection of copy number variations using whole genome sequence data

Nuno Sepúlveda (nuno.sepulveda@lshtm.ac.uk)

CEAUL and London School of Hygiene and Tropical Medicine, United Kingdom

Recent years have witnessed a great effort to detect and catalogue copy number variations (CNVs) in the genomes of different organisms, as they may explain natural phenotypic variation important for human health. In this research quest, the so-called next generation sequencing technologies have brought new opportunities to study genomes more accurately and at a larger scale. In a single sequencing run, these technologies can now generate millions of read pairs from a target genome. The resulting data can be analyzed in terms of coverage or mapping distance. Coverage refers to the number of read pairs mapped onto a specific position (or region) of a reference genome. Mapping distance is the mapped distance between the two reads of a pair. In theory, genomic regions with either low coverage or high mapping distance are indicative of deletions. Conversely, regions with either high coverage or low mapping distance may reflect amplifications. This talk reviews the basic steps involved in data pre-processing and discusses two new complementary approaches to detect signals for CNVs from coverage or mapping distance data. Coverage is analyzed through Bayesian methods based on two flexible probability distributions for the data, Poisson-Gamma and Poisson-Lognormal. Mapping distance is in turn analyzed through a bootstrap method to detect genes that show overall extreme mapping distance. The central idea of the method is to compare, through statistical testing, the sample of read pairs mapped onto a given gene against samples of read pairs taken randomly from the genome. Both methodologies are illustrated with several whole genome sequence data sets of clinical and laboratory Plasmodium falciparum parasites.
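
The resampling idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the speaker's method: the abstract does not say which test statistic is used, so the median mapping distance is assumed here, and the function name, parameters and add-one correction are hypothetical choices.

```python
import random
import statistics


def bootstrap_gene_pvalue(gene_distances, genome_distances,
                          n_resamples=2000, seed=0):
    """Two-sided Monte Carlo p-value for one gene's mapping distances.

    Assumption: the test statistic is the median mapping distance.
    """
    rng = random.Random(seed)
    observed = statistics.median(gene_distances)
    n = len(gene_distances)
    # Reference distribution: medians of random genome-wide samples
    # (drawn with replacement) of the same size as the gene's sample.
    null = [
        statistics.median(rng.choices(genome_distances, k=n))
        for _ in range(n_resamples)
    ]
    centre = statistics.median(null)
    # Count resamples at least as far from the centre as the observed gene.
    extreme = sum(abs(m - centre) >= abs(observed - centre) for m in null)
    # Add-one correction keeps the p-value strictly positive.
    return (extreme + 1) / (n_resamples + 1)
```

A small p-value flags a gene whose read pairs have unusually high or low mapping distance relative to the rest of the genome, which, following the reasoning above, may point to a deletion or an amplification. In a genome-wide scan the p-values would also need adjusting for multiple testing.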