
Statistical methods for compositional data analysis

Source: 03-01

Time: Friday, March 1, 2024, 14:00

Venue: Lecture Hall B725, Shuangqing Complex Building A, Tsinghua University; Zoom: 4552601552, YMSC

Speaker: Xiang Zhan (占翔), Peking University

Xiang Zhan is an Associate Professor at the Department of Biostatistics and the Beijing International Center for Mathematical Research of Peking University. He obtained his BS degree from Peking University in 2010 and his PhD degree from Penn State in 2015. Before joining Peking University, Xiang worked at Penn State as an Assistant Professor of Biostatistics. His research interests include biostatistics, high-dimensional statistics, compositional data analysis, kernel methods, and next-generation sequencing data analysis.


Abstract

Compositional data arise in many disciplines of modern data science (e.g., sequence count data in biological and biomedical research). Unfortunately, traditional statistical methods that do not address compositionality can lead to suboptimal or even misleading results.

In this talk, we first discuss measurement error issues in compositional data. Covariate measurement errors pose serious challenges for existing errors-in-variables regression methods, since measurement error in one component affects the other components of the composition. To simultaneously address the compositional nature of, and the measurement errors in, high-dimensional compositional covariates, we propose a new method named ERror-In-Composition (Eric) Lasso for regression analysis of corrupted compositional predictors. We establish estimation error bounds for Eric Lasso as well as its asymptotic sign-consistent selection properties.
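The claim that an error in one component affects the others can be seen in a small numerical sketch. This is an illustration only and not the Eric Lasso method; the counts and the helper name `to_composition` are assumptions made up for the example.

```python
import numpy as np

def to_composition(counts):
    """Normalize nonnegative counts so that the parts sum to 1."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

# Hypothetical true counts for three components.
true_counts = np.array([50.0, 30.0, 20.0])

# Measurement error affects only the first component's count.
noisy_counts = true_counts.copy()
noisy_counts[0] += 25.0

true_comp = to_composition(true_counts)    # [0.5, 0.3, 0.2]
noisy_comp = to_composition(noisy_counts)  # [0.6, 0.24, 0.16]

# Every proportion changes, although only one count was perturbed,
# because the parts are constrained to sum to 1.
shift = noisy_comp - true_comp
print("true composition: ", true_comp)
print("noisy composition:", noisy_comp)
print("shift per part:   ", shift)
```

Because the shifts must sum to zero, an upward error in one part necessarily pushes the remaining parts down, which is why component-wise error corrections do not carry over directly.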

The second part of this talk is about composition-on-composition regression. When both responses and predictors are compositional, the inventory of statistical analysis tools is surprisingly limited. To fill this gap, we propose a high-dimensional Composition-On-Composition (COC) regression analysis, which does not require log-ratio transformations and hence can handle excessive zeros in sequence count data. We first introduce a penalized estimating equation approach in COC to improve its estimation accuracy in high-dimensional settings, and then establish inference procedures to quantify uncertainties in COC model estimation and prediction. The proposed methods are evaluated using both numerical simulations and real data applications to demonstrate their validity and superiority.
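To see why avoiding log-ratio transformations matters when zeros are present, the following sketch applies the standard centered log-ratio (CLR) transform to a composition with and without a zero part. It is only an illustration of the zero problem, not of the COC method; the example compositions are made up.

```python
import numpy as np

def clr(composition):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    log_parts = np.log(np.asarray(composition, dtype=float))
    return log_parts - log_parts.mean()

dense = np.array([0.5, 0.3, 0.2])   # no zero parts
sparse = np.array([0.7, 0.3, 0.0])  # one zero part, as in sparse count data

clr_dense = clr(dense)

# log(0) is -inf, so the mean log is -inf and no entry of the result is finite.
# The NumPy warnings are silenced here only to keep the output clean.
with np.errstate(divide="ignore", invalid="ignore"):
    clr_sparse = clr(sparse)

print("CLR without zeros:", clr_dense)
print("CLR with a zero:  ", clr_sparse)
```

In practice, log-ratio methods usually work around this by adding a pseudo-count or imputing zeros before transforming, which introduces an extra tuning choice.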

