Statistical methods for compositional data analysis - Qiuzhen College, Tsinghua University

Academic Talks

Statistical methods for compositional data analysis

Source: 03-01

Time: Friday, March 1, 2024, 14:00

Venue: Lecture Hall B725, Shuangqing Complex Building A, Tsinghua University; Zoom: 4552601552, YMSC

Speaker: Xiang Zhan (占翔), Peking University

Xiang Zhan

Peking University

Xiang Zhan is an Associate Professor in the Department of Biostatistics and the Beijing International Center for Mathematical Research at Peking University. He received his BS from Peking University in 2010 and his PhD from Penn State in 2015. Before joining Peking University, he was an Assistant Professor of Biostatistics at Penn State. His research interests include biostatistics, high-dimensional statistics, compositional data analysis, kernel methods, and next-generation sequencing data analysis.

Abstract

Compositional data arise in many areas of modern data science (e.g., sequence count data in biological and biomedical research). Traditional statistical methods that do not account for compositionality can produce suboptimal or even misleading results.

In this talk, we first discuss measurement error in compositional data. Covariate measurement errors pose major challenges for existing errors-in-variables regression methods, because a measurement error in one component affects the other components of the composition. To address both the compositional nature of the data and the measurement errors in high-dimensional compositional covariates, we propose a new method, ERror-In-Composition (Eric) Lasso, for regression analysis with corrupted compositional predictors. We establish estimation error bounds for Eric Lasso and its asymptotic sign-consistent selection properties.
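The point that an error in one component affects the others can be seen in a small numerical sketch. This is only an illustration of the closure (normalization) step, not the Eric Lasso method; the counts and the helper name `closure` are hypothetical.

```python
def closure(counts):
    """Normalize non-negative counts to proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical "true" counts for a three-part composition.
true_counts = [50.0, 30.0, 20.0]

# Assumed measurement error: only the first component is over-counted.
observed_counts = [70.0, 30.0, 20.0]

true_comp = closure(true_counts)
observed_comp = closure(observed_counts)

# Change in each proportion. Only the first count was perturbed,
# yet every proportion moves because the parts must sum to 1.
shift = [o - t for o, t in zip(observed_comp, true_comp)]

for name, s in zip(["part 1", "part 2", "part 3"], shift):
    print(f"{name}: shift in proportion = {s:+.4f}")
```

The first proportion rises while the two unperturbed parts fall, which is why errors cannot be treated component by component.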

The second part of the talk concerns composition-on-composition regression. When both responses and predictors are compositional, the set of available statistical tools is surprisingly limited. To fill this gap, we propose a high-dimensional Composition-On-Composition (COC) regression analysis that does not require log-ratio transformations and can therefore handle excess zeros in sequence count data. We first introduce a penalized estimating equation approach in COC to improve estimation accuracy in high-dimensional settings, and then establish inference procedures to quantify uncertainty in COC model estimation and prediction. The proposed methods are evaluated in numerical simulations and real data applications to demonstrate their validity and superiority.
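To see why zeros are a problem for log-ratio transformations, consider the centered log-ratio (CLR) transform, a standard choice in compositional data analysis. The sketch below is not part of the COC method; it only shows, with hypothetical compositions and helper names, that the transform is undefined as soon as one part is zero.

```python
import math


def clr(composition):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def try_clr(composition):
    """Return the CLR transform, or None if any part is not positive."""
    try:
        return clr(composition)
    except ValueError:
        # math.log raises ValueError for zero or negative input.
        return None


dense = [0.5, 0.3, 0.2]    # all parts positive
sparse = [0.7, 0.3, 0.0]   # one part is exactly zero

clr_dense = try_clr(dense)
clr_sparse = try_clr(sparse)

print("CLR of dense composition:", clr_dense)
print("CLR of sparse composition:", clr_sparse)
```

In practice zeros are often replaced by small pseudo-counts before the transform is applied, and the results can depend on that choice; a method that avoids log-ratios does not need this step.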
