Statistics and Data Science Forum - National Tianyuan Mathematics Northwest Center (国家天元数学西北中心)

Statistics and Data Science Forum

Date: 2019-04-23

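To promote research on frontier problems in statistics and data science, share research developments, and strengthen academic exchange, the "Statistics and Data Science Forum" of the National Tianyuan Mathematics Northwest Center will be held in Xi'an on April 27-28. The forum offers researchers in statistics and data science a platform for exchange and collaboration, where participants can share research results and new ideas in the field.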

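I. Academic Committee

Chair: 徐宗本 (Academician, Xi'an Jiaotong University)

Members: 巩馥洲 (Professor, Doctoral Supervisor, Northwest University)

郭真华 (Professor, Doctoral Supervisor, Northwest University)

张瑞 (Professor, Doctoral Supervisor, Northwest University)

魏玲 (Professor, Doctoral Supervisor, Northwest University)

张海 (Professor, Doctoral Supervisor, Northwest University)

夏志明 (Professor, Doctoral Supervisor, Northwest University)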

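II. Organizing Committee

Head: 夏志明 (Professor, Doctoral Supervisor, Northwest University)

Members: 孟德宇 (Professor, Doctoral Supervisor, Xi'an Jiaotong University)

夏志明 (Professor, Doctoral Supervisor, Northwest University)

王惠亚 (Ph.D., Northwest University)

王丹 (Ph.D., Northwest University)

强喆 (Ph.D., Northwest University)

勾廷勋 (Ph.D., Northwest University)

任睿思 (Ph.D., Northwest University)

Organizing committee email: statxzm@nwu.edu.cn

Organizing committee phone: 18182632263 (Gao), 13689293719 (Xia)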

III. Conference Schedule

Saturday, April 27, 2019, Multi-function Hall, 3rd floor, Changzheng International Hotel (长征国际酒店)

Time	Content		Chair
8:30-8:45	Opening ceremony		夏志明
Group photo (8:45-9:00, main entrance, first floor of the hotel)
Time	Speaker	Title	Chair
9:00-9:30	朱利平	Robust Nonparametric Two-Sample Tests For Equality of Distributions in High Dimensions	张伟平
9:30-10:00	冯兴东	Copula-based Quantile Regression for longitudinal data
10:00-10:30	秦国友	Multiply Robust Subgroup Identification for Longitudinal Data with Dropouts
Tea break (10:30-10:45, outside the conference hall)
10:45-11:15	张新雨	Parsimonious Model Averaging with a Diverging Number of Parameters	张瑞
11:15-11:45	刘玉坤	Retrospective score tests versus prospective score tests for genetic association with case-control data
11:45-12:15	张伟平	A Robust Joint Modeling Approach for Longitudinal Data with Informative Dropouts
Lunch (12:15-14:00, Changzheng International Hotel)
14:00-14:30	孟德宇	Progressive Gradient Correcting by Meta-learner on Corrupted Labels	张海
14:30-15:00	吴远山	An ADMM Algorithm for Distributed Sparse Optimal Scoring Classification
15:00-15:30	薛江	Research on Intelligent Wireless Communication
15:30-16:00	陈占寿	Small area quantile estimation via nonparametric mixed model with DRM errors
Tea break (16:00-16:15, outside the conference hall)
16:15-16:45	陈夏	Moderate deviation principle for hypothesis test of high-dimensional covariance matrices	魏玲
16:45-17:15	项冬冬	Signal classification for the integrative analysis of multiple sequences of multiple tests
17:15-17:45	秦瑞兵	Rank test for change in persistence
Banquet (from 18:30, Changzheng International Hotel)

Sunday, April 28, 2019, Changzheng International Hotel

8:30-11:30	Free discussion
Lunch (12:00-14:00, Changzheng International Hotel)

Talk Abstracts

1. 朱利平, Renmin University of China

Title: Robust Nonparametric Two-Sample Tests for Equality of Distributions in High Dimensions

Abstract: We propose a robust nonparametric two-sample test, which generalizes the Cramér-von Mises test through projections, to test for equality of two distributions in high dimensions. The population version of our proposed generalized Cramér-von Mises statistic is nonnegative and equals zero if and only if the two distributions are identical, ensuring that our proposed test is consistent against all fixed alternatives. In addition, our proposed test statistic has an explicit form and is completely free of tuning parameters. It requires no moment conditions and hence is robust to the presence of outliers and heavy-tailed observations. We study the asymptotic behavior of our proposed test under both the "large sample size, fixed dimension" and the "fixed sample size, large dimension" paradigms. In the former paradigm, we show that the asymptotic power of our proposed test does not depend on the size ratio of the two random samples. This ensures that our proposed test can be readily applied to imbalanced samples. In the latter paradigm, we observe that, surprisingly, the two distributions are equal if and only if their first two moments are equal. Therefore, we suggest tailoring our proposed test to detect location shifts and scale differences, which significantly enhances its power. Numerical studies confirm that our proposals are superior to many existing tests in high-dimensional two-sample testing problems.
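The projection idea described in this abstract can be illustrated with a small sketch. This is not the speaker's method: the talk's statistic has an explicit closed form and its own asymptotic theory, whereas the sketch below approximates the average over directions by Monte Carlo (random unit vectors) and calibrates the test by permutation. The function names `cvm_1d`, `projected_cvm`, and `permutation_pvalue`, and the parameters `n_dir` and `n_perm`, are hypothetical and chosen only for illustration.

```python
import numpy as np

def cvm_1d(x, y):
    """Classical two-sample Cramér-von Mises statistic for 1-D samples."""
    n, m = len(x), len(y)
    pooled = np.concatenate([x, y])
    # Empirical CDF of each sample, evaluated at every pooled observation.
    F = np.searchsorted(np.sort(x), pooled, side="right") / n
    G = np.searchsorted(np.sort(y), pooled, side="right") / m
    return n * m / (n + m) ** 2 * np.sum((F - G) ** 2)

def projected_cvm(X, Y, directions):
    """Average the 1-D statistic over projections onto the given unit vectors."""
    return float(np.mean([cvm_1d(X @ u, Y @ u) for u in directions]))

def permutation_pvalue(X, Y, n_dir=50, n_perm=200, seed=0):
    """Permutation p-value for the projection-averaged statistic.

    Assumption: random directions replace the exact average over the sphere,
    and permutation replaces the asymptotic calibration used in the talk.
    """
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n_dir, X.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)  # rows are unit directions
    observed = projected_cvm(X, Y, U)
    Z = np.vstack([X, Y])
    n = len(X)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(Z))  # reshuffle the group labels
        count += int(projected_cvm(Z[idx[:n]], Z[idx[n:]], U) >= observed)
    return observed, (count + 1) / (n_perm + 1)
```

Because the statistic depends on the data only through ranks along each direction, it needs no moment conditions, which mirrors the robustness property claimed in the abstract.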

2. 冯兴东, Shanghai University of Finance and Economics

Title: Copula-based Quantile Regression for longitudinal data

Abstract: Inference and prediction in quantile regression for longitudinal data are challenging without parametric distributional assumptions. We propose a new semiparametric approach that uses a copula to account for intra-subject dependence and approximates the marginal distributions of longitudinal measurements, given covariates, through regression of quantiles. The proposed method is flexible, and it can provide not only efficient estimation of quantile regression coefficients but also prediction intervals for a new subject given the prior measurements and covariates. The properties of the proposed estimator and prediction are established theoretically, and assessed numerically through a simulation study and the analysis of a nursing home data set.

3. 秦国友, Fudan University

Title: Multiply Robust Subgroup Identification for Longitudinal Data with Dropouts

Abstract: Subgroup identification is an important tool in many applications, such as personalised medicine. In medical research, longitudinal data with dropouts often arise. However, there is little work in subgroup analysis that considers this data type. We propose a concave fusion penalization method based on weighted median regression to identify subgroups in longitudinal data with dropouts. In order to deal with missingness, we introduce a multiply robust weight which allows multiple models for the missing-data process. When any of the models is correctly specified, the proposed estimator has the oracle property. In addition, the proposed method is robust against outliers, a consequence of using median regression. Furthermore, we develop a computing algorithm and propose a modified Bayesian information criterion to select the penalization parameter, and then establish its selection consistency. The numerical performance is illustrated in simulation studies. Finally, a real data set is analyzed using the proposed method.

4. 张新雨, Institute of Systems Science, Chinese Academy of Sciences / Center for Forecasting Science, Chinese Academy of Sciences

Title: Parsimonious Model Averaging with a Diverging Number of Parameters

Abstract: Model averaging generally provides better predictions than model selection, but the existing model averaging methods cannot lead to parsimonious models. Parsimony is an especially important property when the number of parameters is large. To achieve a parsimonious model averaging coefficient estimator, we suggest a novel criterion for choosing weights. Asymptotic properties are derived in two practical scenarios: (i) one or more correct models are in the candidate model set; and (ii) all candidate models are misspecified. Under the former scenario, it is proved that our method can put weight one on the smallest correct model, and the resulting model averaging estimators of the coefficients have many zeros and thus lead to a parsimonious model. The asymptotic distribution of the estimators is also provided. Under the latter scenario, the focus is mainly on prediction, and we prove that the proposed procedure is asymptotically optimal in the sense that its squared prediction loss and risk are asymptotically identical to those of the best, but infeasible, model averaging estimator. Numerical analysis shows the promise of the proposed procedure over existing model averaging and selection methods.

5. 刘玉坤, East China Normal University

Title: Retrospective score tests versus prospective score tests for genetic association with case-control data

Abstract: Since the seminal work by Prentice and Pyke (1979), the prospective logistic likelihood has become the standard method of analysis for retrospectively collected case-control data, in particular for testing the association between a single genetic marker and a disease outcome in genetic case-control studies. When studying multiple genetic markers with relatively small effects, especially those with rare variants, various aggregated approaches based on the same prospective likelihood have been developed to integrate subtle association evidence among all considered markers. In this paper we show that using the score statistic derived from a prospective likelihood is not optimal in the analysis of retrospectively sampled genetic data. We develop the locally most powerful genetic aggregation test derived through the retrospective likelihood under a random effect model assumption. In contrast to the fact that the disease prevalence information cannot be used to improve the efficiency for the estimation of odds ratio parameters in logistic regression models, we show that it can be utilized to enhance the testing power in genetic association studies. Extensive simulations demonstrate the advantages of the proposed method over the existing ones. One real genome-wide association study is analyzed for illustration.

6. 张伟平, University of Science and Technology of China

Title: A Robust Joint Modeling Approach for Longitudinal Data with Informative Dropouts

Abstract: This article proposes a robust method for analysing longitudinal continuous responses with informative dropouts and potential outliers by using the multivariate t distribution. Unlike the existing approaches, which mainly focus on inference for the regression mean and the dropout process, our approach aims to reveal the dynamics in the location function, marginal scale function and association by jointly and parsimoniously modeling the location and dependence structure. A parametric fractional imputation algorithm is developed to speed up the computation associated with the EM algorithm for maximum likelihood estimation with missing data. The resulting estimators are shown to be consistent and asymptotically normally distributed. Data examples and simulations demonstrate the effectiveness of the proposed approach.

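7. 孟德宇, Xi'an Jiaotong University

Title: Progressive Gradient Correcting by Meta-learner on Corrupted Labels

Abstract: In complex real-world settings, the labels of training data often contain substantial noise (incorrect labels). Sample reweighting is a general approach to this noisy-label problem; examples include self-paced learning, which emphasizes easily classified samples, and boosting, which emphasizes hard-to-classify samples. However, sample reweighting still lacks a unified learning scheme, and it generally involves hyperparameter selection. This talk presents a new meta-learning method that, guided by unbiased meta-data, can effectively adjust and control the training process on biased, noisily labeled data. It thereby largely avoids hyperparameter tuning and selects the weighting scheme adaptively in a data-driven manner. Tests on noisy-label datasets provide preliminary evidence of the method's effectiveness and stability.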

8. 吴远山, Zhongnan University of Economics and Law

Title: An ADMM Algorithm for Distributed Sparse Optimal Scoring Classification

Abstract: Massive datasets featuring huge sample sizes and large dimensions are typically stored across multiple machines, which poses increasing challenges for traditional statistical methodologies that require the dataset to be easily accessible and analyzable on a single machine. Focusing on the sparse classification problem, linear discriminant analysis boils down to solving a sequence of elastic net penalized problems via the sparse optimal scoring technique. Due to communication complexity, scalability and cost, it is infeasible to achieve a global solution for massive datasets stored in a distributed manner. Reformulating the problem into a separable form, we develop a novel distributed algorithm based on the alternating direction method of multipliers by introducing a consensus-based constraint. Only vectors are exchanged among machines, and thus the communication cost is at an affordable level. We show that the solution of the proposed algorithm converges to the global one at a linear rate in terms of iterations while maintaining comparable classification accuracy. The merits of our method are further demonstrated through simulation studies and a real example.
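The consensus idea mentioned in this abstract can be sketched on a simpler problem. The code below is not the speaker's algorithm: it applies the textbook global-consensus form of ADMM to an elastic net penalized least-squares problem whose rows are split across simulated machines, rather than to the sparse optimal scoring problem of the talk. The function name `consensus_admm_enet`, the parameters `lam1`, `lam2`, `rho`, and `n_iter`, and the in-memory list `blocks` standing in for machines are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def consensus_admm_enet(blocks, lam1=0.1, lam2=0.0, rho=1.0, n_iter=300):
    """Minimize sum_i 0.5*||A_i x - b_i||^2 + lam1*||x||_1 + 0.5*lam2*||x||^2.

    blocks is a list of (A_i, b_i) pairs, one per simulated machine.
    Each machine keeps a local copy x_i; the constraint x_i = z ties them
    to a shared consensus vector z (scaled-dual form, dual variables u_i).
    """
    N = len(blocks)
    d = blocks[0][0].shape[1]
    z = np.zeros(d)
    x = np.zeros((N, d))
    u = np.zeros((N, d))
    # Each machine pre-inverts its own d-by-d system once; raw data stay local.
    inverses = [np.linalg.inv(A.T @ A + rho * np.eye(d)) for A, _ in blocks]
    Atb = [A.T @ b for A, b in blocks]
    for _ in range(n_iter):
        # Local step: ridge-like solve on each machine's own data.
        for i in range(N):
            x[i] = inverses[i] @ (Atb[i] + rho * (z - u[i]))
        # Central step: only the length-d vectors x_i + u_i are communicated.
        v = (x + u).mean(axis=0)
        z = soft_threshold(v, lam1 / (rho * N)) / (1.0 + lam2 / (rho * N))
        # Dual step: accumulate each machine's disagreement with the consensus.
        u += x - z
    return z
```

In this sketch each iteration moves one length-d vector per machine, which is the sense in which the communication stays cheap; how the talk's algorithm organizes its exchanges is not specified in the abstract.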

9. 薛江, Xi'an Jiaotong University

Title: Research on Intelligent Wireless Communication

Abstract: In this talk, I will share research experience and results on intelligent wireless communication, based on recent work. Intelligent wireless communication is a future technology beyond 5G systems, and it could make wireless communication systems more adaptive and intelligent by using machine learning methods. The talk will show how it works for preamble detection in IoT, signal detection for MIMO systems, adaptive CSI estimation by noise modelling, etc. It will also show that great improvement can be achieved by machine learning in new areas.

10. 陈占寿, Qinghai Normal University

Title: Small area quantile estimation via nonparametric mixed model with DRM errors

Abstract: This paper studies small area quantile estimation under a unit-level non-parametric mixed regression model whose error distributions satisfy a semi-parametric density ratio model. We fit the non-parametric model via the penalized spline regression method. Empirical likelihood is then applied to estimate the parameters in the density ratio model based on the residuals. This leads to natural area-specific estimates of the error distributions. A kernel method is then applied to obtain smoothed error distribution estimates. These estimates are then used for quantile estimation in two situations: one where we only know the power means of the covariates at the population level, and the other where we have the covariate values of all sample units in the population. Simulation experiments indicate that the proposed method for small area quantile estimation works well for quantiles around the median in the first situation, and for a broad range of quantiles in the second situation. A bootstrap method for estimating the mean square errors of the proposed estimator is also investigated. An empirical example based on Canadian income data is included.

11. 陈夏, Shaanxi Normal University

Title: Moderate deviation principle for hypothesis test of high-dimensional covariance matrices

Abstract: This paper is concerned with the moderate deviation principle for tests of sphericity and identity of high-dimensional covariance matrices. The tests can accommodate situations where the data dimension is much larger than the sample size, namely the "large p, small n" situation. It is noted that the results are established without the normality assumption and without specifying an explicit relationship between p and n. Numerical simulations are conducted to demonstrate the theoretical results.

12. 项冬冬, East China Normal University

Title: Signal classification for the integrative analysis of multiple sequences of multiple tests

Abstract: The integrative analysis of multiple datasets is becoming increasingly important in many fields of research. When the same features are studied in several independent experiments, a common integrative approach is to jointly analyze the multiple sequences of multiple tests that result. It is frequently necessary to classify each feature into one of several categories, depending on the null and non-null configuration of its corresponding test statistics. This paper studies this signal classification problem, motivated by a range of applications in large-scale genomics. Two new types of misclassification rates are introduced, and both oracle and data-driven procedures are developed to control each of these types while also achieving the largest expected number of correct classifications. The proposed data-driven procedures are proved to be asymptotically valid and optimal under mild conditions, and are shown in numerical experiments to be nearly as powerful as oracle procedures, with substantial gains in power over their competitors in many settings. In an application to psychiatric genetics, the proposed procedures are used to discover genetic variants that may affect both bipolar disorder and schizophrenia, as well as variants that may help distinguish between these conditions.

13. 秦瑞兵, Shanxi University

Title: Rank test for change in persistence

Abstract: This paper proposes a test to detect a change in persistence on the basis of the ranks of a sequence, and investigates its asymptotic properties under the null hypothesis and the alternative hypothesis. Monte Carlo simulations demonstrate that the proposed test has lower power but more accurate size in finite samples compared with the test proposed by Kim (2000); that is, the test has a much lower rejection rate when the series has no change in persistence. As an illustration, we apply our test to the monthly CPI rate and the ISM non-manufacturing index of the United States.
