Workshop Schedule
Dates: April 11-14, 2019; mornings 8:30-11:30, afternoons 14:00-17:30
Venue: 西安天域凯莱大饭店, Xi'an
Convener: Prof. Yanni Xiao, Xi'an Jiaotong University
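Time | Topic | Speakers
April 12, 8:30-11:30 | Time-series prediction methods based on dynamical systems theory | Lead speaker: Luonan Chen; presenters: Chuan Chen et al.
April 12, 14:00-17:30 | Quantifying critical health states based on dynamical systems theory | Lead speaker: Rui Liu; presenters: Xiaoping Liu, Yong Wang, Luonan Chen et al.
April 13, 8:30-11:30 | Data-driven causal inference methods | Lead speaker: Wei Lin; presenters: Huanfei Ma, Tiejun Li, Yong Wang et al.
April 13, 14:00-17:30 | Theory and methods of single-cell data analysis | Lead speakers: Tiejun Li, Yong Wang; presenters: Lin Wan, Liang Ma, Hao Dai, Tao Zeng et al.
April 14, 8:30-11:30 | Dynamics and biomathematics | Lead speaker: Yanni Xiao; presenters: Wei Lin, Sanyi Tang et al.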
Talk Abstracts
(in order of presentation)
Talk 1: Predicting future dynamics based on short-term high-dimensional data
Speaker: Luonan Chen, Chinese Academy of Sciences
Abstract: Predicting the future states of nonlinear dynamical systems is a challenging task, particularly when real-world systems provide only a few time-series samples of high-dimensional variables. In this work, we propose a model-free framework, named randomly distributed embedding (RDE), to achieve accurate future state prediction based on short-term high-dimensional data. Specifically, from the observed data of high-dimensional variables, the RDE framework randomly generates a sufficient number of low-dimensional "nondelay embeddings" and maps each of them to a "delay embedding," which is constructed from the data of the target variable to be predicted. Any of these mappings can serve as a low-dimensional weak predictor for future state prediction, and together all such mappings generate a distribution of predicted future states. This distribution patches the association information from the various embeddings, unbiasedly or biasedly, into the whole dynamics of the target variable; after being processed by appropriate estimation strategies, it yields a stronger predictor that makes predictions more reliably and robustly. By applying the RDE framework to data from both representative models and real-world systems, we show that high dimensionality is no longer an obstacle but a source of information crucial to accurate prediction from short-term data, even under noise deterioration.
References: Huanfei Ma, Siyang Leng, Kazuyuki Aihara, Wei Lin, Luonan Chen. Randomly Distributed Embedding Making Short-term High-dimensional Data Predictable. Proc Natl Acad Sci USA, 115 (43) E9994-E10002, https://doi.org/10.1073/pnas.1802987115, 2018.
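The RDE idea can be illustrated with a toy sketch. This is not the authors' implementation: the weak predictor here is a nearest-neighbour analog forecast (the published framework fits a regression model for each mapping), and aggregating the weak predictions by their median is an assumption standing in for the paper's estimation strategies. The function name `rde_style_forecast`, its parameters and the toy data are hypothetical.

```python
import random
import statistics


def rde_style_forecast(data, target, tuple_size=2, n_tuples=50, seed=0):
    """One-step-ahead forecast of variable `target` from a short multivariate series.

    data: list of observations over time, each a list of variable values.
    Returns the aggregated forecast and the list of weak predictions.
    """
    rng = random.Random(seed)
    n_vars = len(data[0])
    current = data[-1]
    weak_predictions = []
    for _ in range(n_tuples):
        # Randomly pick a low-dimensional "nondelay embedding" (a subset of variables).
        idx = rng.sample(range(n_vars), tuple_size)
        # Weak predictor (assumed): the past state nearest to the current one in
        # these coordinates; its successor value of the target is the forecast.
        nearest = min(
            range(len(data) - 1),
            key=lambda t: sum((data[t][i] - current[i]) ** 2 for i in idx),
        )
        weak_predictions.append(data[nearest + 1][target])
    # Aggregate the distribution of weak predictions (median assumed here).
    return statistics.median(weak_predictions), weak_predictions


# Hypothetical toy data: 6 variables sharing a period-4 pattern, 13 time points.
series = [[(t % 4) * (v + 1) for v in range(6)] for t in range(13)]
forecast, spread = rde_style_forecast(series, target=0)
```

The list of weak predictions is returned alongside the point forecast so that its spread can be inspected, mirroring the abstract's emphasis on the distribution of predicted states.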
Talk 2: Short-term high-dimensional time series prediction by the Anticipating Learning framework
Speaker: Chuan Chen, Sun Yat-sen University
Abstract: Predicting future values of time series has significant practical applications across many disciplines. In particular, making predictions from only a few time-series samples of high-dimensional variables remains a challenge. In this work, we propose a model-free framework, named Anticipating Learning (AL), to achieve precise future state prediction based on short-term high-dimensional data. Based on embedding theory, the AL framework randomly generates a sufficient number of low-dimensional sampled embeddings and reconstructs, in a unified way, the mappings between each sampled embedding and the delay embedding of the target variable, fully exploiting both temporal information and spatial information (e.g., associations or interactions among the high-dimensional variables). In particular, the AL framework treats each sampled attractor as a set of sampled neurons using the Dropout trick and co-trains the different delay predictors in a unified neural network, which yields a stronger predictor that makes predictions more robustly and reliably. Extensive experiments on short-term high-dimensional data produced by both representative synthetic and real-world systems demonstrate that high dimensionality is a source of information rather than an obstacle to accurate prediction from short-term data.
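Talk 3: Detecting critical-transition signals of complex diseases
Speaker: Rui Liu, South China University of Technology
Abstract: Complex diseases that deteriorate abruptly share a common feature: the course of deterioration contains a tipping point. Far from the tipping point, symptoms are not obvious; on reaching it, the condition shifts within a very short time from a stable stage to a severe stage. Using patients' high-throughput biomolecular data to find markers of the early disease stage is therefore of great importance for early diagnosis.
To address early warning of disease tipping points, we proposed the theory of dynamic network biomarkers (DNB) and developed a series of computational methods for different application settings; the theory and the subsequent algorithms have been applied successfully to many real datasets. In this talk, we will introduce several DNB-based tipping-point early-warning algorithms suited to the sample conditions of high-throughput biomolecular data, together with application examples.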
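A composite index often used in the DNB literature scores a candidate module as CI = SD_in * PCC_in / PCC_out: the average standard deviation of the module genes, times their average absolute within-module Pearson correlation, divided by their average absolute correlation with genes outside the module. A sharp rise in CI is read as an early-warning signal of an approaching tipping point. The sketch below computes this index at a single time point; the function names and toy data are hypothetical, and the speaker's algorithms may use different or refined criteria.

```python
import itertools
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def dnb_composite_index(expr, module):
    """CI = SD_in * PCC_in / PCC_out for a candidate DNB module.

    expr: dict mapping gene name -> expression values across the samples taken
    at one time point. module: list of at least two candidate DNB genes.
    """
    others = [g for g in expr if g not in module]
    # Average fluctuation (standard deviation) of the module genes.
    sd_in = statistics.fmean(statistics.stdev(expr[g]) for g in module)
    # Average absolute correlation inside the module.
    pcc_in = statistics.fmean(
        abs(pearson(expr[a], expr[b])) for a, b in itertools.combinations(module, 2)
    )
    # Average absolute correlation between module genes and all other genes.
    pcc_out = statistics.fmean(
        abs(pearson(expr[a], expr[b])) for a in module for b in others
    )
    return sd_in * pcc_in / pcc_out


# Hypothetical toy data: genes a and b fluctuate strongly and together.
expr = {
    "a": [0, 4, 0, 4, 0, 4, 1, 3],
    "b": [0, 4, 0, 4, 0, 4, 1, 3],
    "c": [1, 2, 2, 1, 1, 2, 2, 1],
    "d": [2, 1, 1, 2, 2, 1, 1, 2],
}
ci_candidate = dnb_composite_index(expr, ["a", "b"])
ci_control = dnb_composite_index(expr, ["c", "d"])
```

In practice the index would be tracked across time points, and a module whose CI rises sharply just before deterioration is the candidate DNB.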
Talk 4: Single-sample DNB for evaluating early-warning signals of complex diseases
Speaker: Xiaoping Liu, Shandong University
Abstract: We developed a new model-free method, termed the landscape dynamic network biomarker (l-DNB) methodology. Based on bifurcation theory, it can identify tipping points prior to serious disease deterioration using only single-sample omics data. Here, we show that l-DNB provides early-warning signals of disease deterioration on a single-sample basis and also detects the critical genes or network biomarkers (i.e., DNB members) that drive the transition from the normal to the disease state. As a case study, l-DNB was used to predict severe influenza symptoms before their actual appearance in influenza virus infections. The l-DNB approach was then applied to three tumor datasets from TCGA to detect critical stages prior to tumor deterioration, using an individual DNB for each patient. The individual DNBs were further used as individual biomarkers in the analysis of physiological data, which led to the identification of two biomarker types that were surprisingly effective in predicting tumor prognosis. These can be regarded as common biomarkers for cancer, one indicating a poor prognosis and the other a good prognosis.
Talk 5: Quantifying the regulatory networks of health states
Speaker: Yong Wang, Academy of Mathematics and Systems Science, Chinese Academy of Sciences
Abstract: The greatest surprise of human genome-wide association studies (GWAS) is that 90% of disease-associated regions do not affect proteins directly but instead lie in non-coding regions with putative gene-regulatory roles. This indicates a pressing need to reconstruct how regulatory elements (REs) regulate gene expression, a task complicated by unknown cell types of action, relevant pathways, target genes, causality, and mechanisms. We will introduce some progress in interpreting the genetic variants relevant to traits and diseases and provide some insights into precision health using genomic, epigenomic, and transcriptomic data.
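Talk 6: Dynamical causality and its applications
Speaker: Wei Lin, Fudan University
Abstract: This talk reviews the basic methods and recent advances in characterizing causality with dynamical systems theory. Focusing on time-series data generated by deterministic nonlinear systems in continuous and discrete time, we will introduce several causality-detection methods based on the continuity and geometric properties of mappings, and point out the importance of time delays and of prediction modes for causality detection. Using various data generated by specific models and real-world systems, we will also demonstrate the general applicability of these methods.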
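Convergent cross mapping (CCM) is one well-known causality-detection method built on the continuity of the mapping between delay-embedded shadow manifolds. The sketch below is a minimal version offered only to illustrate this family of methods, not the speaker's specific algorithms. The embedding dimension, delay, exponential neighbour weighting, and the coupled logistic maps used as toy data are assumptions chosen for brevity; a full analysis would also check that cross-map skill converges as the library length grows.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cross_map_skill(cause, effect, dim=2, tau=1):
    """Estimate `cause` from the shadow manifold of `effect` (leave-one-out).

    High skill is read as evidence that `cause` drives `effect`, because the
    effect's history then carries information about the cause.
    """
    start = (dim - 1) * tau
    # Shadow manifold: delay vectors (e_t, e_{t-tau}, ..., e_{t-(dim-1)tau}).
    manifold = [[effect[t - k * tau] for k in range(dim)]
                for t in range(start, len(effect))]
    estimates, observed = [], []
    for i, point in enumerate(manifold):
        # dim + 1 nearest neighbours on the manifold, excluding the point itself.
        neighbours = sorted(
            (math.dist(point, other), j) for j, other in enumerate(manifold) if j != i
        )[: dim + 1]
        d_min = max(neighbours[0][0], 1e-12)
        weights = [math.exp(-d / d_min) for d, _ in neighbours]
        estimate = sum(w * cause[start + j] for w, (_, j) in zip(weights, neighbours))
        estimates.append(estimate / sum(weights))
        observed.append(cause[start + i])
    return pearson(estimates, observed)


# Toy coupled logistic maps (assumed parameters): x forces y more strongly than y forces x.
x, y = [0.4], [0.2]
for _ in range(299):
    x_t, y_t = x[-1], y[-1]
    x.append(x_t * (3.8 - 3.8 * x_t - 0.02 * y_t))
    y.append(y_t * (3.5 - 3.5 * y_t - 0.1 * x_t))
rho_x_drives_y = cross_map_skill(x, y)
rho_y_drives_x = cross_map_skill(y, x)
```

Comparing the two skills is the usual way to read a direction of causation; the sketch does not include significance testing.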
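Talk 7: Causality detection based on random forests
Speaker: Huanfei Ma, Soochow University
Abstract: There are many definitions of and algorithms for causality detection, including methods based on regression analysis, cross mapping, and information flow. In this talk, we will report results of using machine learning for causality detection, in particular for reconstructing directed networks.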
Talk 8: Dynamical system approach to scRNA-seq data analysis
Speaker: Tiejun Li, School of Mathematical Sciences, Peking University
Abstract: Distinguishing transition cells from well-defined, metastable cell states is crucial for dissecting the cell-fate decision process at high resolution. We present a dynamical system approach, based on multiscale reduction of a data-driven cellular random walk, that reveals cell-fate transition dynamics and discovers transition cells in single-cell transcriptomic data. Our method visualizes the cell development process as barrier-crossing dynamics on the constructed energy landscape. We detect distinct cell states as multiple attractor basins (equivalently, potential wells) of the dynamical system and simultaneously quantify their mutual conversion probabilities to infer the developmental lineage from the most probable paths. Applying our method to induced pluripotent stem cell (iPSC) differentiation data unraveled the gene expression features of the transition cells and systematically uncovered the latent heterogeneity of upcoming cell fates within the bifurcating primitive-streak state. This is joint work with Peijie Zhou, Shuxiong Wang and Qing Nie.
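The first ingredient, a data-driven random walk on cells and an energy landscape derived from it, can be sketched as follows. The sketch assumes a symmetric k-nearest-neighbour graph with Gaussian kernel weights; for such a graph the walk is reversible and its stationary distribution pi is proportional to node degree, and the energy is taken as U = -log(pi), so dense regions of cells appear as low-energy wells. The multiscale reduction, basin detection and most-probable-path inference described in the abstract are not reproduced, and `energy_landscape`, `k` and `sigma` are hypothetical names.

```python
import math


def energy_landscape(cells, k=3, sigma=1.0):
    """Energy U_i = -log(pi_i) of a random walk on a kNN Gaussian-kernel cell graph.

    cells: list of expression vectors. Returns (energies, transition matrix).
    """
    n = len(cells)
    w = [[0.0] * n for _ in range(n)]
    for i, cell in enumerate(cells):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(cell, cells[j]))
        for j in others[:k]:
            weight = math.exp(-math.dist(cell, cells[j]) ** 2 / (2 * sigma ** 2))
            w[i][j] = w[j][i] = weight  # symmetrise the kNN graph
    degree = [sum(row) for row in w]
    # Random walk: step from cell i to cell j with probability w_ij / degree_i.
    transition = [[w_ij / degree[i] for w_ij in w[i]] for i in range(n)]
    # Symmetric weights make the walk reversible, so pi_i = degree_i / sum(degree).
    total = sum(degree)
    energies = [-math.log(d / total) for d in degree]
    return energies, transition


# Hypothetical toy data: two tight clusters (two wells) and one isolated cell between them.
cells = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
         (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1), (2.5, 2.5)]
energies, transition = energy_landscape(cells)
```

The isolated cell receives the highest energy, which is the kind of barrier region where transition cells would be sought.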
Talk 9: Enhancing causal inference with regulatory elements
Speaker: Yong Wang, Academy of Mathematics and Systems Science, Chinese Academy of Sciences
Abstract: We will introduce our recent effort, PECA, which integrates paired expression and chromatin accessibility data across diverse cellular contexts and models the localization of chromatin regulators (CRs) to REs, the activation of REs by the CRs localized to them, and the effect of transcription factors (TFs) bound to activated REs on the transcription of target genes (TGs). In particular, we extend PECA to interpret genetic variants using population genetics and matched WGS data. The extension, vPECA, models how noncoding SNPs causally affect an RE's selection status, chromatin accessibility, and activity, and thereby determine target gene expression.
Talk 10: Coupled modeling of single-cell multi-omics data
Speaker: Yong Wang, Academy of Mathematics and Systems Science, Chinese Academy of Sciences
Abstract: Biological samples are often heterogeneous mixtures of different cell types. Suppose we have two single-cell datasets, each providing information on a different cellular feature and each generated from a different sample of the same mixture. The clusterings of cells in the two samples should then be coupled, because both reflect the underlying cell types of the same mixture. This "coupled clustering" problem is a new problem not covered by existing clustering methods. We will discuss progress in the joint analysis of single-cell RNA-sequencing and single-cell ATAC-sequencing data.
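Talk 11: Reconstructing complex evolutionary trajectories from high-dimensional heterogeneous single-cell data
Speaker: Lin Wan, Academy of Mathematics and Systems Science, Chinese Academy of Sciences
Abstract: Mathematical modeling and computational analysis of single-cell data are a current hot topic and frontier of computational biology. To address the high dimensionality and heterogeneity of single-cell data, we developed DensityPath, an effective topology-based algorithm and tool that enables visualization and reconstruction of dynamic evolutionary trajectories from high-dimensional, heterogeneous single-cell data.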
Talk 12: TSEE: visualizing the intrinsic structure of time-series single-cell RNA sequencing data
Speaker: Liang Ma, Institute of Zoology, Chinese Academy of Sciences
Abstract: Time-series single-cell RNA sequencing (scRNA-seq) data are emerging. However, their analysis can be hampered by 1) distortions introduced by assorted sources of variation in data collection and generation across time samples and 2) the inheritance of cell-to-cell variation through stochastic dynamic patterns of gene expression. This calls for an algorithm that can visualize time-series scRNA-seq data to reveal latent structures and uncover dynamic transition processes.
In this study, we propose an algorithm, termed time series elastic embedding (TSEE), that incorporates experimental temporal information into the elastic embedding (EE) method to visualize time-series scRNA-seq data. TSEE extends the EE algorithm by penalizing the close placement of latent points whose corresponding data points are separated by experimental time intervals. We applied TSEE to visualize time-series scRNA-seq datasets of the embryonic developmental process in human and zebrafish. TSEE efficiently visualizes time-series scRNA-seq data by diluting the distortions from assorted sources of data variation across time stages, and it enhances temporal resolution by preserving temporal order and structure. TSEE also shows great potential for uncovering subtle dynamic structures in gene expression patterns, which facilitates further downstream dynamic modeling and analysis of gene expression processes. The TSEE computational framework is generalizable because it allows other sources of information to be incorporated.
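Talk 13: Cell-specific network constructed by single-cell RNA sequencing data
Speaker: Hao Dai, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences
Abstract: Single-cell RNA sequencing (scRNA-seq) provides a high-throughput way to measure and compare gene expression levels at single-cell resolution, revealing heterogeneity and functional diversity among cells and helping to discover new cell types with unique functions. Because single-cell data contain very large numbers of samples, in principle we can also construct gene association networks from them and uncover, at a deeper level, how hidden gene regulatory relationships change. To this end, we propose a theory and method for constructing a cell-specific gene association network (CSN) for every single cell from scRNA-seq data. It derives from our new theoretical model of statistical dependency and can be viewed as a transformation from "unstable" gene expression data to "stable" gene association data. Computationally, it requires no prior clustering or classification of cells and can identify both linear and nonlinear associations between genes, which for the first time allows gene-gene associations (networks) to be identified at single-cell resolution. With this method, scRNA-seq data can be clustered and analyzed for pseudo-trajectories from a network perspective, opening a new avenue for scRNA-seq data analysis. The method can also find "dark" genes that play important roles at the network level but are usually overlooked by conventional differential expression analysis. Its accuracy and robustness have been validated on multiple public scRNA-seq datasets, enabling richer biological information to be extracted from single-cell transcriptomic data at the network level.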
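The per-cell statistic behind a cell-specific network can be sketched as follows. For cell k and genes x and y, count the cells that fall in a neighbourhood box around cell k for x (n_x), for y (n_y) and for both (n_xy); with n cells in total, the normalized statistic sqrt(n - 1) * (n * n_xy - n_x * n_y) / sqrt(n_x * n_y * (n - n_x) * (n - n_y)) is near zero when x and y are independent around cell k and large when they are associated. This follows the CSN construction as it is commonly described, so treat it as an approximation of the published method; defining each box as the 10% of cells closest in expression is an assumption, and the function names are hypothetical.

```python
import math


def neighbourhood(values, k, size):
    """Indices of the `size` cells whose values are closest to cell k's value."""
    order = sorted(range(len(values)), key=lambda j: abs(values[j] - values[k]))
    return set(order[:size])


def csn_statistic(x, y, k, box_fraction=0.1):
    """Normalized local dependence statistic between genes x and y in cell k.

    x, y: expression of two genes across all n cells. Large positive values
    (e.g. above a chosen normal quantile) suggest an x-y edge in cell k's network.
    """
    n = len(x)
    size = max(1, round(box_fraction * n))
    nx_cells = neighbourhood(x, k, size)
    ny_cells = neighbourhood(y, k, size)
    n_x, n_y, n_xy = len(nx_cells), len(ny_cells), len(nx_cells & ny_cells)
    numerator = math.sqrt(n - 1) * (n * n_xy - n_x * n_y)
    denominator = math.sqrt(n_x * n_y * (n - n_x) * (n - n_y))
    return numerator / denominator


# Hypothetical toy data: 100 cells, two genes with identical expression profiles.
gene = list(range(100))
stat = csn_statistic(gene, gene, k=50)
```

Repeating this for every gene pair gives one adjacency matrix per cell, which is what allows clustering and pseudo-trajectory analysis on networks rather than on raw expression.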
Talk 14: Applications of and thoughts on data integration in scRNA-seq analysis
Speaker: Tao Zeng, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences
Abstract: Reducing noise in data and ensuring that the extracted intrinsic patterns agree with the primary data structure are important in sample clustering and classification (e.g., single-cell clustering and identification). Here we propose an effective data integration framework named HCI (High-order Correlation Integration), which takes advantage of a high-order correlation matrix combined with pattern fusion analysis to achieve high-dimensional feature extraction and sample clustering simultaneously. To validate the effectiveness of the new method, we first applied HCI to four single-cell RNA-seq datasets to distinguish cell types and found that HCI identifies the previously known cell types of single-cell samples from scRNA-seq data with higher accuracy and robustness than other methods under different conditions. Second, we integrated heterogeneous omics data from TCGA and GEO datasets, including bulk RNA-seq data, where HCI outperformed the other methods at identifying distinct cancer subtypes. Together, these results support the flexibility and broad applicability of HCI for sample clustering with different types and organizations of RNA-seq data.
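The abstract does not define the high-order correlation matrix, so the sketch below assumes one simple reading: the first-order matrix holds Pearson correlations between samples, and each higher order correlates the rows (correlation profiles) of the previous matrix, which tends to sharpen group structure. Pattern fusion and the clustering step of HCI are not shown, and all names and data here are hypothetical.

```python
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def correlation_matrix(rows):
    """Pairwise Pearson correlations between rows (samples)."""
    return [[pearson(a, b) for b in rows] for a in rows]


def high_order_correlation(samples, order=2):
    """Order 1 is the sample-sample correlation matrix; each further order
    (assumed definition) correlates the rows of the previous matrix."""
    matrix = correlation_matrix(samples)
    for _ in range(order - 1):
        matrix = correlation_matrix(matrix)
    return matrix


# Hypothetical samples: two increasing profiles and two decreasing ones.
samples = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10.5],
    [5, 4, 3, 2, 1],
    [5, 4, 3, 2.5, 1],
]
second_order = high_order_correlation(samples)
```

Because each entry compares whole correlation profiles rather than raw measurements, samples that relate to the rest of the data in the same way end up with entries close to +1 or -1.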
Talk 15: Modelling HIV infection in mainland China
Speaker: Yanni Xiao, Xi'an Jiaotong University
Abstract: Since the first case of acquired immunodeficiency syndrome (AIDS) in China was reported in 1985, human immunodeficiency virus (HIV) has spread throughout mainland China. National prevalence remains low, but the epidemic is severe in some areas. In this talk I shall briefly review the current situation of HIV infection in China. Both between-host and within-host models were formulated and parameterized to assess the HIV/AIDS epidemic and examine the impact of treatment strategies. We proposed several multiscale models that link between-host dynamics with within-host dynamics, or between-host dynamics with community-level dynamics. In particular, we develop a multiscale model to determine whether the two-level optimal controls are in accord or in conflict. We prove that the within-host optimal control is always bang-bang control, whereas the coupled optimal control may be either bang-bang control or singular control of order 2, depending on the coupling functions. Mathematical analysis shows that whether the two-level optimal controls coincide is determined by the sign of the product of their switching functions. Some key control strategies will then be proposed.
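The role of switching functions can be illustrated with a toy sketch. It assumes the common convention that a bang-bang control takes its upper bound where the switching function is positive and its lower bound where it is negative, and it encodes the abstract's criterion that the two levels agree where the product of their switching functions is positive. The switching-function samples are made up for illustration, and a singular arc (switching function identically zero, as with singular control of order 2) cannot be decided by this sign rule.

```python
def bang_bang(phi, u_min=0.0, u_max=1.0, tol=1e-12):
    """Pointwise optimal control from a switching-function value phi.

    Sign convention assumed here: u = u_max if phi > 0, u = u_min if phi < 0.
    A (near-)zero phi may indicate a singular arc, which this rule cannot
    resolve, so None is returned in that case.
    """
    if phi > tol:
        return u_max
    if phi < -tol:
        return u_min
    return None


def controls_coincide(phi_within, phi_between):
    """Bang-bang controls at the two levels agree where phi_within * phi_between > 0."""
    return phi_within * phi_between > 0


# Hypothetical switching-function samples on a common time grid.
within = [0.8, 0.3, -0.2, -0.5]
between = [0.4, -0.1, -0.3, 0.6]
agreement = [controls_coincide(a, b) for a, b in zip(within, between)]
```

Under this convention a positive product means both levels choose the same bound, while a negative product marks times where the within-host and between-host optima conflict.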