Workshop on Bioinformatics and Data Analysis
Date: 2019-06-11

Workshop Introduction

The past few decades have seen vast growth in the scale of biological data, creating a pressing need to explore how mathematics, statistics and computer science can be used to better understand biological processes and systems. The purpose of the Workshop on Bioinformatics and Data Analysis is to bring together researchers and practitioners interested in the design and application of modelling frameworks, algorithmic concepts, computational methods, and information technologies to address challenging problems in bioinformatics. The workshop showcases leading-edge research on new technologies and techniques for gathering, processing, analyzing, and modeling data and information for a variety of scientific and healthcare applications. The Workshop on Bioinformatics and Data Analysis will be held from June 14 to 17, 2019, at Xi’an Jiaotong University, Xi’an, China. The workshop will also provide a great opportunity for graduate students, postdoctoral fellows and junior faculty to interact with leading researchers in these areas.

Program Overview

Date                        Time          Contents
June 14, 2019 (Friday)      10:00-18:00   Registration
June 15, 2019 (Saturday)    08:30-08:50   Opening Ceremony
                            08:50-10:10   Keynote Lectures
                            10:10-10:30   Coffee Break
                            10:30-11:50   Keynote Lectures
                            11:50-14:00   Lunch
                            14:00-15:20   Keynote Lectures
                            15:20-15:40   Coffee Break
                            15:40-17:40   Keynote Lectures
                            17:40-20:00   Banquet (invited only)
June 16, 2019 (Sunday)      08:50-10:10   Keynote Lectures
                            10:10-10:30   Coffee Break/Poster
                            10:30-11:50   Keynote Lectures
                            11:50-14:00   Lunch
                            14:00-15:20   Keynote Lectures
                            15:20-15:40   Coffee Break
                            15:40-18:00   Discussion
June 17, 2019 (Monday)      08:00-18:00   Departure

Date: June 15, 2019    Location: Room 2-1, Math. Building

08:30-08:50   Opening Ceremony (Welcome Speech / Group Photo)
08:50-09:30   Guojun Li: Development of algorithms mining biological data
              Chair: Yanni Xiao
09:30-10:10   Hiroshi Mamitsuka: Graph-based machine learning
10:10-10:30   Coffee Break
10:30-11:10   Wing-Kin Sung: Finding structural variations in repeat regions using high-throughput sequencing data
              Chair: Xiaoqing Cheng
11:20-11:50   Limin Li: Cross-species data analysis and prediction via machine learning approaches
11:50-14:00   Lunch (Jiaotong University Cambridge Hotel)
14:00-14:40   Lin Gao: Cancer related pattern discovery with omics data integration
              Chair: Shuqin Zhang
14:40-15:20   Tatsuya Akutsu: Graph theoretic approaches to controllability of biological networks
15:20-15:40   Coffee Break
15:40-16:20   Guoliang Li: 3D genomics and chromatin interaction maps reveal genetic regulation for quantitative traits in Maize
              Chair: Bingqiang Liu
16:20-17:00   Jinzhi Lei: Leukemic cell plasticity induces immune escape after CD19 CAR-T cell therapy of B-ALL
17:00-17:40   Shuqin Zhang: Simultaneous clustering of multi-view biomedical data using manifold optimization
17:40-20:00   Banquet (invited only)

Date: June 16, 2019    Location: Room 2-1, Math. Building

08:50-09:30   Luonan Chen: Single-cell data analysis by cell-specific network
              Chair: Limin Li
09:30-10:10   Min Li: Deep learning for protein bioinformatics and medicine
10:10-10:30   Coffee Break/Poster
10:30-11:10   Wei Lin: Dynamical time series analytics: From networks construction to dynamics prediction
              Chair: Minghua Deng
11:10-11:50   Louxin Zhang: Mathematical issues of tumor development trees
11:50-14:00   Lunch (Jiaotong University Cambridge Hotel)
14:00-14:40   Hulin Wu: An innovative collaboration platform with scalable analytic tools to efficiently promote use/reuse of time course gene expression data for scientific discoveries
              Chair: Xiaoqing Cheng
14:40-15:20   Minghua Deng: Network inference of compositional data based on D-trace loss
15:20-16:00   Bingqiang Liu: Computational prediction of alternative transcription units in prokaryotic genomes
16:00-16:20   Coffee Break
16:20-18:00   Discussion


Topics and Abstracts

Topic: Graph theoretic approaches to controllability of biological networks

Speaker: Tatsuya Akutsu (Kyoto University, Japan)

Abstract: Development of control theory for biological systems is one of the major goals in systems biology and bioinformatics, and various studies have been done. Recently, several graph theoretic concepts have been utilized for finding driver nodes that can control the entire state of a network. In particular, maximum matching (MM), minimum dominating set (MDS), and feedback vertex set (FVS) have been widely utilized. In this talk, these concepts are briefly reviewed with conceptual comparison. In addition, several variants/extensions of the MDS-based approach, which have been explored by us, are explained. Furthermore, our recent theoretical result on controllability of Boolean networks (BNs) is presented, which suggests that control of a BN requires a small number of driver nodes if the targets are restricted to be attractors. This result is based on discovery of novel relationships between control problems on BNs and the coupon collector's problem, a well-known concept in combinatorics.
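As a rough illustration of the dominating-set idea mentioned in this abstract, the sketch below picks a set of nodes such that every node is either in the set or adjacent to a member of it. It uses a simple greedy heuristic, which yields a valid dominating set but not necessarily a minimum one; it is not the speaker's method. The function name `greedy_dominating_set` and the toy network are assumptions made for illustration only.

```python
def greedy_dominating_set(adjacency):
    """Return a dominating set of an undirected graph using a greedy heuristic.

    `adjacency` maps each node to the set of its neighbours.
    The result dominates every node but is NOT guaranteed to be minimum.
    """
    undominated = set(adjacency)
    chosen = []
    while undominated:
        # Pick the node whose closed neighbourhood covers the most
        # still-undominated nodes (ties broken by sorted node order).
        best = max(
            sorted(adjacency),
            key=lambda v: len(({v} | adjacency[v]) & undominated),
        )
        chosen.append(best)
        undominated -= {best} | adjacency[best]
    return chosen


# Hypothetical toy network: a star centred on "A" plus a pendant edge D-E.
toy_network = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A"},
    "D": {"A", "E"},
    "E": {"D"},
}
drivers = greedy_dominating_set(toy_network)
print(drivers)
```

In this toy network the greedy choice happens to be small, but on general graphs an exact minimum dominating set typically requires integer programming or a similar exact solver.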


Topic: Single-cell data analysis by cell-specific network

Speaker: Luonan Chen (Shanghai Institutes of Biological Sciences, CAS, China)

Abstract: Single-cell RNA sequencing (scRNA-seq) gives insight into gene–gene associations or transcriptional networks among cell populations by sequencing a large number of cells. However, traditional network methods are limited to groups of cells rather than individual cells, so the heterogeneity of single cells is erased. We present a new method that constructs a cell-specific network (CSN) for each single cell from scRNA-seq data (i.e. one network per cell), transforming the data from an ‘unstable’ gene expression form to a ‘stable’ gene association form on a single-cell basis. In particular, this is the first time that gene associations/networks can be identified at single-cell resolution. With the CSN method, scRNA-seq data can be analyzed for clustering and pseudo-trajectory from a network perspective by any existing method, which opens a new avenue for scRNA-seq data analysis. In addition, CSN can find differential gene associations for each single cell, and even ‘dark’ genes that play important roles at the network level but are generally ignored by traditional differential gene expression analyses. CSN can also be applied to construct an individual network for each sample in bulk RNA-seq data. Experiments on various scRNA-seq datasets validated the effectiveness of CSN in terms of accuracy and robustness.

Topic: Network inference of compositional data based on D-trace loss

Speaker: Minghua Deng (Peking University, China)

Abstract: Microbes play an important role in the environment and human life. The development of high-throughput sequencing technologies for 16S rRNA gene profiling provides higher quality compositional data for microbial communities. In this talk, I will introduce two recent works on network inference for compositional data. Under the framework of D-trace loss, we introduced different loss functions, and estimated the precision matrix as well as the difference between precision matrices by optimizing the corresponding penalized loss functions. Simulations and real data analysis demonstrate the superior performance of the proposed methods.
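For readers unfamiliar with the D-trace loss, it is commonly written as L(Θ, Σ) = ½ tr(Θ² Σ) − tr(Θ), where Σ is a covariance estimate and Θ a candidate precision matrix; without a penalty term, it is minimized at Θ = Σ⁻¹. The sketch below evaluates this basic form in plain Python. It omits the penalty and the compositional-data adaptations discussed in the talk, which the abstract does not specify, and the helper names are assumptions made for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def trace(m):
    """Sum of the diagonal entries of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))


def d_trace_loss(theta, sigma):
    """Basic (unpenalized) D-trace loss: 0.5 * tr(theta^2 sigma) - tr(theta)."""
    return 0.5 * trace(matmul(matmul(theta, theta), sigma)) - trace(theta)


# Hypothetical 2x2 example with a diagonal covariance matrix.
sigma = [[2.0, 0.0], [0.0, 4.0]]
sigma_inverse = [[0.5, 0.0], [0.0, 0.25]]   # exact inverse of sigma
perturbed = [[0.6, 0.0], [0.0, 0.25]]       # a nearby, non-optimal candidate

loss_at_inverse = d_trace_loss(sigma_inverse, sigma)
loss_perturbed = d_trace_loss(perturbed, sigma)
print(loss_at_inverse, loss_perturbed)
```

In practice a sparsity-inducing penalty (for example an L1 term on the off-diagonal entries) is added to this loss, and a matrix library would replace the hand-written helpers.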


Topic: Cancer related pattern discovery with omics data integration

Speaker: Lin Gao (Xidian University, China)

Abstract: The mechanism, diagnosis and prognosis of cancer are among the core research problems in life science and related multidisciplinary domains. The challenge is that cancer progression is a high-dimensional, time-varying, dynamic system. How do we discover cancer-causing gene patterns and ultimately associate these patterns with cancer initiation and progression? Systems biology and complex networks provide new insight into cancer. With increasing amounts of multi-omics data becoming available, we can construct network-based computational models of these data. In this talk, I will investigate network models for different cancer patterns and discuss the key theoretical and methodological challenges faced in computational cancer modeling.


Topic: Leukemic cell plasticity induces immune escape after CD19 CAR-T cell therapy of B-ALL

Speaker: Jinzhi Lei (Tsinghua University, China)

Abstract: Chimeric antigen receptor (CAR) therapy targeting CD19 is an effective treatment for refractory B cell malignancies, especially B cell acute lymphoblastic leukaemia (B-ALL). The majority of patients achieve a complete response following a single infusion of CD19-targeted CAR-modified T cells (CAR-19 T cells); however, many patients suffer relapse after therapy, and the underlying mechanism remains unclear. Here, we applied second-generation CAR-T cells to mice injected with NALM-6-GL leukaemic cells; 60% of the mice relapsed within 3 months. Further analysis by flow cytometry and RNA sequencing revealed that the relapsed tumours retained CD19 expression but exhibited a profound increase in CD34 transcription. These observations led to the hypothesis that CAR-T treatment induced tumour cells to transition to haematopoietic stem-like cells (HSLCs) and myeloid-like cells and hence escape of CAR-T targeting. A computational model for the heterogeneous responses of the tumour cells to the CAR-T treatment was developed based on this proposed hypothesis and verified the experimental observations. Model simulations predicted that CAR-T cell-induced cell plasticity can lead to tumour relapse in B-ALL after CAR-19 T treatment. Our simulations and mouse experiments further indicated that CD19+ relapse could be prevented by the combined administration of CAR-19 and CD123-targeted CAR-modified (CAR-123) T cells administered at specific ratios. These findings highlight the important role of CAR-T stress-induced stem-like tumour cell transition in CD19-positive relapse and suggest a potential strategy for using a dual targeting CAR-T therapy for B-ALL treatment. (This work is in collaboration with Xiaosong Zhong at Beijing Shijitan Hospital)


Topic: Development of algorithms mining biological data

Speaker: Guojun Li (Shandong University, China)

Abstract: With the rapid development of biotechnology, massive amounts of biological data have been generated and continue to grow explosively. Extracting biological information from these massive data is highly challenging. On the other hand, the rapid development of computer science and technology provides powerful computing resources for mining big data, making the use of computational means to study the life sciences a reality. Taking this opportunity, I will introduce some applications of combinatorial optimization methods to biomedical big data through several specific cases of mining biological big data.


Topic: 3D genomics and chromatin interaction maps reveal genetic regulation for quantitative traits in Maize

Speaker: Guoliang Li (Huazhong Agricultural University, China)

Abstract: TBA


Topic: Cross-species data analysis and prediction via machine learning approaches

Speaker: Limin Li (Xi’an Jiaotong University)

Abstract: Due to ethical or expense constraints, efforts to improve human-disease modeling have focused on developing mice that closely mimic human biology. However, cross-species differences make it challenging to translate knowledge from mice to humans, and there have been few attempts to develop computational methods for overcoming translational challenges. In this talk, I will present our recent work on cross-species data analysis and prediction via machine learning approaches. Transfer learning is a machine learning field which attempts to correct the discrepancy between the source domain and the target domain such that the two domains share the same distribution. I will first present our two domain adaptation methods, DACoM and DMMD, for heterogeneous cross-species data analysis. Based on the idea of transfer learning, I will then present our cross-species gene set enrichment analysis method, XGSEA, which can perform GSEA for human gene sets when only mouse data sets are available. The experiments show that our method can outperform direct cross-species inference from mouse results, and can highlight signals that may otherwise be missed, at no additional cost in biological experiments.


Topic: Deep learning for protein bioinformatics and medicine

Speaker: Min Li (Central South University, China)

Abstract: Mining useful information from biomedical data is not only crucial to life science, but also the foundation for understanding the development of diseases. In recent years, a large amount of biomedical data has been accumulated from omics technologies, imaging, electronic health records, and so on. Meanwhile, with the development of big data and hardware, deep learning techniques have been successfully used in various fields such as computer vision, speech recognition, and natural language processing. Considering their excellent performance, we implemented some deep learning models to tackle biomedical data. In protein bioinformatics, we focus on protein-protein interaction site prediction, essential protein prediction, protein function prediction, and drug-target prediction. We built deep learning models to extract local and global features of protein sequences, then combined these features to improve predictive performance. For clinical data, we focus on electronic health record classification and disease prediction. We developed deep learning models that capture the features of electronic health records and diseases, then used these features in our studies. We hope that our studies can promote the application of deep learning in biomedical data analysis, and provide useful tools for solving key problems in life science using artificial intelligence techniques.


Topic: Dynamical time series analytics: From networks construction to dynamics prediction

Speaker: Wei Lin (Fudan University, China)

Abstract: In this talk, I will introduce two model-free frameworks for dynamical time series analytics. The first framework detects causal interactions among a large group of dynamical variables, which can plausibly recover a network hidden in a real-world system of interest. The second framework forecasts future dynamics based only on short-term, high-dimensional time series, which is usually believed to be a challenging task. Both frameworks take advantage of Takens' embedding techniques, which suggests that dynamical systems theory is well suited to exploiting useful information from time series, whether they come from models or from real-world systems.
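The embedding technique referred to in this abstract is usually known as Takens' delay-coordinate embedding: a scalar time series is turned into vectors of lagged values that can reconstruct the geometry of the underlying dynamics. The following is a minimal sketch of that construction only, not of the two frameworks in the talk; the function name `delay_embed` and its parameters are assumptions made for illustration.

```python
def delay_embed(series, dimension, delay):
    """Build delay vectors [x(t), x(t - delay), ..., x(t - (dimension - 1) * delay)].

    Returns one vector for every time index t that has enough history;
    the result is empty if the series is too short.
    """
    if dimension < 1 or delay < 1:
        raise ValueError("dimension and delay must be positive integers")
    first_index = (dimension - 1) * delay
    return [
        [series[t - k * delay] for k in range(dimension)]
        for t in range(first_index, len(series))
    ]


# Hypothetical toy series; real applications would use measured data.
signal = [0, 1, 2, 3, 4, 5]
vectors = delay_embed(signal, dimension=3, delay=2)
print(vectors)
```

Choosing the embedding dimension and the delay is itself a modelling decision; the values above are arbitrary and serve only to show the indexing.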


Topic: Computational prediction of alternative transcription units in prokaryotic genomes

Speaker: Bingqiang Liu (Shandong University, China)

Abstract: Identification of transcription units (TUs) encoded in prokaryotes is essential to predict the function of unknown genes, annotate the prokaryotic genome and construct the transcriptional and translational regulatory networks at the gene level. Alternative transcription units (ATUs) are the dynamic TUs arising from a cluster of genes. The identification of ATUs is recognized as a more challenging computational problem due to their condition-dependent nature, and next-generation sequencing provides a good opportunity to address it. We are developing a method to predict ATUs in prokaryotes based on RNA-seq data. The problem is formulated as a mathematical programming model that also integrates other factors, including the RNA degradation effect and cross-gene reads. We tested the method with two RNA-seq datasets on the E. coli genome and compared the predicted ATUs with experimentally validated ATUs from previous studies. The comparison results show that our algorithm can recover the majority of previously known ATUs, with average precision of 0.70/0.66 and recall of 0.77/0.79 on the two datasets. As the first de novo computational ATU prediction pipeline, the new method will facilitate research on the complex mechanisms of transcriptional regulation, and bring more attention to the function of alternative transcription units in prokaryotic genomes.


Topic: Graph-based machine learning

Speaker: Hiroshi Mamitsuka (Kyoto University, Japan)

Abstract: A wide variety of machine learning techniques have been developed and used for many applications. Traditionally, data are vectors, meaning that each instance is a vector of features, so that the entire data set is a matrix. A typical example is a user-item matrix in e-commerce, where each instance is a user or item, for which the features are purchase records. A more recently emerging data type is the graph, particularly one with unique nodes, resulting in an adjacency matrix showing the similarity between nodes (instances). Examples are social networks, web links and biological networks, especially gene networks, in which edges represent biological relationships between nodes (genes), such as gene regulation, protein-protein interactions, and so on. In this talk, I will explain the key idea of machine learning for graphs (MLG) and also describe major problem settings and efficient algorithms under MLG.


Topic: Finding structural variations in repeat regions using high-throughput sequencing data

Speaker: Wing-Kin Sung (National University of Singapore, Singapore)

Abstract: Structural variations (SVs) are important since they can cause diseases. They can be discovered using second-generation sequencing. Existing software performs well enough to call them when they are in non-repeat regions. However, when the SVs are in repeat regions, performance is poor. For example, for insertion events, existing software can call fewer than 10% of the benchmark insertion events. In this talk, we will examine whether the performance of SV calling can be improved.


Topic: An innovative collaboration platform with scalable analytic tools to efficiently promote use/reuse of time course gene expression data for scientific discoveries

Speaker: Hulin Wu (University of Texas Health Science Center at Houston, USA)

Abstract: In the Big Data era, generating data is no longer rate limiting for scientific discoveries; instead, processing, storing, accessing, analyzing and sharing a vast amount of data are becoming the major challenges and bottlenecks. In particular, the exponential growth of genomic/genetic data demands new scalable algorithms and new solutions for making these data findable, accessible, interoperable, and reusable (FAIR). Genetic data analysis can be complicated, and the analysis results can be comprehensive and voluminous, so domain-specific scientists and statisticians/bioinformaticians/data scientists may need to work together to extract insights and to interpret and disseminate the results. In this study, we proposed a new collaboration platform to engage domain-specific biomedical scientists with statisticians, bioinformaticians and data scientists to annotate, interpret and communicate a large number of time course gene expression data sets and their analytic results from GEO, SRA, ArrayExpress and TCGA to the general research community via publications. We proposed to scale up an innovative analysis/modeling pipeline to enable a large number of time course gene expression data sets to be analyzed and modeled via restructuring and harmonizing multiple gene expression databases using the FAIR principles. We developed a web-based, user-friendly collaboration platform for interactive visualization and dissemination of the large number of analysis results produced by the scalable analytic pipeline. Recommendation functions/systems and literature mining tools were also developed to facilitate efficient collaborations between genetic/biomedical investigators and statisticians/bioinformaticians/data scientists in using these analysis results. We expect that the proposed research platform (http://genestudy.org/) could be used for a large number of scientific discoveries.


Topic: Mathematical issues of tumor development trees

Speaker: Louxin Zhang (National University of Singapore, Singapore)

Abstract: In recent years, tumor development history has been modeled as mutation-labeled trees. These trees are different from tree models in evolution. A couple of mathematical issues of tumor evolutionary trees will be discussed in my talk.


Topic: Simultaneous clustering of multiview biomedical data using manifold optimization

Speaker: Shuqin Zhang (Fudan University, China)

Abstract: Multiview clustering has attracted much attention in recent years. Several models and algorithms have been proposed for finding the clusters. However, these methods are developed either to find the consistent/common clusters across different views, or to identify the differential clusters among different views. In reality, both consistent and differential clusters may exist in multiview datasets. Thus, development of simultaneous clustering methods such that both the consistent and the differential clusters can be identified is of great importance.

In this talk, we will introduce a method for simultaneous clustering of multiview data based on manifold optimization. The binary optimization model for finding the clusters is relaxed to a real-valued optimization problem on the Stiefel manifold, which is solved by a line-search algorithm on the manifold. We applied the proposed method to both simulation data and four real datasets from TCGA. Both studies show the good performance of the proposed method.

