Computational biology is a research area that analyzes biological data, such as genome sequences, to find new knowledge that helps biological and medical research, and that develops efficient methods to support such analyses.
A state-of-the-art DNA sequencer generates 200 gigabases per day, hundreds of thousands of times more data than the technology of 15 years ago could generate [1]. This dramatic change in data generation has raised many issues in DNA sequence analysis. In our lab, we aim to develop efficient methods for utilizing such biological big data.
Our topics include, but are not limited to:
Developing efficient methods for fundamental genome sequence analysis
The output of a genome sequencer is a huge number of sequence fragments. Before conducting downstream analyses, it is therefore necessary to align each fragment to a previously determined reference genome, to assemble the fragments to reconstruct the original sequence, or to cluster similar fragments. We aim to develop more efficient methods for such fundamental tasks by using efficient string processing and/or graph mining algorithms.
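As a rough illustration of the alignment task, the sketch below indexes the k-mers of a reference and uses exact k-mer matches as seeds to place a fragment, allowing a few mismatches but no gaps. It is a minimal teaching example, not the lab's method: the function names (`build_kmer_index`, `align_read`), the parameters, and the toy sequences are all assumptions made for this illustration, and practical aligners rely on compressed indexes and gapped extension.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def align_read(read, reference, index, k, max_mismatches=1):
    """Return sorted (start, mismatches) pairs for gap-free placements of the read."""
    hits = {}
    for offset in range(len(read) - k + 1):
        # Each exact k-mer match (a "seed") suggests one candidate start position.
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset
            if start < 0 or start + len(read) > len(reference) or start in hits:
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits[start] = mismatches
    return sorted(hits.items())


# Hypothetical toy data, chosen only for illustration.
reference = "ACGTACGTTAGCCGATAGGCTA"
index = build_kmer_index(reference, k=4)
exact_hits = align_read("TAGCCGAT", reference, index, k=4)
noisy_hits = align_read("TAGCCGTT", reference, index, k=4)
```

Here `exact_hits` holds the placement of a fragment copied from the reference, and `noisy_hits` shows that a fragment with one substituted base is still placed at the same position.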
Reference genome graph
Currently, the reference genome is represented as a single sequence. Since the genome sequence is unique to each individual, it is more natural to represent the reference as a graph structure that can capture the sequence diversity within a species, rather than as a single sequence. We aim to develop new methods for constructing an efficient data structure for the reference genome graph.
Privacy-preserving data mining for biological data
The dramatic reduction in the cost of DNA sequencing has encouraged large-scale personal genome sequencing. However, genomic data that include personal information are not fully utilized at present, because privacy issues hinder flexible analyses for finding novel knowledge. To tackle this problem, we aim to develop efficient algorithms that enable several parties to jointly analyze biological data over their inputs while keeping those inputs private, using cryptographic techniques such as homomorphic encryption.
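To make the idea concrete, the following sketch stores a reference as a small directed acyclic graph whose nodes carry sequence segments, so that a variant site becomes a branch with one node per allele. The class `GenomeGraph`, its methods, and the example segments are hypothetical names and data introduced here for illustration only. Enumerating every path, as done below, is feasible only for tiny graphs, which is one reason efficient data structures are needed.

```python
class GenomeGraph:
    """A toy sequence graph: nodes hold segments, edges give allowed adjacencies."""

    def __init__(self):
        self.sequences = {}  # node id -> sequence segment
        self.edges = {}      # node id -> list of successor node ids

    def add_node(self, node_id, sequence):
        self.sequences[node_id] = sequence
        self.edges.setdefault(node_id, [])

    def add_edge(self, src, dst):
        self.edges[src].append(dst)

    def spell_paths(self, start, end):
        """Yield the sequence spelled by each path from start to end (assumes a DAG)."""
        stack = [(start, self.sequences[start])]
        while stack:
            node, spelled = stack.pop()
            if node == end:
                yield spelled
                continue
            for successor in self.edges[node]:
                stack.append((successor, spelled + self.sequences[successor]))


# A single-base variant site: segment 1 is followed by either "T" or "C".
graph = GenomeGraph()
graph.add_node(1, "ACG")
graph.add_node(2, "T")   # one allele
graph.add_node(3, "C")   # the alternative allele
graph.add_node(4, "GA")
for src, dst in [(1, 2), (1, 3), (2, 4), (3, 4)]:
    graph.add_edge(src, dst)

haplotypes = sorted(graph.spell_paths(1, 4))
```

Both individual sequences are represented by the same graph, and the shared segments are stored only once.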
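As one illustration of what homomorphic encryption makes possible, the sketch below uses the textbook Paillier scheme, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of their plaintexts. Two parties can therefore contribute encrypted counts that a third party adds without seeing either value. This is a toy under stated assumptions, not the lab's protocol: the primes are far too small to be secure, the function names and the carrier counts are made up for the example, and `pow(x, -1, n)` requires Python 3.8 or later.

```python
import math
import secrets


def keygen(p=1009, q=1013):
    """Return (public modulus n, private key). Toy primes: insecure, for illustration only."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse of lam; valid because the generator is n + 1
    return n, (lam, mu)


def encrypt(n, message):
    """Encrypt an integer 0 <= message < n under the public modulus n."""
    n_squared = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random value in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, message, n_squared) * pow(r, n, n_squared)) % n_squared


def add_encrypted(n, c1, c2):
    """Combine two ciphertexts into a ciphertext of the sum of their plaintexts."""
    return (c1 * c2) % (n * n)


def decrypt(n, private_key, ciphertext):
    lam, mu = private_key
    x = pow(ciphertext, lam, n * n)
    return ((x - 1) // n * mu) % n


# Hypothetical scenario: two institutions each hold a private count of variant carriers.
n, private_key = keygen()
count_a = encrypt(n, 37)
count_b = encrypt(n, 58)
encrypted_total = add_encrypted(n, count_a, count_b)  # computed without decrypting inputs
total = decrypt(n, private_key, encrypted_total)
```

Only the holder of the private key learns `total`, and the party performing the addition sees ciphertexts only. The sum must stay below `n` for decryption to return the correct value.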
Other topics (only keywords)
Sequence compression, finding associations between genes and diseases, analysis of genome structural variation