Gene Discovery in Genomes Using Motif Searching


Our project aims to facilitate functional annotation of several model organism genomes by detecting similarity features with a range of hidden Markov model (HMM) profile-based search methods. Both the nature of these methods and the scale of the analyses make the task too computationally intensive to carry out on desktop workstations.


Principal Investigator

Jill Gready
Computational Proteomics and Therapy Design, JCSMR
Australian National University

Project

x58

Co-Investigator

Alex Zelensky
Computational Proteomics and Therapy Design, JCSMR
Australian National University

RFCD Codes

270199, 270299, 280499


Significant Achievements, Anticipated Outcomes and Future Work

The project arises from our initial analysis of the distribution of a eukaryotic protein superfamily in the EnsEMBL-annotated Fugu rubripes genome. We discovered that the gene prediction algorithm used for its annotation routinely fails to predict some features. The most common problem is the absence of transmembrane domain-encoding regions from the predictions, which is observed with a frequency of about 95%. Another problem is the failure to identify genes encoding proteins that contain well-conserved domains, but in a domain order or combination not observed in any known protein. Consequently, we are using profile-based sequence analysis methods, which are known to be both very sensitive and selective, to correct the existing (mis)annotations and to predict novel genes that were missed by the original EnsEMBL annotation. One such search, taking 10,000 hours, has been performed so far. The results need further analysis to determine whether the HMMs need to be refined and the search rerun. Also, as the Fugu DB is still being "cleaned up", it may be useful to rerun the searches once the new version of the genome assembly becomes available, which is expected soon. We will also likely run searches against the recently released genome DB of Ciona intestinalis (a primitive chordate). The initial results of the study have been submitted for publication.

 

Computational Techniques Used

Hidden Markov model (HMM) profile-based search methods applied to whole-genome sequence databases.
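
As a rough illustration of the idea behind profile-based searching, the sketch below scores every window of a protein sequence against a small position-specific profile and reports the best hit. It is a deliberate simplification, not the method used in the project: it keeps only the match states of a profile HMM (no insert or delete states, so no gaps), and the profile values, the uniform background frequency, the floor probability and the function names are all hypothetical values chosen for the example.

```python
import math

# Assumption: uniform background frequency over the 20 amino acids.
BACKGROUND = 0.05
# Assumption: floor probability for residues not listed in a column.
FLOOR = 0.01

# Hypothetical toy profile for a 3-column motif. Each column lists the
# emission probability of its favoured residues (match states only;
# columns are not normalised, this is an illustration).
PROFILE = [
    {"C": 0.8},
    {"W": 0.6, "F": 0.3},
    {"G": 0.9},
]


def column_score(column, residue):
    """Log-odds score (in bits) of one residue in one profile column."""
    p = column.get(residue, FLOOR)
    return math.log2(p / BACKGROUND)


def scan(sequence, profile=PROFILE):
    """Return (best_score, start) of the best ungapped window, or None
    if the sequence is shorter than the profile."""
    width = len(profile)
    best = None
    for start in range(len(sequence) - width + 1):
        score = sum(
            column_score(col, sequence[start + i])
            for i, col in enumerate(profile)
        )
        if best is None or score > best[0]:
            best = (score, start)
    return best


if __name__ == "__main__":
    print(scan("AAACWGAA"))
```

A full profile HMM adds insert and delete states with transition probabilities and is scored with dynamic programming, and a genome-scale search repeats this over every sequence in the database for every profile, which is where the computational cost arises.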

 

Publications, Awards and External Funding

External Funding and Awards

None.

Publications

None.