Gene Discovery in Genomes Using Motif Searching
Our project aims to facilitate functional annotation of several model organism genomes by detecting similarity features with a range of hidden Markov model (HMM) profile-based search methods. Both the nature of these methods and the scale of the analyses make the task too computationally intensive to carry out on desktop workstations.
Principal Investigator: Jill Gready, Computational Proteomics and Therapy Design, JCSMR, Australian National University
Project: x58
Co-Investigator: Alex Zelensky, Computational Proteomics and Therapy Design, JCSMR, Australian National University
RFCD Codes: 270199, 270299, 280499
Significant Achievements, Anticipated Outcomes and Future Work
The project arose from our initial analysis of the distribution of a eukaryotic protein superfamily in the EnsEMBL-annotated Fugu rubripes genome. We discovered that the gene prediction algorithm used for the annotation routinely fails to predict some features. The most common problem, observed with a frequency of about 95%, is the absence of transmembrane domain-encoding regions from the predictions. Another problem is the failure to identify genes encoding proteins that contain well-conserved domains in an order or combination not observed in any known protein. Consequently, we are using profile-based sequence analysis methods, which are known to be both very sensitive and selective, to correct the existing (mis)annotations and to predict novel genes that were missed by the original EnsEMBL annotation.
One such search, taking 10,000 hours, has been performed so far. The results need further analysis to determine whether the HMMs need to be refined and the search rerun. Also, because the Fugu database is still being "cleaned up", it may be useful to rerun the searches once the new version of the genome assembly becomes available. We will also likely run searches against the recently released genome database of Ciona intestinalis (a primitive chordate). The initial results of the study have been submitted for publication.
Computational Techniques Used
Hidden Markov model (HMM) profile-based search methods applied to whole-genome sequence databases.
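As a rough illustration of what profile-based scoring involves, the sketch below scans a protein sequence with an ungapped log-odds profile. This is a deliberate simplification and not the method used in the project: a real profile HMM also has insert and delete states and is normally run with dedicated software. The toy profile, the uniform background frequency, the pseudo-probability, and the function names are all assumptions made for this example.

```python
import math

# Assumed uniform background frequency over the 20 amino acids.
BACKGROUND = 0.05
# Assumed small probability for residues not listed in a profile column.
PSEUDO = 0.01

# Hypothetical toy profile: one dict of residue probabilities per match column.
# Insert and delete states of a full profile HMM are omitted.
TOY_PROFILE = [
    {"C": 0.9},
    {"A": 0.4, "G": 0.4},
    {"W": 0.8},
]


def column_score(column, residue):
    """Log-odds score (in bits) of one residue against one profile column."""
    p = column.get(residue, PSEUDO)
    return math.log2(p / BACKGROUND)


def scan(sequence, profile=TOY_PROFILE):
    """Return (best_score, start) over all ungapped windows of the sequence.

    Returns (-inf, -1) when the sequence is shorter than the profile.
    """
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column_score(col, res) for col, res in zip(profile, window))
        if score > best[0]:
            best = (score, start)
    return best
```

Calling scan("MKCAWLL") returns the best-scoring window and its start position. Even in this simplified form, every window of every sequence is scored against every profile column, which gives a sense of why searches over whole-genome databases become expensive.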
Publications, Awards and External Funding
External Funding and Awards
None.
Publications
None.