ORI-GENE: gene classification based on the evolutionary tree.
Mizuno H, Tanaka Y, Nakai K, Sarai A. Bioinformatics. 2001, 17:167-73.
MOTIVATION:
Genome projects have produced large amounts of data on the sequences of new genes whose functions are as yet unknown. The functions of new genes are usually inferred by comparing their sequences with those of known genes, but evaluation of the sequence homology of individual genes does not make the most of the available sequence information. Therefore, new methods and tools for extracting more biological information from homology searches would be advantageous.
RESULTS:
We have developed a computational tool, ORI-GENE, to analyze the results of sequence homology searches from the perspective of the evolution of selected sets of new genes. ORI-GENE has a graphical interface and accomplishes two important tasks: first, based on the output of homology searches, it identifies species with similar genes and displays their pattern of distribution on the phylogenetic tree. This function enables one to infer the way in which a given gene may have propagated among species over time. Second, from the distribution patterns, it predicts the point at which a given gene may have been first acquired (i.e. its 'origin'), then classifies the gene on that basis. Because it makes use of available evolutionary information to show the way in which genes cluster among species, ORI-GENE should be an effective tool for the screening and classification of new genes revealed by genome analysis.
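The second task described above (predicting a gene's point of origin from its distribution on the tree) can be illustrated with a minimal sketch. The abstract does not state ORI-GENE's actual algorithm, so this assumes the simplest rule: the origin is the most recent common ancestor of all species in which a homolog was found, ignoring gene loss and horizontal transfer. The toy tree `PARENT` and the function names are hypothetical.

```python
from functools import reduce

# Hypothetical toy species tree, child -> parent (None marks the root).
PARENT = {
    "cellular_life": None,
    "bacteria": "cellular_life",
    "eukaryota": "cellular_life",
    "E_coli": "bacteria",
    "fungi_metazoa": "eukaryota",
    "plants": "eukaryota",
    "yeast": "fungi_metazoa",
    "metazoa": "fungi_metazoa",
    "fly": "metazoa",
    "human": "metazoa",
    "arabidopsis": "plants",
}


def lineage(node, parent=PARENT):
    """Return the path from a node up to the root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def mrca(a, b, parent=PARENT):
    """Most recent common ancestor of two nodes in the tree."""
    ancestors_of_a = set(lineage(a, parent))
    for node in lineage(b, parent):
        if node in ancestors_of_a:
            return node
    raise ValueError("nodes are not in the same tree")


def predicted_origin(species_with_hits, parent=PARENT):
    """Assumed rule: the origin is the MRCA of all species with a homolog."""
    species = list(species_with_hits)
    if not species:
        raise ValueError("no homologs found")
    return reduce(lambda x, y: mrca(x, y, parent), species)
```

Under this assumption, a gene with hits only in fly and human would be classified as originating in `metazoa`, whereas a gene that also has a hit in E. coli would be pushed back to the root.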
AVAILABILITY:
ORI-GENE is retrievable via the Internet at: http://www.rtc.riken.go.jp/jouhou/ORI-GENE.
Marine genomics Europe, resources for aquaculture
Society of Experimental Biology, Barcelona. July 2005, Comparative Biochemistry and Physiology 141A: S87
Liu, Z., G.-H. Li, J.-F. Huang, R.W. Murphy, and P. Shi. 2012. Hearing aid for vertebrates via multiple episodic adaptive events on prestin genes. Molecular Biology and Evolution. DOI: 10.1093/molbev/mss087
Auditory detection is essential for survival and reproduction of vertebrates, yet the genetic changes underlying the evolution and diversity of hearing are poorly documented. Recent discoveries concerning the prestin gene, which is responsible for cochlear amplification by electromotility, provide an opportunity to redress this situation. We identify prestin genes from the genomes of 14 vertebrates, including three fishes, one amphibian, one lizard, one bird, and eight mammals. An evolutionary analysis of these sequences and 34 previously known prestin genes reveals for the first time that this hearing gene was under positive selection in the most recent common ancestor (MRCA) of tetrapods. This discovery might document the genetic basis of enhanced high sound sensitivity in tetrapods. An investigation of the adaptive gain and evolution of electromotility, an important evolutionary innovation for the highest hearing ability of mammals, detects evidence for positive selection on the MRCA of mammals, therians, and placentals, respectively. It is suggested that electromotility determined by prestin might have initially appeared in the MRCA of mammals and that its functional improvements might have occurred in the MRCA of therian and placental mammals. Our patch clamp experiments further support this hypothesis, revealing the functional divergence of voltage-dependent nonlinear capacitance (NLC) of prestin from platypus, opossum, and gerbil. Moreover, structure-based docking analyses detect positively selected amino acids in the MRCA of placental mammals that are key residues in sulfate anion transport. This study provides new insights into the adaptation and functional diversity of hearing sensitivity in vertebrates by evolutionary and functional analysis of the prestin hearing gene.
Gene fragmentation in bacterial draft genomes: extent, consequences and mitigation
Klassen, J. L. and Currie, C. R. 2012. BMC Genomics 13: 14. doi:10.1186/1471-2164-13-14
Background
Ongoing technological advances in genome sequencing are allowing bacterial genomes to be sequenced at ever-lower cost. However, nearly all of these new techniques concomitantly decrease genome quality, primarily due to the inability of their relatively short read lengths to bridge certain genomic regions, e.g., those containing repeats. Fragmentation of predicted open reading frames (ORFs) is one possible consequence of this decreased quality. In this study we quantify ORF fragmentation in draft microbial genomes and its effect on annotation efficacy, and we propose a solution to ameliorate this problem.
Results
A survey of draft-quality genomes in GenBank revealed that fragmented ORFs comprised > 80% of the predicted ORFs in some genomes, and that increased fragmentation correlated with decreased genome assembly quality. In a more thorough analysis of 25 Streptomyces genomes, fragmentation was especially enriched in some protein classes with repeating, multi-modular structures such as polyketide synthases, non-ribosomal peptide synthetases and serine/threonine kinases. Overall, increased genome fragmentation correlated with increased false-negative Pfam and COG annotation rates and increased false-positive KEGG annotation rates. The false-positive KEGG annotation rate could be ameliorated by linking fragmented ORFs using their orthologs in related genomes. Whereas this strategy successfully linked up to 46% of the total ORF fragments in some genomes, its sensitivity appeared to depend heavily on the depth of sampling of a particular taxon's variable genome.
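The idea of linking fragmented ORFs through their orthologs in related genomes can be sketched as follows. This is not the authors' implementation: it assumes each ORF already has one best hit against a protein from a related genome, with alignment coordinates on that protein, and it treats ORFs that cover successive, largely non-overlapping regions of the same ortholog as fragments of one gene. The `Hit` record, the `link_fragments` function and the `max_overlap` threshold are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    orf_id: str       # predicted ORF in the draft genome
    ortholog_id: str  # best-matching protein in a related genome
    s_start: int      # alignment start on the ortholog (1-based)
    s_end: int        # alignment end on the ortholog (inclusive)


def link_fragments(hits, max_overlap=10):
    """Group ORFs that align to successive parts of the same ortholog.

    Returns a list of ORF-id lists; each list is a putative fragmented gene.
    """
    by_ortholog = defaultdict(list)
    for hit in hits:
        by_ortholog[hit.ortholog_id].append(hit)

    linked = []
    for ortholog_hits in by_ortholog.values():
        ortholog_hits.sort(key=lambda h: h.s_start)
        chain = [ortholog_hits[0]]
        for hit in ortholog_hits[1:]:
            # Fragments of one gene cover successive ortholog regions;
            # heavy overlap suggests paralogs rather than fragments.
            if hit.s_start > chain[-1].s_end - max_overlap:
                chain.append(hit)
        if len(chain) > 1:
            linked.append([h.orf_id for h in chain])
    return linked
```

A real pipeline would also need to check that linked ORFs are adjacent on a contig or sit at contig ends, which this sketch omits.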
Conclusions
Draft microbial genomes contain many ORF fragments. Where these correspond to the same gene they have particular potential to confound comparative gene content analyses. Given our findings, and the rapid increase in the number of microbial draft quality genomes, we suggest that accounting for gene fragmentation and its associated biases is important when designing comparative genomic projects.
EvoluCode: Evolutionary Barcodes as a Unifying Framework for Multilevel Evolutionary Data
Evolutionary Bioinformatics
Evolutionary systems biology aims to uncover the general trends and principles governing the evolution of biological networks. An essential part of this process is the reconstruction and analysis of the evolutionary histories of these complex, dynamic networks. Unfortunately, the methodologies for representing and exploiting such complex evolutionary histories in large scale studies are currently limited. Here, we propose a new formalism, called EvoluCode (Evolutionary barCode), which allows the integration of different evolutionary parameters (eg, sequence conservation, orthology, synteny …) in a unifying format and facilitates the multilevel analysis and visualization of complex evolutionary histories at the genome scale. The advantages of the approach are demonstrated by constructing barcodes representing the evolution of the complete human proteome. Two large-scale studies are then described: (i) the mapping and visualization of the barcodes on the human chromosomes and (ii) automatic clustering of the barcodes to highlight protein subsets sharing similar evolutionary histories and their functional analysis. The methodologies developed here open the way to the efficient application of other data mining and knowledge extraction techniques in evolutionary systems biology studies.
Amino acids biosynthesis and nitrogen assimilation pathways: a great genomic deletion during eukaryotes evolution
BMC Genomics
Besides being building blocks for proteins, amino acids are also key metabolic intermediates in living cells. Surprisingly a variety of organisms are incapable of synthesizing some of them, thus named Essential Amino Acids (EAAs). How certain ancestral organisms successfully competed for survival after losing key genes involved in amino acids anabolism remains an open question. Comparative genomics searches on current protein databases including sequences from both complete and incomplete genomes among diverse taxonomic groups help us to understand amino acids auxotrophy distribution.
Here, we applied a methodology based on clustering of homologous genes to seed sequences from the autotrophic organisms Saccharomyces cerevisiae (yeast) and Arabidopsis thaliana (plant). We thus depict evidence of the presence or absence of EAA biosynthetic and nitrogen assimilation enzymes at the phylum level. Results show broad loss of the phenotype of EAA biosynthesis in several groups of eukaryotes, followed by multiple secondary gene losses. A subsequent inability for nitrogen assimilation is observed in derived metazoans.
A Great Deletion model is proposed here as a broad phenomenon generating the phenotype of amino acid essentiality followed, in metazoans, by organic nitrogen dependency. This phenomenon is probably associated with a relaxed selective pressure conferred by heterotrophy. Taking advantage of available homologous clustering tools, we provide a complete and updated picture of it.
Legionella Pneumophila Pangenome Reveals Strain-Specific Virulence Factors
D'Auria G, Jiménez-Hernández N, Peris-Bondia F, Moya A, Latorre A
BMC Genomics. 2010 Mar 17;11(1):181
BACKGROUND: Legionella pneumophila subsp. pneumophila is a gram-negative gamma-Proteobacterium and the causative agent of Legionnaires' disease, a form of epidemic pneumonia. It has a water-related life cycle. In industrialized cities L. pneumophila is commonly encountered in refrigeration towers and water pipes. Infection is always via infected aerosols to humans. Although many efforts have been made to eradicate Legionella from buildings, it still contaminates the water systems. The town of Alcoy (Valencian Region, Spain) has had recurrent outbreaks since 1999. The strain "Alcoy 2300/99" is a particularly persistent and recurrent strain that was isolated during one of the most significant outbreaks between the years 1999-2000.
RESULTS: We have sequenced the genome of the particularly persistent L. pneumophila strain Alcoy 2300/99 and have compared it with four previously sequenced strains known as Philadelphia (USA), Lens (France), Paris (France) and Corby (England). Pangenome analysis facilitated the identification of strain-specific features, as well as some that are shared by two or more strains. We identified: (1) three islands related to anti-drug resistance systems; (2) a system for transport and secretion of heavy metals; (3) three systems related to DNA transfer; (4) two CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) systems, known to provide resistance against phage infections, one similar in the Lens and Alcoy strains, and another specific to the Paris strain; and (5) seven islands of phage-related proteins, five of which seem to be strain-specific and two shared.
CONCLUSIONS: The dispensable genome disclosed by the pangenomic analysis seems to be a reservoir of new traits that have mainly been acquired by horizontal gene transfer and could confer evolutionary advantages over strains lacking them.
Homology modeling, comparative genomics and functional annotation of Mycoplasma genitalium hypothetical protein MG_237
by Azeem Butt
Mycoplasma genitalium is a human pathogen associated with several sexually transmitted diseases. The complete genome of M. genitalium G37 has been sequenced and provides an opportunity to understand the pathogenesis and to identify therapeutic targets. However, complete understanding of bacterial function requires proper annotation of its proteins. The genome of M. genitalium encodes 475 proteins. Among these, 94 are without any known function and are described as ‘hypothetical proteins’. We selected MG_237 for sequence and structural analysis using a bioinformatics approach. Primary and secondary structure analysis suggested that MG_237 is a hydrophilic protein containing a significant proportion of alpha helices, and subcellular localization predictions suggested it is a cytoplasmic protein. Homology modeling was used to define the three-dimensional (3D) structure of MG_237. A search for templates revealed that MG_237 shares 63% homology to a hypothetical protein of Mycoplasma pneumoniae, indicating this protein is evolutionarily conserved. The refined 3D model was generated using the (PS)2-v2 server, which incorporates MODELLER. Several quality assessment and validation parameters were computed and indicated that the homology model is reliable. Furthermore, comparative genomics analysis suggested that MG_237 is a non-homologous protein involved in four different metabolic pathways. Experimental validation will provide more insight into the actual function of this protein in microbial pathways.
Mycoplasma genitalium: a comparative genomics study of metabolic pathways for the identification of drug and vaccine targets
by Azeem Butt
Published in Infection, Genetics and Evolution
Increasing emergence of antibiotic-resistant pathogenic microorganisms is one of the biggest challenges for biomedical research and drug development. Traditional drug discovery methods are time-consuming, expensive and often yield few drug targets. In contrast, advances in complete genome sequencing, bioinformatics and cheminformatics represent an attractive alternative approach to identify drug targets worthy of experimental follow-up. Mycoplasma genitalium is a human parasitic pathogen that is associated with several sexually transmitted diseases. Recently, emergence of treatment-resistant isolates has been reported, which raises serious concern and a need for identification of additional drug targets. In the present study, a systematic workflow consisting of comparative genomics, metabolic pathways analysis and additional drug prioritizing parameters was defined for the identification of novel drug and vaccine targets that are essential for M. genitalium, but absent in its human host. In silico analyses and manual mining identified 79 proteins of M. genitalium, which showed no similarity to human proteins. Among these, sixty-seven proteins were identified as non-homologous essential proteins that could serve as potential drug and vaccine targets. Subcellular localization, molecular weight, and three dimensional structural characteristics that could facilitate filtering of attractive drug targets were also calculated for the non-homologous essential proteins. Enzymes from thiamine biosynthesis, protein biosynthesis, and folate biosynthesis were identified as attractive candidates for drug development. Furthermore, druggability of each of the identified drug targets was also evaluated by the DrugBank database. Results from this study could facilitate selection of M. genitalium proteins for entry into drug design and vaccine production pipelines.
Resolving the question of trypanosome monophyly: a comparative genomics approach using whole genome data sets with low taxon sampling
by Guy Leonard
Since the first attempts to classify the evolutionary history of trypanosomes, there have been conflicting reports regarding their true phylogenetic relationships and, in particular, their relationships with other vertebrate trypanosomatids, e.g. Leishmania sp., as well as with the many insect parasitising trypanosomatids. Perhaps the issue that has provided most debate is that concerning the monophyly (or otherwise) of genus Trypanosoma and, even with the advent of molecular methods, the findings of numerous studies have varied significantly depending on the gene sequences analysed, number of taxa included, choice of outgroup and phylogenetic methodology. While of arguably limited applied importance, resolution of the question as to whether or not trypanosomes are monophyletic is critical to accurate evaluation of competing, mutually exclusive evolutionary scenarios for these parasites, namely the 'vertebrate-first' or 'insect-first' hypotheses. Therefore, a new approach, which could overcome previous limitations was needed. At its most simple, the problem can be defined within the framework of a trifurcated tree with three hypothetical positions at which the root can be placed. Using BLASTp and whole-genome gene-by-gene phylogenetic analyses of Trypanosoma brucei, Trypanosoma cruzi, Leishmania major and Naegleria gruberi, we have identified 599 gene markers--putative homologues--that were shared between the genomes of these four taxa. Of these, 75 homologous gene families that demonstrate monophyly of the kinetoplastids were identified. We then used these data sets in combination with an additional outgroup, Euglena gracilis, coupled with large-scale gene concatenation and diverse phylogenetic techniques to investigate the relative branching order of T. brucei, T. cruzi and L. major. Our findings confirm the monophyly of genus Trypanosoma and demonstrate that <1% of the analysed gene markers shared between the genomes of T. brucei, T. cruzi and L. major reject the hypothesis that the trypanosomes form a monophyletic group.
Comparative analysis of teleost genome sequences reveals an ancient intron size expansion in the zebrafish lineage
by Steve Moss
We have developed a bioinformatics pipeline for the comparative evolutionary analysis of Ensembl genomes, and have used it to analyse the introns of the five available teleost fish genomes. We show our pipeline to be a powerful tool for revealing variation between genomes that may otherwise be overlooked with simple summary statistics. We identify that the zebrafish, Danio rerio, has an unusual distribution of intron sizes, with a greater number of larger introns in general and a notable peak in the frequency of introns of ∼500 bp to 2,000 bp compared to the monotonically decreasing frequency distributions of the other fish. We determine that 47% of Danio rerio introns are composed of repetitive sequences, although the remainder, over 331 Mb, is not. Since repetitive elements may be the origin of the majority of all non-coding DNA, it is likely that the remaining Danio rerio intronic sequence has an ancient repetitive origin and has since accumulated so many mutations that it can no longer be recognised as such. To study such an ancient expansion of repeats in the Danio lineage will require further comparative analysis of fish genomes incorporating a broader distribution of teleost lineages.
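The contrast drawn above, between a peaked intron size distribution and a monotonically decreasing one, is the kind of difference a mean or median can hide. A minimal sketch of how it could be checked, assuming intron lengths have already been extracted as integers in bp; the bin width, number of bins and function names are illustrative choices, not taken from the pipeline described:

```python
def intron_histogram(lengths, bin_width=500, n_bins=6):
    """Count introns per equal-width length bin; the last bin is open-ended."""
    counts = [0] * n_bins
    for length in lengths:
        counts[min(length // bin_width, n_bins - 1)] += 1
    return counts


def is_monotonically_decreasing(counts):
    """True if no bin holds more introns than the bin before it."""
    return all(a >= b for a, b in zip(counts, counts[1:]))
```

With 500 bp bins, a genome whose counts rise in the second or third bin before falling again fails the monotonicity check, which flags it for closer inspection.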
Functional Adaptations of the Transcriptome to Mastitis-Causing Pathogens: The Mammary Gland and Beyond
Co-authored with Juan J. Loor and Kasey M. Moyes
Application of microarrays to the study of intramammary infections in recent years has provided a wealth of fundamental information on the transcriptomics adaptation of tissue/cells to the disease. Due to its heavy toll on productivity and health of the animal, in vivo and in vitro transcriptomics works involving different mastitis-causing pathogens have been conducted on the mammary gland, primarily on livestock species such as cow and sheep, with few studies in non-ruminants. However, the response to an infectious challenge originating in the mammary gland elicits systemic responses in the animal and encompasses tissues such as liver and immune cells in the circulation, with also potential effects on other tissues such as adipose. The susceptibility of the animal to develop mastitis likely is affected by factors beyond the mammary gland, e.g. negative energy balance as it occurs around parturition. Objectives of this review are to discuss the use of systems biology concepts for the holistic study of animal responses to intramammary infection; providing an update of recent work using transcriptomics to study mammary and peripheral tissue (i.e. liver) as well as neutrophils and macrophage responses to mastitis-causing pathogens; discuss the effect of negative energy balance on mastitis predisposition; and analyze the bovine and murine mammary innate-immune responses during lactation and involution using a novel functional analysis approach to uncover potential predisposing factors to mastitis throughout an animal's productive life.
Parallel Evolution and Horizontal Gene Transfer of the pst Operon in Firmicutes from Oligotrophic Environments
Alejandra Moreno-Letelier, Gabriela Olmedo, Luis E. Eguiarte, Leon Martinez-Castilla, and Valeria Souza
The high affinity phosphate transport system (pst) is crucial for phosphate uptake in oligotrophic environments. Cuatro Cienegas Basin (CCB) has extremely low P levels and its endemic Bacillus are closely related to oligotrophic marine Firmicutes. Thus, we expected the pst operon of CCB to share the same evolutionary history and protein similarity to marine Firmicutes. Orthologs of the pst operon were searched in 55 genomes of Firmicutes and 13 outgroups. Phylogenetic reconstructions were performed for the pst operon and 14 concatenated housekeeping genes using maximum likelihood methods. Conserved domains and 3D structures of the phosphate-binding protein (PstS) were also analyzed. The pst operon of Firmicutes shows two highly divergent clades with no correlation to the type of habitat nor a phylogenetic congruence, suggesting horizontal gene transfer. Despite sequence divergence, the PstS protein had a similar 3D structure, which could be due to parallel evolution after horizontal gene transfer events.
