HOX paralogon phylogenomic analysis

Gene families with triplicated or quadruplicated members on the human HOX-bearing chromosomes (Hsa2/7/12/17) were identified by scanning the human genome sequence maps available at the Ensembl and UCSC genome browsers (Hubbard et al., 2002). A total of 62 gene families were included in this study: 25 of these families have members on each of the four human HOX-bearing chromosomes, while the remaining 37 have members on at least three of the HOX-bearing chromosomes. The closest putative orthologous sequences of the human proteins in other species were obtained using BLASTP in the Ensembl genome browser (Hubbard et al., 2002). To enrich these gene families with sequences from organisms for which sequence information was not available at Ensembl, BLASTP (Altschul et al., 1990) searches were carried out against the protein databases available at the National Center for Biotechnology Information (Johnson et al., 2008) and the Joint Genome Institute (www.jgi.doe.gov/).
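The inclusion criterion described above (members on all four, or on at least three, of the HOX-bearing chromosomes) can be illustrated with a minimal Python sketch. The family names and chromosome assignments below are hypothetical placeholders, not data from this study, and the function name is an assumption made for illustration only.

```python
# Human HOX-bearing chromosomes named in the text (Hsa2/7/12/17).
HOX_CHROMOSOMES = {"2", "7", "12", "17"}


def classify_family(member_chromosomes):
    """Classify a gene family by how many HOX-bearing chromosomes carry a member.

    Returns "fourfold" (all four), "threefold" (exactly three) or "excluded".
    """
    hits = HOX_CHROMOSOMES & set(member_chromosomes)
    if len(hits) == 4:
        return "fourfold"
    if len(hits) == 3:
        return "threefold"
    return "excluded"


# Hypothetical families: values are the chromosomes on which members were found.
example_families = {
    "FAMILY_A": ["2", "7", "12", "17"],
    "FAMILY_B": ["2", "7", "17", "X"],
    "FAMILY_C": ["7", "12"],
}

classified = {name: classify_family(chroms) for name, chroms in example_families.items()}
```

Under this sketch, a family is retained only when it is classified as "fourfold" or "threefold"; members on other chromosomes do not affect the classification.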


The phylogenetic analyses for each gene family were performed using MEGA version 5 (Kumar et al., 2008). Amino acid sequences were aligned using the multiple sequence alignment tool CLUSTAL W with default parameters (Thompson et al., 1994). Phylogenetic trees for each gene family were reconstructed using the neighbor-joining (NJ) method (Russo et al., 1996; Saitou and Nei, 1987); the complete deletion option was used to exclude any site at which a gap occurred in any of the sequences. The uncorrected proportion (p) of amino acid differences and the Poisson-corrected (PC) amino acid distance were used as amino acid substitution models. Since both distance measures produced similar results, only the NJ trees based on the uncorrected p-distance are presented in this database. The reliability of the resulting tree topologies was assessed by the bootstrap method (1000 pseudoreplicates), which generated a bootstrap probability for each interior branch in the tree (Felsenstein, 1985). Sequences that were too divergent and disrupted the entire alignment were excluded. To estimate phylogenetic trees with a different reconstruction method, the maximum likelihood (ML) procedure based on the Whelan and Goldman (WAG) model of amino acid replacement was employed (Whelan and Goldman, 2001), also using MEGA 5.
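To make the distance measures concrete, the following Python sketch shows complete deletion of gapped sites, the uncorrected p-distance, the Poisson-corrected distance d = -ln(1 - p), and the column resampling that underlies one bootstrap pseudoreplicate. It is an illustration under simplifying assumptions, not the MEGA implementation: the toy alignment, the function names and the use of "-" as the only gap symbol are all assumptions made here.

```python
import math
import random


def complete_deletion(aligned):
    """Remove every alignment column in which any sequence has a gap ('-')."""
    n_cols = len(aligned[0])
    keep = [i for i in range(n_cols) if all(seq[i] != "-" for seq in aligned)]
    return ["".join(seq[i] for i in keep) for seq in aligned]


def p_distance(a, b):
    """Uncorrected proportion of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def poisson_distance(a, b):
    """Poisson-corrected distance d = -ln(1 - p); undefined when p equals 1."""
    return -math.log(1.0 - p_distance(a, b))


def bootstrap_replicate(aligned, rng):
    """Resample alignment columns with replacement (one pseudoreplicate)."""
    n_cols = len(aligned[0])
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[i] for i in cols) for seq in aligned]


# Toy alignment (hypothetical sequences, for illustration only).
toy_alignment = ["MKT-LVAG", "MKTALVSG", "MRT-LISG"]

clean = complete_deletion(toy_alignment)    # the gapped fourth column is dropped
p_01 = p_distance(clean[0], clean[1])       # 1 difference over 7 sites
pc_01 = poisson_distance(clean[0], clean[1])
replicate = bootstrap_replicate(clean, random.Random(0))
```

In a full analysis, a pairwise distance matrix built this way would be passed to a neighbor-joining routine, and the tree would be re-estimated on each of the 1000 pseudoreplicates to obtain bootstrap support for the interior branches.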


The species included in the analysis were Homo sapiens (Human), Mus musculus (Mouse), Pan troglodytes (Chimpanzee), Gorilla gorilla (Gorilla), Callithrix jacchus (Marmoset), Pongo abelii (Orangutan), Macaca mulatta (Macaque), Rattus norvegicus (Rat), Oryctolagus cuniculus (Rabbit), Gallus gallus (Chicken), Taeniopygia guttata (Zebra finch), Canis familiaris (Dog), Felis catus (Cat), Bos taurus (Cow), Equus caballus (Horse), Loxodonta africana (Elephant), Dasypus novemcinctus (Armadillo), Myotis lucifugus (Microbat), Pteropus vampyrus (Megabat), Monodelphis domestica (Opossum), Ornithorhynchus anatinus (Platypus), Anolis carolinensis (Lizard), Pelodiscus sinensis (Chinese softshell turtle), Xenopus tropicalis (Frog), Erinaceus europaeus (Hedgehog), Danio rerio (Zebrafish), Takifugu rubripes (Fugu), Tetraodon nigroviridis (Tetraodon), Gasterosteus aculeatus (Stickleback), Oryzias latipes (Medaka), Ciona intestinalis (Ascidian), Ciona savignyi (Ascidian), Branchiostoma floridae (Amphioxus), Strongylocentrotus purpuratus (Sea urchin), Drosophila melanogaster (Fruit fly), Apis mellifera (Honey bee), Anopheles gambiae (Mosquito), Caenorhabditis elegans (Nematode), Nematostella vectensis (Sea anemone), Hydra magnipapillata (Hydra) and Amphimedon queenslandica (Sponge).



Figure 1(a): Neighbor Joining (NJ) Tree of ABC Family
Figure 1(b): Maximum Likelihood (ML) Tree of ABC Family


Useful references:


Abbasi, A.A., (2010b) Unraveling ancient segmental duplication events in human genome by phylogenetic analysis of multigene families residing on HOX-cluster paralogons. Mol. Phylogenet. Evol, 57, 836–848.

Abbasi, A.A., Grzeschik, K.H., (2007) An insight into the phylogenetic history of HOX linked gene families in vertebrates. BMC Evol. Biol, 7, 239.

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., (1990) Basic local alignment search tool. J. Mol. Biol, 215, 403–410.

Ambreen, S., Khalil, F., Abbasi, A.A., (2014) Integrating large-scale phylogenetic datasets to dissect the ancient evolutionary history of vertebrate genome. Mol. Phylogenet. Evol, 78, 1–13.

Asrar, Z., Haq, F., Abbasi, A.A., (2013) Fourfold paralogy regions on human HOX-bearing chromosomes: Role of ancient segmental duplications in the evolution of vertebrate genome. Mol. Phylogenet. Evol, 66, 737–747.

Felsenstein, J., (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution, 39, 783–791.

Hubbard, T., Barker, D., Birney, E., Cameron, G., Chen, Y., Clark, L., Cox, T., Cuff, J., Curwen, V., Down, T., Durbin, R., Eyras, E., Gilbert, J., Hammond, M., Huminiecki, L., Kasprzyk, A., Lehvaslaiho, H., Lijnzaad, P., Melsopp, C., Mongin, E., Pettett, R., Pocock, M., Potter, S., Rust, A., Schmidt, E., Searle, S., Slater, G., Smith, J., Spooner, W., Stabenau, A., Stalker, J., Stupka, E., Ureta-Vidal, A., Vastrik, I., Clamp, M., (2002) The Ensembl genome database project. Nucleic Acids Res, 30, 38–41.

Johnson, M., Zaretskaya, I., Raytselis, Y., Merezhuk, Y., McGinnis, S., Madden, T.L., (2008) NCBI BLAST: a better web interface. Nucleic Acids Res, 36, W5–W9.

Kumar, S., Nei, M., Dudley, J., Tamura, K., (2008) MEGA: a biologist-centric software for evolutionary analysis of DNA and protein sequences. Brief Bioinform, 9, 299–306.

Russo, C.A., Takezaki, N., Nei, M., (1996) Efficiencies of different genes and different tree-building methods in recovering a known vertebrate phylogeny. Mol. Biol. Evol, 13, 525–536.

Saitou, N., Nei, M., (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol, 4, 406–425.

Thompson, J.D., Higgins, D.G., Gibson, T.J., (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res, 22, 4673–4680.

Whelan, S., Goldman, N., (2001) A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol, 18, 691–699.