Gene families with triplicated or quadruplicated members on human HOX-bearing chromosomes (Hsa2/7/12/17) were identified by scanning the human genome sequence maps available at the Ensembl and UCSC genome browsers (Hubbard et al., 2002). A total of 62 gene families were included in this study: 25 of these families have members on each of the four human HOX-bearing chromosomes, while the remaining 37 have their members on at least three of HOX chromosomes. The closest putative orthologous sequences of the human proteins in other species were obtained using BLASTP in the Ensembl genome browser (Hubbard et al., 2002). To enrich these gene families with sequences from those organisms for which sequence information was not available at Ensembl, a BLASTP (Altschulet al., 1990) search was carried out against the protein database available at the National Center for Biotechnology Information (Johnson et al., 2008) and the Joint Genome Institute (www.jgi.doe.gov/).
The phylogenetic analyses for each gene family were performed using MEGA version 5 (Kumar et al., 2008). Amino acid sequences were aligned using a multiple sequence alignment tool CLUSTAL W with default parameters (Thompson et al., 1994). Phylogenetic trees for each gene family were reconstructed using the neighbor joining (NJ) method (Russo et al., 1996; Saitou and Nei, 1987), the complete deletion option was used to exclude any site which postulated a gap in the sequences. Uncorrected proportion (p) of amino acid difference and poisson corrected (PC) amino acid distance were used as amino acid substitution models. Since both methodologies produce similar results, only the results from NJ tree based on uncorrected p-distance are presented in this database. The authenticity of the resulting tree topologies were confirmed by performing bootstrap method (at 1000 pseudoreplicates) which generated the bootstrap probability for each interior branch in the tree (Felsenstein, 1985). The sequences that were too diverged, disrupting the entire alignment were excluded. To estimate phylogenetic trees using a different reconstruction method, Maximum Likelihood procedure based on the Whelan and Goldman (WAG) model of amino acid replacement was employed (Whelan and Goldman, 2001), using MEGA 5 program.
The species that were selected in the analysis comprised of Homo sapiens (Human), Mus musculus (Mouse), Pan troglodytes (Chimpanzee), Gorilla gorilla (Gorilla), Callithrix jacchus (Marmoset), Pongo abelii (Orangutan), Macaca mulatta (Macaque), Rattus norvegicus (Rat), Oryctolagus cuniculus (Rabbit), Gallus gallus (Chicken), Taeniopygia guttata (Zebra finch), Canis familiaris (Dog), Felis catus (Cat), Bos Taurus (Cow), Equus caballus (Horse), Loxodonta africana (Elephant), Dasypus novemcinctus (Armadillo), Myotis lucifugus (Microbat), Pteropus vampyrus (Megabat), Monodelphis domestica (Opossum), Ornithorhynchus anatinus (Platypus), Anolis carolinensis (Lizard), Pelodiscus sinensis (Chinese softshell turtle), Xenopus tropicalis (Frog), Erinaceus europaeus (Hedgehog), Danio rerio (Zebrafish), Takifugu rubripes (Fugu), Tetraodon nigroviridis (Tetraodon), Gasterosteus aculeatus (Stickleback), Oryzias latipes (Medaka), Ciona intestinalis (Ascidian), Ciona savignyi (Ascidian), Branchiostoma floridae (Amphioxus), Strongylocentrotus purpuratus (Sea urchin), Drosophila melanogaster (Fruit fly), Apis mellifera (Honey bee), Anopheles gambiae (Mosquito), Caenorhabditis elegans (Nematode), Nematostella vectensis (Sea anemone), Hydra magnipapillata (Hydra) and Amphimedon queenslandica (Sponge).
Figure 1(a): Neighbor Joining (N.J) Tree of ABC Family Figure 1(b): Maximum Likelihood (M.L) Tree of ABC Family
Useful references: