HSA 1/2/8/20 paralogon phylogenomic analysis

The exact nature of the evolutionary events that created the ancient paralogy regions in the vertebrate genome is extremely difficult to track through inter-genomic and intra-genomic map comparisons, because these regions have since undergone multiple chromosomal breakages and rearrangements that altered the karyotype and disrupted gene order on chromosomes. A more convincing way to determine the mechanism of origin of ancient vertebrate paralogons is phylogenetic analysis of multigene families (Abbasi, 2010; Abbasi and Grzeschik, 2007; Asrar et al., 2013; Hughes, 1998; Hughes et al., 2001a). This approach captures the nature of anciently duplicated genomic regions in two ways. First, it estimates the relative timing of duplication events, that is, whether a duplication occurred before or after a given speciation event. This type of relative dating can provide a robust picture of the extent of duplication events within a particular time window (Van de Peer, 2004). Second, the evolutionary origin of paralogons can be examined by combining the global physical organization of the gene families that make up the paralogons with their respective tree topologies.
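The idea of relative dating can be illustrated with a minimal sketch. It assumes a rooted binary gene tree written as nested Python tuples, hypothetical leaf labels of the form "species|gene", and the simple species-overlap rule (a node is called a duplication when its two child clades share at least one species). This rule is an assumption chosen for illustration, not a procedure stated in the text.

```python
def classify(tree, events=None):
    """Walk a nested-tuple gene tree in post-order and label each internal node.

    Returns (species found below this node, list of (event, species) tuples).
    Leaves are strings such as "human|A" (hypothetical labelling).
    """
    if events is None:
        events = []
    if isinstance(tree, str):
        return {tree.split("|")[0]}, events
    left, right = tree
    left_species, _ = classify(left, events)
    right_species, _ = classify(right, events)
    # Species-overlap rule: shared species below both children imply a duplication.
    kind = "duplication" if left_species & right_species else "speciation"
    events.append((kind, sorted(left_species | right_species)))
    return left_species | right_species, events


# Toy tree: paralogs A and B are each present in human and mouse,
# while the outgroup carries a single copy.
toy_tree = ((("human|A", "mouse|A"), ("human|B", "mouse|B")), "amphioxus|AB")
_, toy_events = classify(toy_tree)

for kind, species in toy_events:
    print(kind, species)
```

In this toy tree the duplication node sits below the root speciation (the split from the outgroup) and above the two human/mouse speciation nodes, so the duplication would be dated after the first split and before the second.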


To test the assumptions of the tetralogy hypothesis, genes with threefold or fourfold representation in the proposed paralogy regions on human chromosomes 1(p36–p34)/9q34, 2p24/6(q21–q23)/8p13, 8(q12–q24)/18p11 and 20(q11–q13) were identified by scanning the human genome sequence maps available at the Ensembl and UCSC genome browsers (Flicek et al., 2011; Fujita et al., 2011). A total of 11 families were identified: seven with fourfold representation and four with threefold representation in these paralogy blocks. The closest putative orthologs of the human proteins in other species were obtained using BLASTP in the Ensembl genome browser (Altschul et al., 1990). To enrich these gene families with sequences from organisms for which sequence information was not available at Ensembl, BLASTP searches were carried out against the protein databases available at the National Center for Biotechnology Information (Johnson et al., 2008) and the Joint Genome Institute [http://www.jgi.doe.gov/]. The phylogenetic tree for each gene family was reconstructed using the neighbor-joining (NJ) method (Russo et al., 1996; Saitou and Nei, 1987). The complete deletion option was used to exclude any site at which a gap was postulated in any of the sequences. The uncorrected proportion (p) of amino acid differences was used as the amino acid distance measure. The reliability of the resulting tree topology was tested by the bootstrap method (1000 pseudoreplicates), which generated the bootstrap probability for each interior branch in the tree (Felsenstein, 1985). In addition, a maximum likelihood (ML) tree was constructed for each gene family using the Whelan and Goldman (WAG) model of amino acid replacement (Whelan and Goldman, 2001).
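The distance step described above can be sketched in a few lines. The example below is a minimal illustration, assuming a toy alignment with hypothetical sequence names; it shows complete deletion of gapped columns, the uncorrected p-distance, and one bootstrap pseudoreplicate obtained by resampling alignment columns with replacement. It does not reproduce the software used in the study, and the tree-building step itself is omitted.

```python
import random


def complete_deletion(alignment):
    """Drop every alignment column in which at least one sequence has a gap ('-')."""
    names = list(alignment)
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if "-" not in col]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


def p_distance(seq_a, seq_b):
    """Uncorrected proportion of sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned and non-empty")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def bootstrap_replicate(alignment, rng):
    """Build one pseudoreplicate by sampling columns with replacement."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


# Toy alignment; names and sequences are invented for illustration only.
toy = {
    "human_A": "MKT-LLVA",
    "mouse_A": "MKTALLVA",
    "human_B": "MRTALIVA",
    "fish_A": "MRSA-IVG",
}

clean = complete_deletion(toy)  # the two gapped columns are removed
d_ab = p_distance(clean["human_A"], clean["human_B"])  # 2 of 6 sites differ
replicate = bootstrap_replicate(clean, random.Random(0))
```

In a full analysis, a distance matrix built this way would be passed to a neighbor-joining routine once for the original alignment and once per pseudoreplicate, and the bootstrap probability of an interior branch would be the fraction of replicate trees that recover it.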



Figure 1(a): Neighbor-joining (NJ) tree of the HCK family
Figure 1(b): Maximum likelihood (ML) tree of the HCK family


References:



Abbasi, A.A., (2010) Unraveling ancient segmental duplication events in human genome by phylogenetic analysis of multigene families residing on HOX-cluster paralogons. Molecular Phylogenetics and Evolution, 57(2), 836–848.

Abbasi, A.A., Grzeschik, K.H., (2007) An insight into the phylogenetic history of HOX linked gene families in vertebrates. BioMed Central Evolutionary Biology, 7, 239.

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., (1990) Basic local alignment search tool. J. Mol. Biol, 215, 403–410.

Asrar, Z., Haq, F., Abbasi, A.A., (2013) Fourfold paralogy regions on human HOX-bearing chromosomes: Role of ancient segmental duplications in the evolution of vertebrate genome. Molecular Phylogenetics and Evolution, 66(3), 737–747.

Felsenstein, J., (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution, 39, 783–791.

Flicek, P., Amode, M.R., Barrell, D., Beal, K., Brent, S., Chen, Y., Clapham, P., Coates, G., Fairley, S., Fitzgerald, S., Gordon, L., Hendrix, M., Hourlier, T., Johnson, N., Kahari, A., Keefe, D., Keenan, S., Kinsella, R., Kokocinski, F., Kulesha, E., Larsson, P., Longden, I., McLaren, W., Overduin, B., Pritchard, B., Riat, H.S., Rios, D., Ritchie, G.R., Ruffier, M., Schuster, M., Sobral, D., Spudich, G., Tang, Y.A., Trevanion, S., Vandrovcova, J., Vilella, A.J., White, S., Wilder, S.P., Zadissa, A., Zamora, J., Aken, B.L., Birney, E., Cunningham, F., Dunham, I., Durbin, R., Fernandez-Suarez, X.M., Herrero, Hubbard, T.J., Parker, A., Proctor, G., Vogel, J., Searle, S.M., (2011) Ensembl 2011. Nucleic Acids Res, 39, D800–D806.

Fujita, P.A., Rhead, B., Zweig, A.S., Hinrichs, A.S., Karolchik, D., Cline, M.S., Goldman, M., Barber, G.P., Clawson, H., Coelho, A., Diekhans, M., Dreszer, T.R., Giardine, B.M., Harte, R.A., Hillman-Jackson, J., Hsu, F., Kirkup, V., Kuhn, R.M., Learned, K., Li, C.H., Meyer, L.R., Pohl, A., Raney, B.J., Rosenbloom, K.R., Smith, K.E., Haussler, D., Kent, W.J., (2011) The UCSC Genome Browser database: update 2011. Nucleic Acids Res, 39, D876–D882.


Hughes, A.L., (1998) Phylogenetic tests of the hypothesis of block duplication of homologous genes on human chromosomes 6, 9, and 1. Molecular Biology and Evolution, 15(7), 854–870.

Hughes, A.L., da Silva, J., Friedman, R., (2001) Ancient genome duplications did not structure the human Hox-bearing chromosomes. Genome Research, 11(5), 771–780.


Johnson, M., Zaretskaya, I., Raytselis, Y., Merezhuk, Y., McGinnis, S., Madden, T.L., (2008) NCBI BLAST: a better web interface. Nucleic Acids Res, 36, W5–W9.

Russo, C.A., Takezaki, N., Nei, M., (1996) Efficiencies of different genes and different tree-building methods in recovering a known vertebrate phylogeny. Mol. Biol. Evol, 13, 525–536.

Saitou, N., Nei, M., (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol, 4, 406–425.

Van de Peer, Y., (2004) Computational approaches to unveiling ancient genome duplications. Nat. Rev. Genet, 5, 752–763.

Whelan, S., Goldman, N., (2001) A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol, 18, 691–699.