Phylogenomic Analysis

The exact nature of the evolutionary events that created the ancient (>450 million years old) paralogy regions in the vertebrate genome is extremely difficult to trace through inter-genomic and intra-genomic map comparisons, because regions this old have undergone multiple chromosomal breakages and rearrangements that altered the karyotype and disrupted gene order on chromosomes. A more convincing way to determine the mechanism of origin of ancient vertebrate paralogons is phylogenetic analysis of multigene families (Abbasi, 2010; Abbasi and Grzeschik, 2007; Asrar et al., 2013; Hughes, 1998; Hughes et al., 2001). This approach captures the nature of anciently duplicated genomic regions in two ways. First, it estimates the relative timing of duplication events, that is, whether each occurred before or after a given speciation event. This type of relative dating can provide a robust picture of the extent of duplication within a particular time window (Van de Peer, 2004). Second, the evolutionary origin of paralogons can be examined by combining the global physical organization of the gene families that make up the paralogons with their tree topologies.


Phylogenetic analyses were performed for 193 human triplicated/quadruplicated gene families. Amino acid sequences were aligned with the multiple alignment tool CLUSTAL W using default parameters (Thompson et al., 1994). Phylogenetic trees for each gene family were reconstructed using the neighbor-joining (NJ) method (Russo et al., 1996; Saitou and Nei, 1987); the complete-deletion option was used to exclude any site at which the alignment postulated a gap in any sequence. The uncorrected proportion (p) of amino acid differences and the Poisson-corrected (PC) amino acid distance were used as amino acid substitution models. The reliability of the resulting tree topologies was assessed with the bootstrap method (1000 pseudoreplicates), which yields a bootstrap probability for each interior branch of the tree (Felsenstein, 1985). Sequences too divergent to align without disrupting the entire alignment were excluded.
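
The distance measures and resampling step described above can be illustrated with a minimal sketch. It is not the MEGA implementation: the function names, the toy sequences, and the choice to treat only "-" and "?" as gaps are assumptions made for illustration. It shows complete deletion of gapped columns, the uncorrected p-distance, the Poisson-corrected distance d = -ln(1 - p), and one bootstrap pseudoreplicate drawn by sampling alignment columns with replacement.

```python
import math
import random

# Assumption: only these characters mark gaps or missing data.
GAP_CHARS = {"-", "?"}


def complete_deletion(seqs):
    """Drop every alignment column in which any sequence has a gap."""
    length = len(next(iter(seqs.values())))
    if any(len(s) != length for s in seqs.values()):
        raise ValueError("sequences must be aligned to equal length")
    keep = [i for i in range(length)
            if all(s[i] not in GAP_CHARS for s in seqs.values())]
    return {name: "".join(s[i] for i in keep) for name, s in seqs.items()}


def p_distance(a, b):
    """Uncorrected proportion of sites at which two sequences differ."""
    if not a or len(a) != len(b):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def poisson_distance(a, b):
    """Poisson-corrected distance, d = -ln(1 - p)."""
    p = p_distance(a, b)
    if p >= 1.0:
        raise ValueError("distance undefined when every site differs")
    return -math.log(1.0 - p)


def bootstrap_alignment(seqs, rng):
    """One pseudoreplicate: sample columns with replacement."""
    length = len(next(iter(seqs.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(s[i] for i in cols) for name, s in seqs.items()}


# Hypothetical toy alignment, not taken from the study.
toy = {
    "seqA": "MKTAY-LLV",
    "seqB": "MKSAYQLLV",
    "seqC": "MRSAYQ-LI",
}
clean = complete_deletion(toy)
replicate = bootstrap_alignment(clean, random.Random(0))
```

In a full analysis, a tree would be rebuilt from each of the 1000 pseudoreplicates, and the bootstrap probability of an interior branch would be the fraction of replicate trees that recover it.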


To estimate phylogenetic trees with a second reconstruction method, the maximum likelihood procedure based on the Whelan and Goldman (WAG) model of amino acid replacement (Whelan and Goldman, 2001) was applied using the MEGA 5 program (Tamura et al., 2011). The timing of gene duplication events relative to the divergence of major taxa was inferred from the branching order of each gene family within its phylogenetic tree.
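
Reading duplication timing from branching order can be sketched as follows. This is an illustration only, using the common species-overlap heuristic rather than the procedure followed in the study: an interior node is labeled a duplication when its two child clades share at least one species, and a speciation otherwise. The nested-tuple tree encoding, the "species|gene" leaf labels, and the gene names are all hypothetical.

```python
def leaves(node):
    """Return the leaf labels below a node of a nested-tuple binary tree."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)


def species_of(leaf):
    """Leaf labels are assumed to look like 'species|gene'."""
    return leaf.split("|")[0]


def classify_nodes(node, events=None):
    """Label each interior node (pre-order) as duplication or speciation."""
    if events is None:
        events = []
    if isinstance(node, str):
        return events
    left, right = node
    left_species = {species_of(leaf) for leaf in leaves(left)}
    right_species = {species_of(leaf) for leaf in leaves(right)}
    # Species-overlap rule: shared species imply a duplication node.
    kind = "duplication" if left_species & right_species else "speciation"
    events.append((kind, sorted(left_species | right_species)))
    classify_nodes(left, events)
    classify_nodes(right, events)
    return events


# Hypothetical family: two vertebrate paralogs and one invertebrate ortholog.
tree = (
    (
        ("human|G1", "mouse|G1"),
        ("human|G2", "mouse|G2"),
    ),
    "amphioxus|G",
)
events = classify_nodes(tree)
```

In this toy tree the single duplication node sits below the split from the invertebrate outgroup and above the human-mouse split, so the duplication is placed after the first speciation and before the second.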



Figure 1: Phylogenetic tree of Fibroblast Growth Factor Receptor (FGFR) family.


Useful references:


Abbasi, A. A., (2010) Piecemeal or big bangs: correlating the vertebrate evolution with proposed models of gene expansion events. Nat Rev Genet, 11, 166.

Abbasi, A. A., Grzeschik, K. H., (2007) An insight into the phylogenetic history of HOX linked gene families in vertebrates. BMC Evol Biol, 7, 239.

Asrar, Z., et al., (2013) Fourfold paralogy regions on human HOX-bearing chromosomes: role of ancient segmental duplications in the evolution of vertebrate genome. Mol Phylogenet Evol, 66(3), 737-47.

Hughes, A. L., (1998) Phylogenetic tests of the hypothesis of block duplication of homologous genes on human chromosomes 6, 9, and 1. Mol Biol Evol, 15, 854-70.

Hughes, A. L., da Silva, J., Friedman, R., (2001) Ancient genome duplications did not structure the human Hox-bearing chromosomes. Genome Res, 11, 771-80.

Clamp, M., Andrews, D., Barker, D., Bevan, P., Cameron, G., Chen, Y., Clark, L., Cox, T., Cuff, J., Curwen, V., Down, T., Durbin, R., Eyras, E., Gilbert, J., Hammond, M., Hubbard, T., Kasprzyk, A., Keefe, D., Lehvaslaiho, H., Iyer, V., Melsopp, C., Mongin, E., Pettett, R., Potter, S., Rust, A., Schmidt, E., Searle, S., Slater, G., Smith, Birney, E., (2003) Ensem

Felsenstein, J., (1985) Confidence-Limits on Phylogenies - an Approach Using the Bootstrap. Evolution, 39, 783-791.

Tamura K, Peterson D, Peterson N, Stecher G, Nei M, and Kumar S (2011) MEGA5: Molecular Evolutionary Genetics Analysis using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Molecular Biology and Evolution, 28, 2731-2739.

Russo, C.A., Takezaki, N., Nei, M., (1996) Efficiencies of different genes and different tree-building methods in recovering a known vertebrate phylogeny. Mol. Biol. Evol, 13, 525–536.

Saitou, N., Nei, M., (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol, 4, 406–425.


Thompson JD, Higgins DG, Gibson TJ. (1994) Clustal-W - Improving the Sensitivity of Progressive Multiple Sequence Alignment through Sequence Weighting, Position-Specific Gap Penalties and Weight Matrix Choice. Nucleic Acids Research, 22, 4673-4680.

Van de Peer, Y., (2004) Computational approaches to unveiling ancient genome duplications. Nat. Rev. Genet, 5, 752–763.

Whelan, S., Goldman, N., (2001) A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol Biol Evol, 18, 691-699.