The precise nature of the evolutionary events that created the ancient (>450 million years old) paralogy regions in the vertebrate genome is extremely difficult to trace through inter-genomic and intra-genomic map comparisons, because regions this old have undergone multiple chromosomal breakages and rearrangements that altered the karyotype and disrupted gene order on chromosomes. A more convincing way to determine how the ancient vertebrate paralogons originated is phylogenetic analysis of multigene families (Abbasi, 2010; Abbasi and Grzeschik, 2007; Asrar et al., 2013; Hughes, 1998; Hughes et al., 2001a). This approach captures the nature of anciently duplicated genomic regions in two ways. First, it estimates the relative timing of duplication events, that is, whether they occurred before or after a given speciation event. This type of relative dating can provide a robust picture of the extent of duplication events within a particular time window (Van de Peer, 2004). Second, the evolutionary origin of paralogons can be examined by combining the global physical organization of the gene families that constitute the paralogons with their respective tree topologies.
Phylogenetic analyses were performed for 193 human triplicated/quadruplicated gene families. Amino acid sequences were aligned with the multiple alignment tool CLUSTAL-W using default parameters (Thompson et al., 1994). Phylogenetic trees for each gene family were reconstructed using the neighbor-joining (NJ) method (Russo et al., 1996; Saitou and Nei, 1987); the complete-deletion option was used to exclude any site at which a gap was postulated in any sequence. The uncorrected proportion (p) of amino acid differences and the Poisson-corrected (PC) amino acid distance were used as amino acid substitution models. The reliability of the resulting tree topologies was assessed with the bootstrap method (1000 pseudoreplicates), which generated a bootstrap probability for each interior branch of the tree (Felsenstein, 1985). Sequences that were too divergent and disrupted the entire alignment were excluded.
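The two distance measures and the complete-deletion option can be illustrated with a minimal Python sketch. It is not the software used in this study. The function names and the three toy sequences are hypothetical and serve only to show the arithmetic: columns containing a gap in any sequence are removed first, p is the fraction of remaining sites that differ, and the Poisson-corrected distance is taken as d = -ln(1 - p).

```python
import math
from itertools import combinations

def complete_deletion(alignment):
    """Drop every alignment column in which any sequence has a gap ('-')."""
    seqs = list(alignment.values())
    keep = [i for i in range(len(seqs[0])) if all(s[i] != "-" for s in seqs)]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}

def p_distance(a, b):
    """Uncorrected proportion of sites at which two aligned sequences differ."""
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)

def poisson_distance(p):
    """Poisson-corrected distance d = -ln(1 - p); undefined when p reaches 1."""
    if p >= 1.0:
        raise ValueError("Poisson correction is undefined for p >= 1")
    return -math.log(1.0 - p)

def distance_matrix(alignment):
    """Return {(name1, name2): (p, d)} for all pairs after complete deletion."""
    clean = complete_deletion(alignment)
    result = {}
    for n1, n2 in combinations(clean, 2):
        p = p_distance(clean[n1], clean[n2])
        result[(n1, n2)] = (p, poisson_distance(p))
    return result

# Hypothetical toy alignment (not data from this study).
toy = {
    "seqA": "MKT-LLVA",
    "seqB": "MKTALLVA",
    "seqC": "MRTALIVG",
}

for pair, (p, d) in distance_matrix(toy).items():
    print(pair, f"p={p:.3f}", f"PC={d:.3f}")
```

Because the gapped column is removed for all sequences, every pairwise distance in the matrix is computed over the same set of sites, which is what distinguishes complete deletion from pairwise deletion.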
To estimate phylogenetic trees with a second, independent reconstruction method, the Maximum Likelihood procedure based on the Whelan and Goldman (WAG) model of amino acid replacement (Whelan and Goldman, 2001) was employed, using the MEGA 5 program. The timing of gene duplication events relative to the divergence of major taxa was inferred from the branching order of each gene family within its phylogenetic tree.
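The logic of reading relative timing from branching order can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure implemented in MEGA 5: the tree is a hypothetical nested tuple, leaf labels follow an assumed "species|gene" convention, and the gene tree is taken to be correct. The duplication node is the smallest subtree containing two paralogs of one species. If another species has a gene inside that subtree, the duplication is inferred to predate the split from that species; if its genes occur only outside the subtree, the duplication is inferred to postdate the split.

```python
def leaves(node):
    """Return all leaf labels under a nested-tuple tree node."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]

def lca(node, a, b):
    """Return the smallest subtree containing both distinct leaves a and b."""
    if isinstance(node, str):
        return None
    for child in node:
        under = leaves(child)
        if a in under and b in under:
            return lca(child, a, b)
    under = leaves(node)
    return node if a in under and b in under else None

def species_of(label):
    """Extract the species part of an assumed 'species|gene' leaf label."""
    return label.split("|")[0]

def duplication_vs_split(tree, gene_a, gene_b, other_species):
    """Classify the duplication separating gene_a and gene_b as occurring
    'before' or 'after' the split from other_species."""
    dup_node = lca(tree, gene_a, gene_b)
    if dup_node is None:
        raise ValueError("both genes must be distinct leaves of the tree")
    inside = {species_of(l) for l in leaves(dup_node)}
    everywhere = {species_of(l) for l in leaves(tree)}
    if other_species in inside:
        return "before"
    if other_species in everywhere:
        return "after"
    return "undetermined"

# Hypothetical toy topology (not a result from this study).
toy_tree = (
    (("human|G1", "zebrafish|G1"), ("human|G2", "zebrafish|G2")),
    "fly|G",
)

print(duplication_vs_split(toy_tree, "human|G1", "human|G2", "zebrafish"))
print(duplication_vs_split(toy_tree, "human|G1", "human|G2", "fly"))
```

In practice, such an inference would be accepted only where the relevant interior branches are well supported, since a misplaced branch changes the inferred order of duplication and speciation.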
Figure 1: Phylogenetic tree of Fibroblast Growth Factor Receptor (FGFR) family.
Useful references: