Sains Malaysiana 44(11)(2015): 1643-1651

 

Performance Comparison between Bootstrap and Multiscale Bootstrap for Assessing Phylogenetic Tree for RNA polymerase

(Perbandingan Prestasi antara Butstrap dan Multiskala Butstrap untuk Menilai Pohon Filogenetik bagi RNA polymerase)

 

SAFINAH SHARUDDIN* & NORA MUDA

 

School of Mathematical Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor Darul Ehsan, Malaysia

 

Received: 19 January 2014/Accepted: 15 June 2015

 

ABSTRACT

Phylogenetic inference refers to the reconstruction of evolutionary relationships among species, usually presented in the form of a tree. This study constructs a phylogenetic tree using a novel distance-based method known as the modified one-step M-estimator (MOM) method. The branches of the resulting tree were then evaluated for reliability by comparing the p-value of the multiscale bootstrap (AU value) with the ordinary bootstrap p-value (BP value). The aim of this study was to compare the performance of the AU and BP values in assessing the phylogenetic tree of RNA polymerase. The results showed that multiscale bootstrap analysis can detect high sampling errors, whereas bootstrap analysis cannot. Multiscale bootstrap analysis then reduces the sampling error by increasing the number of replications. Clusters were considered significant if their AU or BP values were 95% or higher. The BP and AU values differed at the 11th and 15th branches of the phylogenetic tree. The BP values at these branches were 72% and 85%, respectively, making the clusters not significant, whereas the AU values at both branches exceeded 95%, making the clusters significant. This difference was attributed to bias in the probability calculation of the bootstrap analysis; the multiscale bootstrap analysis therefore improves on the bootstrap analysis in calculating the probability value.
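To illustrate how an AU value can exceed 95% while the BP value for the same branch stays below it, the following is a minimal Python sketch of the multiscale bootstrap fit in the style of Shimodaira's approximately unbiased test, as also used by the pvclust R package. It assumes that bootstrap probabilities for one branch have already been computed at several relative sample sizes r = n'/n. The function name, the unweighted least-squares fit (pvclust uses a weighted fit) and the synthetic input values are assumptions for illustration only; they are not the authors' code or data.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def au_from_multiscale_bp(scales, bps):
    """Return (AU, BP) for one branch from multiscale bootstrap probabilities.

    scales: relative bootstrap sample sizes r = n'/n (at least two distinct values).
    bps:    bootstrap probability of the branch at each scale, strictly in (0, 1).
    """
    if len(scales) != len(bps) or len(scales) < 2:
        raise ValueError("need matching scales and bps, at least two of each")

    # z-value of each bootstrap probability: z = -Phi^{-1}(BP)
    z = [-_STD_NORMAL.inv_cdf(p) for p in bps]

    # Fit z(r) = v * sqrt(r) + c / sqrt(r) by ordinary least squares
    # (simplifying assumption: no weighting by bootstrap variance).
    a = [sqrt(r) for r in scales]
    b = [1.0 / sqrt(r) for r in scales]
    saa = sum(x * x for x in a)
    sbb = sum(y * y for y in b)
    sab = sum(x * y for x, y in zip(a, b))
    saz = sum(x * t for x, t in zip(a, z))
    sbz = sum(y * t for y, t in zip(b, z))
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("scales must contain at least two distinct values")
    v = (saz * sbb - sab * sbz) / det
    c = (saa * sbz - sab * saz) / det

    au = 1.0 - _STD_NORMAL.cdf(v - c)  # approximately unbiased p-value
    bp = 1.0 - _STD_NORMAL.cdf(v + c)  # ordinary bootstrap probability at r = 1
    return au, bp


# Synthetic (hypothetical) input generated from v = -2.0, c = 0.5.
scales = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4]
bps = [1.0 - _STD_NORMAL.cdf(-2.0 * sqrt(r) + 0.5 / sqrt(r)) for r in scales]
au, bp = au_from_multiscale_bp(scales, bps)
print(f"AU = {au:.3f}, BP = {bp:.3f}")  # AU = 0.994, BP = 0.933
```

In this synthetic case the branch would be judged not significant by its BP value (about 93%) but significant by its AU value (about 99%), the same kind of disagreement the abstract reports for the 11th and 15th branches.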

 

Keywords: Distance-based method; median absolute deviation (MADn); modified one-step M-estimator (MOM); phylogenetic inference
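For readers unfamiliar with the MOM and MADn terms in the keywords, the sketch below shows the generic modified one-step M-estimator of location: values lying more than 2.24 normalized median absolute deviations from the median are discarded, and the remaining values are averaged. This is the standard textbook form of the estimator, with MADn taken as MAD/0.6745. How the authors embed it in their distance-based tree construction is not described in this abstract, so the function names and the example data are illustrative assumptions only.

```python
from statistics import mean, median

MADN_SCALE = 0.6745  # rescales MAD to estimate the standard deviation under normality
OUTLIER_CUTOFF = 2.24  # conventional cutoff for the MOM outlier rule


def madn(values):
    """Normalized median absolute deviation: MAD / 0.6745."""
    m = median(values)
    return median(abs(x - m) for x in values) / MADN_SCALE


def mom(values):
    """Modified one-step M-estimator: mean of the values not flagged as outliers."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    m = median(values)
    limit = OUTLIER_CUTOFF * madn(values)
    # Keep every point within the cutoff; points equal to the median always survive.
    kept = [x for x in values if abs(x - m) <= limit]
    return mean(kept)


# Hypothetical data: the extreme value 100 is discarded before averaging.
print(mom([1, 2, 3, 4, 100]))  # 2.5
```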

 

ABSTRAK

Pentaabiran filogenetik merujuk kepada pembinaan semula hubungan evolusi dalam kalangan pelbagai spesies yang biasanya dibentangkan dalam bentuk pohon. Dalam kajian ini, pohon filogenetik dibina menggunakan kaedah novel berdasarkan jarak yang dikenali sebagai kaedah Penganggar-M satu langkah terubah suai (MOM). Seterusnya penilaian ke atas pembinaan pohon filogenetik yang dibangunkan akan dinilai bagi menentukan kebolehpercayaan terhadap cabang yang terbentuk. Perbandingan cabang-cabang pohon filogenetik yang dibentuk dinilai dengan melihat nilai-p bagi kaedah multiskala butstrap (nilai AU) dan dibandingkan dengan nilai-p bagi kaedah butstrap (nilai BP). Tujuan utama kajian ini adalah untuk membandingkan prestasi antara nilai AU dan BP bagi menilai pohon filogenetik RNA polimerase. Keputusan mendapati bahawa analisis multiskala butstrap dapat mengesan ralat sampel yang tinggi berbanding analisis butstrap. Analisis multiskala butstrap mengurangkan ralat sampel ini dengan menambahkan bilangan replikasi. Kelompok dikatakan bererti sekiranya tahap keyakinan menunjukkan peratusan melebihi 95%. Hasil mendapati nilai BP dan AU berbeza pada cabang ke-11 dan ke-15 dengan nilai BP masing-masing adalah 72% dan 85% seterusnya menjadikan kelompok itu tidak bererti tetapi sebenarnya bererti dengan nilai AU iaitu kedua-dua cabang melebihi 95%. Ini adalah disebabkan oleh pengiraan nilai kebarangkalian bagi analisis butstrap adalah pincang. Oleh itu, analisis multiskala telah memperbaiki pengiraan nilai kebarangkalian bagi analisis butstrap.

Kata kunci: Kaedah berdasarkan jarak; median sisihan mutlak (MADn); penganggar-M satu langkah terubahsuai (MOM); pentaabiran filogenetik

 


 

*Corresponding author; email: safinahukm@gmail.com

 

 

 
