SAINS MALAYSIANA

Sains Malaysiana 51(11)(2022): 3829-3841

http://doi.org/10.17576/jsm-2022-5111-26

A Comparison of Efficiency of Test Statistics for Detecting Outliers in Normal Population

KULLAPHAT PROMTEP¹,², PHONTITA THIUTHAD¹,²,* & NATCHITA INTARAMO²

¹Statistics and Applications Research Unit, Faculty of Science, Prince of Songkla University, Hat Yai, Songkhla, 90110 Thailand

²Division of Computational Science, Faculty of Science, Prince of Songkla University, Hat Yai, Songkhla, 90110 Thailand

Received: 20 October 2021/Accepted: 24 June 2022

Abstract

The objective of this research was to compare the efficiency of test statistics used to detect outliers through hypothesis testing. The test statistics considered were Dixon’s test, Ferguson’s test, Grubbs’ test, the T_w-test, and Tietjen-Moore’s test. Outliers were divided into two groups according to how far they lie from the rest of the data: mild and extreme outliers. The efficiency of each test statistic was measured by the probability of type I error and the power of the test. The results showed that Tietjen-Moore’s test controls the probability of type I error according to Cochran’s and Bradley’s criteria in every situation. The T_w-test has the highest sensitivity in detecting one outlier when the sample size is small or moderate, but Grubbs’ test performs better when the sample size is large. When detecting one extreme outlier at the 0.01 significance level, the power of four of the tests tends to increase as the sample size increases. When testing for k outliers, Tietjen-Moore’s test provides higher power than the T_w-test when k equals 10% of the sample size, for both mild and extreme outliers; the opposite holds when k makes up 20% of the sample size.

Keywords: Detection of outliers; normal distribution; power of the test; Tietjen-Moore’s test; type I error
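The abstract names the test statistics without stating them. The sketch below computes the textbook forms of four of them: the two-sided Grubbs statistic G, Dixon’s r10 ratio for a suspected largest value, the sample skewness √b1 associated with Ferguson (1961), and the Tietjen-Moore statistic L_k for the k largest values. It assumes these are the variants the study used (Dixon and Tietjen-Moore each define several), omits the T_w statistic because its definition is not given here, and leaves out critical values. The function names and the sample data are hypothetical.

```python
import math


def grubbs_statistic(xs):
    """Two-sided Grubbs G = max|x_i - mean| / s, with s the sample SD (divisor n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return max(abs(x - mean) for x in xs) / s


def dixon_r10_upper(xs):
    """Dixon's r10 for a suspected largest value: (x(n) - x(n-1)) / (x(n) - x(1))."""
    y = sorted(xs)
    return (y[-1] - y[-2]) / (y[-1] - y[0])


def ferguson_skewness(xs):
    """Sample skewness sqrt(b1) = m3 / m2**1.5, central moments with divisor n."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def tietjen_moore_upper(xs, k):
    """Tietjen-Moore L_k for the k largest values: the sum of squares of the
    n - k smallest values about their own mean, divided by the total sum of squares."""
    y = sorted(xs)
    kept = y[: len(y) - k]
    mean_all = sum(y) / len(y)
    mean_kept = sum(kept) / len(kept)
    ss_all = sum((v - mean_all) ** 2 for v in y)
    ss_kept = sum((v - mean_kept) ** 2 for v in kept)
    return ss_kept / ss_all


# Hypothetical sample with one suspiciously large value.
data = [2.1, 2.3, 2.2, 2.4, 2.0, 5.0]
print(f"r10={dixon_r10_upper(data):.3f} G={grubbs_statistic(data):.3f} "
      f"sqrt_b1={ferguson_skewness(data):.3f} L1={tietjen_moore_upper(data, 1):.3f}")
# prints: r10=0.867 G=2.026 sqrt_b1=1.728 L1=0.015
```

Large values of G, r10 and √b1 and small values of L_k point to outliers. For k = 1 and an upper outlier, L_1 = 1 − nG²/(n − 1)², so in that case the two statistics rank samples identically; they differ once k > 1.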
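The probability of type I error and the power used as efficiency measures are Monte Carlo quantities. A minimal sketch of how they can be estimated for Grubbs’ test follows, assuming N(0, 1) samples, a critical value taken from a separate null simulation rather than from published tables, and a hypothetical single-outlier model that shifts one observation upward by a fixed number of standard deviations (the study’s own definitions of mild and extreme outliers are not given here). The robustness check uses Bradley’s (1978) liberal criterion, 0.5α ≤ α̂ ≤ 1.5α; Cochran’s criterion is not reproduced. Function names, sample size, seed and shift are illustrative.

```python
import math
import random


def grubbs_statistic(xs):
    """Two-sided Grubbs G = max|x_i - mean| / s (sample SD, divisor n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return max(abs(x - mean) for x in xs) / s


def null_critical_value(n, alpha, reps, rng):
    """Empirical (1 - alpha) quantile of G over `reps` N(0, 1) samples of size n."""
    stats = sorted(
        grubbs_statistic([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)
    )
    return stats[int((1.0 - alpha) * reps)]


def rejection_rate(n, crit, reps, rng, shift=0.0):
    """Proportion of samples with G > crit. With shift > 0, one observation is
    moved `shift` standard deviations upward (hypothetical single-outlier model)."""
    rejections = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xs[0] += shift
        if grubbs_statistic(xs) > crit:
            rejections += 1
    return rejections / reps


def bradley_liberal(rate, alpha):
    """Bradley's (1978) liberal criterion: 0.5 * alpha <= rate <= 1.5 * alpha."""
    return 0.5 * alpha <= rate <= 1.5 * alpha


rng = random.Random(2022)                    # fixed seed for reproducibility
n, alpha, reps = 10, 0.05, 20000
crit = null_critical_value(n, alpha, reps, rng)
type1 = rejection_rate(n, crit, reps, rng)               # fresh null samples
power = rejection_rate(n, crit, reps, rng, shift=4.0)    # one shifted value
print(f"critical G ~ {crit:.3f}; type I ~ {type1:.4f} "
      f"(Bradley liberal: {bradley_liberal(type1, alpha)}); power ~ {power:.4f}")
```

Swapping in another statistic (reversing the rejection direction for L_k, where small values are significant) or a different contamination model gives the corresponding estimates under the same assumptions.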

REFERENCES

Barnett, V. & Lewis, T. 1984. Outliers in Statistical Data. New York: John Wiley & Sons.

Bradley, J.V. 1978. Robustness? British Journal of Mathematical and Statistical Psychology 31: 144-152.

Cochran, W.G. 1954. Some methods for strengthening the common χ² tests. Biometrics 10: 417-451.

Dixon, W.J. 1951. Ratios involving extreme values. The Annals of Mathematical Statistics 22: 68-78.

Dixon, W.J. 1953. Processing data for outliers. Biometrics 9: 74-89.

Efstathiou, C.E. 2006. Estimation of type I error probability from experimental Dixon’s q parameter on testing for outliers within small size data sets. Talanta 69: 1068-1071.

Ferguson, T.S. 1961. On the rejection of outliers. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley: University of California Press. pp. 253-287.

Grubbs, F.E. 1950. Sample criteria for testing outlying observations. The Annals of Mathematical Statistics 21: 27-58.

Grubbs, F.E. 1969. Procedures for detecting outlying observations in samples. Technometrics 11: 1-21.

Hawkins, D.M. 1980. Identification of Outliers. London: Chapman and Hall.

Jareankam, W. 2013. A detection of outliers in random sample from normal population. Thesis, National Institute of Development Administration (Unpublished).

Jareankam, W. 2020. A detection of outliers in random sample from normally distributed population using coefficient of skewness. Burapha Science Journal 25: 236-245.

Patchayaluck, N. 2013. Estimation of probability of type I error and power of test for test statistics an outlier. Thesis, Silpakorn University (Unpublished).

Rahman, S.K., Sathik, M.M. & Kannan, K.S. 2014. A novel approach for univariate outlier detection. International Journal of Scientific & Engineering Research 5: 1594-1599.

Rattanaloetnusorn, S. 1991. A comparative study on some procedures for detecting outliers in linear regression analysis. Thesis, Chulalongkorn University (Unpublished).

Rosner, B. 1975. On the detection of many outliers. Technometrics 17: 221-227.

Tietjen, G.L. & Moore, R.H. 1972. Some Grubbs-type statistics for the detection of several outliers. Technometrics 14: 583-597.

Verma, S.P. & Quiroz-Ruiz, A. 2006. Critical values for six Dixon tests for outliers in normal samples up to sizes 100, and applications in science and engineering. Revista Mexicana de Ciencias Geológicas 23: 133-161.

*Corresponding author; email: phontita.t@psu.ac.th
