Sains Malaysiana 50(6)(2021): 1787-1798

http://doi.org/10.17576/jsm-2021-5006-24

 

Comparative Study of Clustering-Based Outliers Detection Methods in Circular-Circular Regression Model

(Kajian Perbandingan Kaedah Penetapan Titik Terpencil Berasaskan Kelompok dalam Model Pendaftaran Lingkaran)

 

SITI ZANARIAH SATARI1*, NUR FARAIDAH MUHAMMAD DI1, YONG ZULINA ZUBAIRI2 & ABDUL GHAPOR HUSSIN3

 

1Centre for Mathematical Sciences, College of Computing & Applied Sciences, Universiti Malaysia Pahang, 26300 Kuantan, Pahang Darul Makmur, Malaysia

 

2Centre for Foundation Studies in Sciences, University of Malaya, 50603 Kuala Lumpur, Federal Territory, Malaysia

 

3Faculty of Defence Sciences and Technology, National Defence University of Malaysia, Sungai Besi Camp, 57000 Kuala Lumpur, Federal Territory, Malaysia

 

Received: 7 May 2019/Accepted: 14 October 2020

 

ABSTRACT

This paper compares several clustering-based algorithms for detecting multiple outliers in the circular-circular regression model. Three similarity measures based on the circular distance were used to build a cluster tree with agglomerative hierarchical methods. A stopping rule based on the mean direction and circular standard deviation of the tree heights served as the cutoff point: cluster groups that exceed it are classified as potential outliers. The performance of the algorithms was assessed in simulation studies covering several outlier scenarios at given levels of contamination. Applications to wind data and to a simulated data set are given for illustration. Satari’s algorithm (the S-SL algorithm) was found to perform well for any sample size n and error concentration parameter. The algorithms can identify not only one or a few outliers but also multiple outliers at once.

 

Keywords: Circular distance; circular-circular regression model; clustering; outliers; stopping rule
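
The procedure summarised in the abstract can be sketched roughly as follows. This is only an illustrative sketch under several assumptions that the abstract does not specify: the function names `circular_distance` and `flag_outliers` are hypothetical, the dissimilarity between two observations is taken as the average of the circular distances in each angular coordinate, single linkage is used for the tree, the cutoff is the mean direction of the tree heights plus a tunable multiple `alpha` of their circular standard deviation, and the largest cluster below the cutoff is treated as the clean group. It is not the authors' implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def circular_distance(a, b):
    """Circular distance between angles in radians; the result lies in [0, pi]."""
    diff = np.abs(a - b) % (2 * np.pi)
    return np.pi - np.abs(np.pi - diff)


def flag_outliers(u, v, alpha=1.25):
    """Flag potential outliers among angle pairs (u_i, v_i).

    alpha is a hypothetical tuning constant, not a value taken from the paper.
    Returns the indices of flagged observations and the cutoff height.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)

    # Assumed dissimilarity: average circular distance over both coordinates.
    d = 0.5 * (circular_distance(u[:, None], u[None, :])
               + circular_distance(v[:, None], v[None, :]))
    np.fill_diagonal(d, 0.0)

    # Agglomerative hierarchical clustering (single linkage) on the condensed matrix.
    tree = linkage(squareform(d, checks=False), method="single")
    heights = tree[:, 2]

    # Mean direction and circular standard deviation of the tree heights.
    s, c = np.sin(heights).mean(), np.cos(heights).mean()
    mean_direction = np.arctan2(s, c)
    resultant = min(np.hypot(s, c), 1.0)  # guard against rounding above 1
    circular_sd = np.sqrt(-2.0 * np.log(resultant))

    # Assumed stopping rule: cut the tree at mean + alpha * sd.
    cutoff = mean_direction + alpha * circular_sd
    labels = fcluster(tree, t=cutoff, criterion="distance")

    # Assumption: the largest cluster is the clean group; all others are flagged.
    main_label = np.bincount(labels).argmax()
    return np.flatnonzero(labels != main_label), cutoff


# Toy data: 30 points near v = u + 0.2, plus 3 points shifted by pi.
u_clean = np.linspace(0.5, 1.5, 30)
u_out = np.array([0.8, 1.0, 1.2])
u = np.concatenate([u_clean, u_out])
v = np.concatenate([u_clean + 0.2, u_out + 0.2 + np.pi])
flagged, cutoff = flag_outliers(u, v)
print(flagged, cutoff)
```

Averaging the two coordinate distances keeps every tree height within [0, π], so the heights can themselves be treated as angles when computing the mean direction and circular standard deviation.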

 

ABSTRACT (translated from the Malay ABSTRAK)

This paper discusses a comparative study of several algorithms that detect multiple outliers in the circular regression model based on clustering algorithms. Three similarity measures based on the circular distance were used to obtain a cluster tree using agglomerative hierarchical algorithms. A cutoff value for the cluster tree, based on the mean direction and circular standard deviation of the tree heights, was used to classify cluster groups exceeding this cutoff point as outliers. The performance of the algorithms was tested in a simulation study that considers several outlier scenarios at different levels. For illustration, an application to real wind data and a simulated data set are given. We found that Satari’s algorithm (the S-SL algorithm) performs well for any sample size and concentration parameter. The algorithm is good at identifying a single outlier or multiple outliers at one time.

 

Keywords: Clustering algorithm; circular distance; circular regression model; cutoff value; outliers

 


 

*Corresponding author; email: zanariah@ump.edu.my

 

 

   
