Article Info

Cluster Analysis for Identifying Obesity Subgroups in Health and Nutritional Status Survey Data

Usman Khalil, Owais Ahmed Malik, Daphne Teck Ching Lai, Ong Sok King


This study discovers meaningful patterns (groups) in the obese samples of health and nutritional status survey data by applying several clustering techniques. Because the data set mixes qualitative and quantitative variables, the clustering techniques were paired with dissimilarity metrics suited to mixed data so that the results could be interpreted meaningfully. The relationships between obesity and lifestyle-related factors such as demography, socio-economic status, physical activity, and dietary behavior were assessed using four clustering techniques: Two-Step clustering, Partition Around Medoids (PAM), Agglomerative Hierarchical clustering, and Kohonen Self-Organizing Maps (SOMs). The solutions generated by these techniques were analyzed and validated with cluster validity (CV) indices, and their associations with the obesity classes were then determined to uncover patterns in the obese sample. Two-Step clustering and hierarchical clustering outperformed the other techniques in identifying subgroups based on the underlying hidden patterns in the data. Based on the CV index values and the association analysis (obesity factor against the cluster solutions), two subgroups were identified and their profiles are reported. The first group consisted of middle-aged individuals who appear to take care of their lifestyle, while the second consisted of young individuals who, in contrast, showed careless lifestyle habits (i.e., physical activity and dietary behavior). The salient features of these subgroups are reported and can inform improvements in the health care industry. The research identified subsets within the survey data that share similar characteristics and health status (i.e., prevalence of obesity with respect to lifestyle factors such as physical activity and dietary behavior), which will help the concerned departments choose appropriate measures to counter and prevent obesity in the population.
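As a rough illustration of how a medoid-based method can cluster mixed qualitative and quantitative survey variables, the sketch below combines a Gower-style dissimilarity with a simplified PAM swap search. It rests on several assumptions: the abstract does not name the dissimilarity metric, so Gower is only one plausible choice; the six records and their field names are hypothetical toy values, not survey data; and the medoids are initialised naively rather than with PAM's BUILD step.

```python
def gower(a, b, numeric_ranges):
    """Gower-style dissimilarity between two mixed-type records.

    numeric_ranges maps a field index to that field's range (max - min);
    every other field is treated as categorical (0 if equal, else 1).
    """
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in numeric_ranges:
            r = numeric_ranges[i]
            total += abs(x - y) / r if r else 0.0
        else:
            total += 0.0 if x == y else 1.0
    return total / len(a)


def total_cost(dist, medoids):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Simplified PAM: naive initial medoids, then greedy swaps until no gain."""
    n = len(dist)
    medoids = list(range(k))  # assumption: naive start, not the BUILD step
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best - 1e-12:  # accept only strict improvements
                    medoids, best, improved = trial, cost, True
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels


# Hypothetical toy records: (age, physical activity, dietary behavior).
records = [
    (45, "active", "balanced"),
    (50, "active", "balanced"),
    (48, "active", "balanced"),
    (22, "sedentary", "fast_food"),
    (25, "sedentary", "fast_food"),
    (21, "sedentary", "fast_food"),
]
ages = [r[0] for r in records]
numeric_ranges = {0: max(ages) - min(ages)}  # only field 0 is numeric

dist = [[gower(a, b, numeric_ranges) for b in records] for a in records]
medoids, labels = pam(dist, k=2)
print("medoid indices:", medoids)
print("cluster labels:", labels)
```

On this toy input the two recovered clusters separate the older, active records from the younger, sedentary ones. Real survey data would need missing-value handling and a validity index to choose the number of clusters.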


NHANSS, Machine Learning, Two-Step, Partition Around Medoids, Agglomerative, Hierarchical, Kohonen SOMs, Clustering, Obesity.


Data Mining and Optimization