Article Info

Data Annotation Architecture for Automatic Depression Detection

Yun Yao Chang, Nazlia Omar
dx.doi.org/10.17576/apjitm-2023-1201-03

Abstract

Depression is a mood disorder that causes a person to feel sad and tired and to experience a prolonged lack of energy, irritability, and loss of interest in daily activities. Many scholars have contributed to identifying and curbing depression. One such effort is the development of models that can identify and predict depression among Twitter users. However, to date there is no high-quality labeled dataset of depression-related tweets. Therefore, the purpose of this study is to propose an architecture that collects data from social media such as Twitter to detect depression automatically. The study involves text analysis that begins with data scraping, followed by text processing, feature extraction, modeling and evaluation, and then document corpus analysis using TF-IDF and bag-of-words (BOW). A sentiment lexicon derived from two tools, TextBlob and VADER, was used to distinguish the emotions of words. Four machine learning classifiers, namely Logistic Regression, Decision Tree, Support Vector Machine (SVM) and K-Nearest Neighbour, were used to perform the classification. The final dataset management and the use of Logistic Regression produced the expected high accuracy, precision, recall and F1-Score results in predicting depression. For the application, local Malaysian COVID-19 tweets were scraped using TWINT. Appropriate hashtags and keywords were used to obtain the tweets. The results show that the proposed architecture outperforms the baseline, achieving a 92.876% F1-Score with SVM+TF-IDF compared with that of the baseline. This shows that the proposed data annotation architecture performs well in detecting depression.
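
The annotation and feature extraction steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicon `TOY_LEXICON` stands in for TextBlob and VADER polarity scores, and the names `tokenize`, `annotate`, `tfidf`, the threshold of -0.3 and the sample tweets are all hypothetical. The sketch uses only the Python standard library and the plain tf * log(N / df) weighting.

```python
import math
from collections import Counter

# Hypothetical toy lexicon; a stand-in for TextBlob/VADER polarity scores.
TOY_LEXICON = {"sad": -0.8, "tired": -0.5, "hopeless": -0.9, "alone": -0.6,
               "happy": 0.8, "great": 0.7, "love": 0.6}

def tokenize(text):
    # Lowercase, strip simple punctuation and '#', keep alphabetic tokens.
    words = (w.strip("#.,!?") for w in text.lower().split())
    return [w for w in words if w.isalpha()]

def annotate(text, threshold=-0.3):
    """Label a tweet 1 (depressive) if its mean lexicon polarity is below threshold."""
    scores = [TOY_LEXICON[t] for t in tokenize(text) if t in TOY_LEXICON]
    polarity = sum(scores) / len(scores) if scores else 0.0
    return 1 if polarity < threshold else 0

def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (c / len(toks)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

# Invented sample tweets, for illustration only.
tweets = [
    "I feel so sad and tired today",
    "Feeling hopeless and alone #lockdown",
    "What a great day, I love it",
]
labels = [annotate(t) for t in tweets]   # automatic annotation step
features = tfidf(tweets)                 # feature extraction step
```

In practice, the labels and feature vectors would be passed to a classifier such as Logistic Regression or SVM, and a library vectorizer (for example scikit-learn's `TfidfVectorizer`, which applies a smoothed variant of this weighting) would likely replace the hand-written `tfidf`.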

Keywords

Depression, Machine Learning, Data Annotation, TWINT, COVID-19.

Area

Data Mining and Optimization