Article Info

Privacy Based Classification Model of Public Data By Utilizing Two-Steps Validation Approach

Masnida Hussin, Raja Azlina Raja Mahmood, Nur Raidah Salim
dx.doi.org/10.17576/apjitm-2022-1102-10

Abstract

Digital information has become integral to modernizing and leveraging resources in Information Technology (IT). Vast amounts of data and information can be obtained anytime and anywhere through ICT facilities. Data of this kind is considered public data because it is shared publicly, for example on social media. Public data can be arranged according to various criteria and formats. Users have a right to understand which data can be shared publicly and which data should remain private. However, people often misunderstand which data needs to be secured and which can be shared. The problem becomes more critical when public data is already exposed to data breaches and data theft. In this work, we propose a data privacy classification approach for public data that resides on digital platforms. It aims to inform the public about the privacy level of their data before they reveal it on open and free digital platforms. We employed three privacy classes: low, medium, and high. We first identified entities of public data, which refer to digital information platforms such as websites, mobile apps, and online systems, and then examined the data attributes of each entity. The public data attributes were sorted and passed to respondents, who indicated which privacy class they considered suitable for each attribute. Based on the respondents' input, we then used a Naive Bayesian classifier to generate probability weightage for re-assigning the data attributes to the most suitable privacy class. This two-level data classification brings better perspectives on data privacy. The modified privacy classification was then verified by the respondents to analyze their preferences and measure user satisfaction. According to the results, our public data privacy classification model meets public expectations. We are optimistic that well-organized data classification will contribute to better data practices.
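
The re-assignment step can be illustrated with a minimal sketch. It assumes, hypothetically, that each respondent answer is recorded as an (entity, attribute, privacy class) triple and that a categorical Naive Bayes model with Laplace smoothing turns those answers into a probability weightage per class. The toy responses, the function names (train, posterior, reassign), and the smoothing parameter alpha are illustrative assumptions and are not taken from the article; the authors' actual features and procedure may differ.

```python
from collections import Counter, defaultdict

CLASSES = ("low", "medium", "high")  # the three privacy classes named in the abstract


def train(responses, alpha=1.0):
    """Count class and feature frequencies from (entity, attribute, class) rows."""
    class_counts = Counter(c for _, _, c in responses)
    # feature_counts[i][cls][value]: i = 0 for entity, i = 1 for attribute
    feature_counts = [defaultdict(Counter), defaultdict(Counter)]
    vocab = [set(), set()]
    for entity, attribute, cls in responses:
        for i, value in enumerate((entity, attribute)):
            feature_counts[i][cls][value] += 1
            vocab[i].add(value)
    return {
        "class_counts": class_counts,
        "feature_counts": feature_counts,
        "vocab": vocab,
        "alpha": alpha,
        "n": len(responses),
    }


def posterior(model, entity, attribute):
    """Return the normalized probability weightage of each privacy class."""
    alpha = model["alpha"]
    scores = {}
    for cls in CLASSES:
        total = model["class_counts"][cls]
        # Laplace-smoothed class prior
        p = (total + alpha) / (model["n"] + alpha * len(CLASSES))
        # Naive Bayes assumption: features are independent given the class
        for i, value in enumerate((entity, attribute)):
            count = model["feature_counts"][i][cls][value]
            p *= (count + alpha) / (total + alpha * len(model["vocab"][i]))
        scores[cls] = p
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}


def reassign(model, entity, attribute):
    """Pick the class with the highest weightage for one data attribute."""
    weights = posterior(model, entity, attribute)
    return max(weights, key=weights.get), weights


# Hypothetical respondent answers, for illustration only
responses = [
    ("website", "email", "medium"),
    ("website", "email", "medium"),
    ("website", "email", "high"),
    ("mobile app", "location", "high"),
    ("mobile app", "location", "high"),
    ("mobile app", "email", "medium"),
    ("online system", "username", "low"),
    ("online system", "username", "low"),
    ("website", "username", "low"),
]

model = train(responses)
label, weights = reassign(model, "mobile app", "location")
print(label, weights)
```

In this sketch the class with the largest weightage becomes the re-assigned privacy class, and the full weightage dictionary could be shown to respondents in a second validation round.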

Keywords

public data, data classification model, data privacy, Naive Bayesian classifier

Area

Data Mining and Optimization